Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
The Software Sustainability Institute
carole.goble@manchester.ac.uk
iConference, 26 March 2015, Newport Beach, Los Angeles, USA
What do I do? CyberInfrastructure EcoSystems
e-Lab Collabs & Shared Asset Repositories
Knowledge, Metadata, Linked Data, Ontologies
Software Engineering for Scientists
Computational Workflow Systems
Scholarly Comms
Reproducibility
MicroPublications
Open Science
Research Objects
Linked Data for Science
Scientific EgoSystems
Biodiversity
Systems Biology
Synthetic Biology
Astronomy
HelioPhysics
Genomics
Health Epidemiology
Digital Preservation
Social Science
Pharmacology
Knowledge Turning Flow
Barriers to Cure
» Access to scientific resources
» Coordination and Collaboration
» Flow of Information
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
[Pettifer, Attwood]
http://getutopia.com
Virtual Witnessing
Scientific publications
» announce a result
» convince readers the result is correct
“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”
Jill Mesirov, Broad Institute, 2010
Accessible Reproducible Research, Science 22 January 2010, Vol 327 no 5964, pp 415-416, DOI 10.1126/science.1179653
Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer
Bramhall et al, Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015
“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”
http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research”, 1995
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software, Infrastructure, Compilers, hardware
Morin et al, Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al, The case for open computer programs, Nature 482, 2012
50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads
31: no software version, parameters, or exact version of the genomic reference sequence
26: no access to primary data sets
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
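The details the survey above found missing (software version, parameters, exact reference sequence) can be captured at run time. The following is a minimal sketch, not a method from the talk: the function names `run_record` and `write_record`, the field names, and the choice of SHA-256 to pin the reference file are all illustrative assumptions.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_record(tool: str, version: str, params: dict, reference: Path) -> dict:
    """Collect tool version, parameters and the exact reference file used.

    All field names are hypothetical; they are not taken from any standard.
    """
    return {
        "tool": tool,
        "tool_version": version,
        "parameters": dict(sorted(params.items())),
        "reference_file": reference.name,
        "reference_sha256": sha256_of(reference),
        "python": platform.python_version(),
    }


def write_record(record: dict, out: Path) -> None:
    """Store the record as JSON next to the results."""
    out.write_text(json.dumps(record, indent=2, sort_keys=True))
```

Writing such a record beside every output file is cheap, and it answers the three questions the surveyed papers left open without relying on the author's memory.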
Broken software, broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retracted 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
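One common guard against the kind of column mix-up described above is to read data by column name instead of by position. The sketch below only illustrates that general idea; it is not the program involved in the retractions, and the column names (`h`, `k`, `l`, `f_plus`, `f_minus`) and the CSV layout are hypothetical.

```python
import csv
import io

# Hypothetical column names, chosen only for illustration.
REQUIRED = ("h", "k", "l", "f_plus", "f_minus")


def read_reflections(text: str) -> list[dict]:
    """Parse rows by header name so column order in the file cannot matter."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in REQUIRED if name not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly instead of silently reading the wrong column.
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "hkl": (int(row["h"]), int(row["k"]), int(row["l"])),
            "f_plus": float(row["f_plus"]),
            "f_minus": float(row["f_minus"]),
        })
    return rows
```

With this reader, a file whose columns arrive in a different order gives the same result, and a file missing a column is rejected before any analysis runs.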
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. J.E. Hannay et al, “How Do Scientists Develop and Use Scientific Software?”, Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Tools, Standards
Machine actionable
Formats, Reporting, Policies, Practices
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error Science is messy
Inherent
Reinhart/Rogoff austerity economics
Thomas Herndon
Nature, Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne
Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor reporting
» Unavailable resources/results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts: Subject specific, General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at ScaleMore on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It's all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments
Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
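The distinction above between a fixed snapshot (to defend or cite) and the most recent version (to locate) can be sketched as a small versioned record. This is an illustration under stated assumptions, not the Research Object model itself: the class name `ResearchObjectRecord`, the `identifier/label` citation form, and the `ro:example` identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """Versions of one research object, oldest first (hypothetical structure)."""
    identifier: str
    versions: list = field(default_factory=list)  # (label, content) pairs

    def release(self, label: str, content: dict) -> str:
        """Add a snapshot and return an identifier that can be cited."""
        if any(existing == label for existing, _ in self.versions):
            raise ValueError(f"version {label} already released")
        # Copy the content so later edits by the caller do not alter the snapshot.
        self.versions.append((label, dict(content)))
        return f"{self.identifier}/{label}"

    def resolve(self, label=None) -> dict:
        """Return a named snapshot, or the most recent one if no label is given."""
        if not self.versions:
            raise LookupError("no versions released")
        if label is None:
            return self.versions[-1][1]
        for existing, content in self.versions:
            if existing == label:
                return content
        raise LookupError(f"unknown version {label}")
```

Resolving with a label supports "defend it" and "reuse it (a version)", while resolving without one supports "locate it (most recent)".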
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
means
ends
driver
Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure, and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness, tolerance
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
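The four parts named above (input data, software, output data, config/parameters) suggest a simple way to check a rerun: fingerprint each part and compare. The sketch below is an assumption-laden illustration, not the talk's framework; `Run`, `digest`, `same_result` and the toy `scaled_mean` analysis are hypothetical names, and it assumes inputs and outputs are JSON-serialisable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


def digest(value) -> str:
    """Hash any JSON-serialisable value in a stable, key-sorted form."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Run:
    """One execution: input data, config and the software applied to them."""
    input_data: list
    config: dict
    software: Callable[[list, dict], object]

    def execute(self) -> dict:
        """Run the software and return fingerprints of inputs, config and output."""
        output = self.software(self.input_data, self.config)
        return {
            "input_digest": digest(self.input_data),
            "config_digest": digest(self.config),
            "output_digest": digest(output),
        }


def same_result(first: dict, second: dict) -> bool:
    """Two runs agree when inputs, config and output all have matching digests."""
    return first == second


def scaled_mean(data: list, config: dict) -> float:
    """Toy analysis standing in for real scientific software."""
    return config["scale"] * sum(data) / len(data)
```

Comparing the three digests separately also shows where a rerun diverged: a changed config digest points at the parameters, while matching inputs and config with a different output digest points at the software or its environment.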
1. Science Changes. So does the Lab
“The questions don’t change but the answers do”
Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps, features
provenance trace
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Reproducibility Framework
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
Reproduce by Reading: Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows/Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows/makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config
Components
Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs
Data and configs
Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards
Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Instrument
NISO-JATS
Instrument
J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
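The container-plus-manifest idea above can be sketched with a ZIP archive whose manifest lists every aggregated file and its checksum. This is a simplified illustration only: the manifest path, the `aggregates` layout and the function names are assumptions made for this example and are not taken from the Research Object Bundle or OMEX specifications.

```python
import hashlib
import io
import json
import zipfile

MANIFEST_PATH = "manifest.json"  # hypothetical location, not taken from a spec


def build_bundle(resources: dict) -> bytes:
    """Zip the resources together with a manifest listing each one and its hash."""
    manifest = {
        "aggregates": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(resources.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path, data in sorted(resources.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def verify_bundle(bundle: bytes) -> bool:
    """Check that every file listed in the manifest is present and unmodified."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
        for entry in manifest["aggregates"]:
            try:
                data = archive.read(entry["path"])
            except KeyError:
                return False  # listed in the manifest but missing from the archive
            if hashlib.sha256(data).hexdigest() != entry["sha256"]:
                return False
    return True
```

For example, `build_bundle({"data/input.csv": b"a,b\n1,2\n"})` returns the bytes of a single archive that can be exchanged between platforms and checked on arrival with `verify_bundle`.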
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love, money
fame, duty
fear, time/effort
shame, duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J Assoc for Info Science and Technology, in review
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO amp Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
What do I do CyberInfrastructure EcoSystems
e-Lab Collabs ampShared Asset Repositories
Knowledge Metadata Linked Data Ontologies
Software Engineering for Scientists
ComputationalWorkflow Systems
Scholarly Comms
Reproducibility
MicroPublications
Open Science
Research Objects
Linked Data forScience
Scientific EgoSystems
Biodiversity
Systems Biology
Synthetic Biology
Astronomy
HelioPhysics
Genomics
Health Epidemiology
Digital Preservation
Social Science
Pharmacology
Knowledge Turning Flow
Barriers to Cure
raquo Access to scientific resources
raquo Coordination and Collaboration
raquo Flow of Information
httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation
[Pettifer Attwood]
httpgetutopiacom
Virtual WitnessingScientific publications
raquo announce a result
raquo convince readers the result is correct
ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo
Jill Mesirov Broad Institute 2010
Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653
Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer
Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015
ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo
httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106
ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo
David Donoho ldquoWavelab and Reproducible Researchrdquo 1995
Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware
Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012
50 papers randomly chosen from 378
manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads
31 no sw version parameters exact
version of genomic reference sequence
26 no access to primary data sets
Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
ldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
Potential Trace Heaven Folks
recomputationorg
sciencecodemanifestoorg
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts Subject specific General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
Process at ScaleMore on Models
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design protocols samples software modelshellip
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
Pop-Up Start UpsLittle Science within Big Science
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles; 24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Knowledge Turning Flow
Barriers to Cure
raquo Access to scientific resources
raquo Coordination and Collaboration
raquo Flow of Information
httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation
[Pettifer Attwood]
httpgetutopiacom
Virtual WitnessingScientific publications
raquo announce a result
raquo convince readers the result is correct
ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo
Jill Mesirov Broad Institute 2010
Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653
Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer
Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015
ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo
httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106
ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo
David Donoho ldquoWavelab and Reproducible Researchrdquo 1995
Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware
Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012
50 papers randomly chosen from 378
manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads
31 no sw version parameters exact
version of genomic reference sequence
26 no access to primary data sets
Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
ldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
Potential Trace Heaven Folks
recomputationorg
sciencecodemanifestoorg
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts Subject specific General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
Process at ScaleMore on Models
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design protocols samples software modelshellip
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
Pop-Up Start UpsLittle Science within Big Science
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards
Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
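A checklist-based assessment can be sketched as a list of requirements, each with a level and a test, evaluated against a description of the Research Object. The requirement labels, the MUST/SHOULD levels and the dictionary keys below are assumptions made for illustration, not the checklist model of the cited paper.

```python
def assess(research_object, checklist):
    """Evaluate every requirement; return an overall verdict and the failures."""
    failures = {"MUST": [], "SHOULD": []}
    for label, level, predicate in checklist:
        if not predicate(research_object):
            failures[level].append(label)
    if failures["MUST"]:
        verdict = "fail"
    elif failures["SHOULD"]:
        verdict = "pass with warnings"
    else:
        verdict = "pass"
    return verdict, failures


# Hypothetical checklist: (label, level, test applied to the description).
checklist = [
    ("licence is stated", "MUST", lambda ro: bool(ro.get("license"))),
    ("software version recorded", "MUST", lambda ro: bool(ro.get("software_version"))),
    ("provenance trace included", "SHOULD", lambda ro: bool(ro.get("provenance"))),
]

# Hypothetical Research Object description with two gaps.
ro = {"license": "CC-BY-4.0", "software_version": None, "provenance": []}
verdict, failures = assess(ro, checklist)
```

Keeping the checklist as data means a platform profile can swap in its own requirements without changing the assessment code.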
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
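The container-plus-manifest idea can be sketched with a ZIP file that carries the resources and one JSON file listing them. The manifest file name and its fields here are illustrative assumptions; the RO Bundle and OMEX specifications define their own layouts, which should be consulted for real use.

```python
import io
import json
import zipfile

MANIFEST_PATH = "manifest.json"  # assumed name, not taken from either specification


def build_bundle(resources, created_by):
    """Return the bytes of a ZIP holding the resources plus a JSON manifest."""
    manifest = {
        "createdBy": created_by,
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(resources):
            archive.writestr(name, resources[name])
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Open a bundle and return its manifest as a dictionary."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return json.loads(archive.read(MANIFEST_PATH))


# Hypothetical contents: a model file and its input data.
bundle = build_bundle(
    {"model.xml": b"<model/>", "data/input.csv": b"x\n1\n"},
    created_by="example author",
)
manifest = read_manifest(bundle)
```

Because the manifest travels inside the container, a platform that receives the file can discover what it holds without any side channel.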
Retro-Fitted ROs
using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration - complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code, my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT…
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza,
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research”, 1995
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software, Infrastructure
Compilers, hardware
Morin et al., Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al., The case for open computer programs, Nature 482, 2012
50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads
31 no software version, parameters, exact version of genomic reference sequence
26 no access to primary data sets
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
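The three things most often missing (software version, parameters, exact reference version) can be captured mechanically at run time. The sketch below writes a small run record and checks it for gaps; the tool name, parameter names and field names are hypothetical and are not options of any real aligner.

```python
import hashlib
import json

REQUIRED = ("tool", "version", "parameters", "reference_sha256")


def run_record(tool, version, parameters, reference_bytes):
    """Describe one analysis run well enough for someone else to repeat it."""
    return {
        "tool": tool,
        "version": version,
        "parameters": dict(sorted(parameters.items())),
        # A checksum pins the exact reference sequence, not just its name.
        "reference_sha256": hashlib.sha256(reference_bytes).hexdigest(),
    }


def missing_fields(record):
    """List required fields that are absent or empty."""
    return [key for key in REQUIRED if not record.get(key)]


# Hypothetical run: names and values are placeholders.
record = run_record(
    tool="aligner",
    version="1.2.3",
    parameters={"seed_length": 32, "max_mismatches": 2},
    reference_bytes=b"ACGTACGT",
)
record_text = json.dumps(record, indent=2)
```

Publishing `record_text` with the results answers, for that run, the questions the survey above found unanswered in most papers.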
Broken software, Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retracted 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol. 314, no. 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
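Swapping two coordinate columns mirrors a three-dimensional structure, and a mirror image has the opposite handedness. A regression test against a known reference can catch that class of error. The sketch below illustrates the general idea only; it is not a reconstruction of the program involved in the retractions, and the reference points are made up.

```python
def triple_product(a, b, c):
    """Signed volume a . (b x c); its sign encodes handedness."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


def swap_columns(points, i, j):
    """Return the points with coordinate columns i and j exchanged (the bug)."""
    swapped = []
    for point in points:
        coords = list(point)
        coords[i], coords[j] = coords[j], coords[i]
        swapped.append(tuple(coords))
    return swapped


def same_handedness(reference, candidate):
    """True when both point triples have the same chirality."""
    return triple_product(*reference) * triple_product(*candidate) > 0


# Made-up reference triple with a known (positive) handedness.
reference = [(1.0, 0.2, 0.0), (0.0, 1.0, 0.3), (0.1, 0.0, 1.0)]
buggy_output = swap_columns(reference, 0, 1)
```

Running `same_handedness(reference, buggy_output)` in a test suite would fail as soon as the column swap crept into the pipeline.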
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1-8
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Tools, Standards
Machine actionable
Formats, Reporting, Policies, Practices
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error
Science is messy
Inherent
Reinhart/Rogoff austerity economics
Thomas Herndon
Nature, Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks! Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts, Subject specific, General resources
101 Innovations in Scholarly Communication: the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It's all about The Trust
Jam today, Jam tomorrow, Jam for all
Just enough Jam
Just in Time, not Just in Case
Research Objects
Compound, Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens: data, software, methods: id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments
Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UniProt): evolving
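The difference between "defend it (snapshot)" and "locate it (most recent)" can be sketched as a resolver in which a bare identifier follows the latest release while a versioned identifier stays frozen. The `ro:` identifier scheme, the `@v` suffix and the class name are assumptions made for this example only.

```python
class ResearchObjectRegistry:
    """Toy resolver: bare identifiers track the latest release, @vN pins one."""

    def __init__(self):
        self._versions = {}  # identifier -> list of released contents

    def release(self, identifier, content):
        """Store a new release and return the citable, version-pinned reference."""
        versions = self._versions.setdefault(identifier, [])
        versions.append(content)
        return f"{identifier}@v{len(versions)}"

    def resolve(self, reference):
        """Return the pinned snapshot, or the most recent release if unpinned."""
        identifier, _, tag = reference.partition("@v")
        versions = self._versions[identifier]
        if tag == "":
            return versions[-1]
        return versions[int(tag) - 1]


# Hypothetical study released twice.
registry = ResearchObjectRegistry()
first = registry.release("ro:asthma-study", {"cohort_size": 100})
second = registry.release("ro:asthma-study", {"cohort_size": 120})
snapshot = registry.resolve(first)
latest = registry.resolve("ro:asthma-study")
```

A paper would cite the pinned reference so the claim can be defended, while a collaborator looking for current data would follow the bare identifier.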
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model; the figure automatically updates
Data updates time-stamped
New conclusions added via versions
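A living figure is, at its simplest, a summary that is recomputed from all contributed data rather than stored as an image. The sketch below assumes contributions are lists of numbers tagged with a lab name and a timestamp; the class name and the choice of the mean as the plotted statistic are illustrative, not how F1000Research implements its figures.

```python
from datetime import datetime, timezone
from statistics import mean


class LivingFigure:
    """Keep time-stamped contributions and recompute the summary on demand."""

    def __init__(self):
        self.contributions = []  # (timestamp, lab, values)

    def contribute(self, lab, values, when=None):
        """Record a data contribution with the time it was added."""
        stamp = when or datetime.now(timezone.utc)
        self.contributions.append((stamp, lab, list(values)))

    def render(self):
        """Return the per-lab mean over everything contributed so far."""
        pooled = {}
        for _, lab, values in self.contributions:
            pooled.setdefault(lab, []).extend(values)
        return {lab: mean(values) for lab, values in pooled.items()}


# Hypothetical use: the original lab publishes, then a second lab replicates.
figure = LivingFigure()
figure.contribute("lab_a", [2.0, 4.0])
before = figure.render()
figure.contribute("lab_b", [10.0])
after = figure.render()
```

Nothing in the article has to be edited for `after` to differ from `before`; the figure changes because the data behind it did.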
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
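Treating a Research Object like a software release makes comparison between versions a routine operation. A minimal sketch, assuming each release is summarised as a mapping from resource path to checksum; the paths and checksum strings are placeholders.

```python
def diff_releases(old, new):
    """Compare two releases given as {path: checksum} mappings."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(path for path in shared if old[path] != new[path]),
    }


# Hypothetical releases of the same Research Object.
release_1 = {"data.csv": "aaa", "analysis.R": "bbb", "notes.txt": "ccc"}
release_2 = {"data.csv": "aaa", "analysis.R": "bbb2", "figure.png": "ddd"}
changes = diff_releases(release_1, release_2)
```

The same comparison works for a fork against its parent, which is what makes merges and dependency tracking between ROs tractable.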
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
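Whether a rerun "reproduces" a result depends on the tolerance you accept. The sketch below compares two sets of named numeric results using the standard library's `math.isclose`; the result names and the default tolerance are assumptions chosen for illustration.

```python
import math


def compare_runs(original, rerun, rel_tol=1e-6):
    """Return the results that differ by more than the relative tolerance."""
    mismatches = {}
    for key in sorted(set(original) | set(rerun)):
        a, b = original.get(key), rerun.get(key)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical published values and the values obtained on a rerun.
original = {"mean": 0.5, "p_value": 0.031}
rerun = {"mean": 0.5000000001, "p_value": 0.05}
loose = compare_runs(original, rerun)
strict = compare_runs(original, rerun, rel_tol=1e-12)
```

With the default tolerance only `p_value` is flagged; tightening the tolerance flags `mean` as well, so the chosen tolerance belongs in the report alongside the verdict.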
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
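The four-part breakdown on this slide (methods, materials, instruments, laboratory) can be pictured as a record that travels with an experiment. The sketch below is only an illustration: the class name `ExperimentRecord`, its fields, the `missing_parts` helper and the example values are all assumptions made here, not part of any Research Object specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRecord:
    """Hypothetical record of one computational experiment (illustration only)."""

    # Methods: techniques, algorithms, spec of the steps
    methods: List[str] = field(default_factory=list)
    # Materials: datasets, parameters, algorithm seeds
    materials: Dict[str, str] = field(default_factory=dict)
    # Instruments: codes, services, scripts, underlying libraries (name -> version)
    instruments: Dict[str, str] = field(default_factory=dict)
    # Laboratory: software and hardware infrastructure (name -> version)
    laboratory: Dict[str, str] = field(default_factory=dict)

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that have not been described yet."""
        parts = {
            "methods": self.methods,
            "materials": self.materials,
            "instruments": self.instruments,
            "laboratory": self.laboratory,
        }
        return [name for name, value in parts.items() if not value]


# Placeholder values, not taken from a real study.
record = ExperimentRecord(
    methods=["align reads", "call variants"],
    materials={"dataset": "example-reads.fastq", "seed": "42"},
)
print(record.missing_parts())
```

A check like `missing_parts` is one way to notice, before submission, that the instruments and laboratory were never written down.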
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency: dependencies, steps, features, provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Reproducibility Framework
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Instrument
NISO-JATS
Instrument
J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergman et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
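The container-plus-manifest idea on this slide can be sketched as a ZIP file that carries its own inventory. The example below is a simplified illustration only and does not follow the Research Object Bundle or OMEX specifications: the manifest location `manifest.json`, the `aggregates` key, the checksum field and the file names are assumptions made for this sketch.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict

MANIFEST_PATH = "manifest.json"  # hypothetical location, not taken from any spec


def build_bundle(files: Dict[str, bytes]) -> bytes:
    """Pack files into an in-memory ZIP together with a checksum manifest."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(content).hexdigest()}
        for path, content in sorted(files.items())
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_PATH, json.dumps({"aggregates": entries}, indent=2))
        for path, content in sorted(files.items()):
            archive.writestr(path, content)
    return buffer.getvalue()


def verify_bundle(data: bytes) -> bool:
    """Check that every file listed in the manifest matches its checksum."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
        return all(
            hashlib.sha256(archive.read(entry["path"])).hexdigest() == entry["sha256"]
            for entry in manifest["aggregates"]
        )


# Placeholder contents, purely for illustration.
bundle = build_bundle({
    "data/input.csv": b"sample,value\na,1\n",
    "scripts/analyse.py": b"print('placeholder analysis')\n",
})
print(verify_bundle(bundle))
```

Keeping the manifest inside the container means a single file can be exchanged between platforms and still be checked for completeness on arrival.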
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J Assoc for Info Science and Technology, in review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles, 24 journals had citation policy
BUT……
two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
CaroleAnneGoble
Bramhall et al, QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS, Inflammatory Bowel Diseases 2015
“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”
http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research”, 1995
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Morin et al, Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al, The case for open computer programs, Nature 482, 2012
50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads
31 no sw version, parameters, exact version of genomic reference sequence
26 no access to primary data sets
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
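The gaps counted on this slide (software version, parameters, exact reference version) are cheap to capture at the moment a tool is run. The sketch below shows one possible way to do that; the function names `describe_run` and `methods_sentence`, the field names, and the tool, parameter and reference values are all placeholders assumed for illustration, not details from the cited study.

```python
import platform
from typing import Dict


def describe_run(tool: str, tool_version: str,
                 parameters: Dict[str, str], reference_version: str) -> Dict[str, object]:
    """Collect the details a reader would need in order to rerun an analysis step."""
    return {
        "tool": tool,
        "tool_version": tool_version,
        "parameters": dict(sorted(parameters.items())),  # stable order for reporting
        "reference_version": reference_version,
        "python_version": platform.python_version(),  # interpreter that ran the step
    }


def methods_sentence(record: Dict[str, object]) -> str:
    """Turn a run record into a sentence suitable for a methods section."""
    params = ", ".join(f"{key}={value}" for key, value in record["parameters"].items())
    return (f"Analysis used {record['tool']} {record['tool_version']} "
            f"({params}) against reference {record['reference_version']}.")


# Placeholder values only; "X.Y.Z" stands in for a real version string.
run = describe_run(
    tool="example-aligner",
    tool_version="X.Y.Z",
    parameters={"threads": "4", "max_mismatches": "2"},
    reference_version="example-genome-build-1",
)
print(methods_sentence(run))
```

Generating the sentence from the record, rather than typing it by hand, keeps the reported version and parameters in step with what was actually run.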
Broken software Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retract 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. JE Hannay et al, “How Do Scientists Develop and Use Scientific Software?”, Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error Science is messy
Inherent
Reinhart/Rogoff Austerity economics
Thomas Herndon
Nature Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, Resource intensive
Unrecognised, Trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts, Subject specific, General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It’s all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UniProt): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
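The difference between defending a snapshot and locating the most recent release can be sketched as a small resolver. This is a toy illustration under stated assumptions: the `Release` and `Resolver` classes, the identifier `ro:example-study` and the locations are hypothetical, and no real identifier service works exactly this way.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Release:
    """One immutable release of a research object (hypothetical structure)."""
    version: int
    location: str


class Resolver:
    """Toy resolver: one identifier maps to an ordered list of releases."""

    def __init__(self) -> None:
        self._releases: Dict[str, List[Release]] = {}

    def publish(self, ro_id: str, location: str) -> Release:
        """Register a new release; versions are numbered from 1."""
        releases = self._releases.setdefault(ro_id, [])
        release = Release(version=len(releases) + 1, location=location)
        releases.append(release)
        return release

    def resolve(self, ro_id: str, version: Optional[int] = None) -> Release:
        """Return a pinned snapshot if a version is given, else the most recent."""
        releases = self._releases[ro_id]
        if version is None:
            return releases[-1]  # "locate it": most recent
        if not 1 <= version <= len(releases):
            raise KeyError(f"{ro_id} has no version {version}")
        return releases[version - 1]  # "defend it": fixed snapshot


resolver = Resolver()
resolver.publish("ro:example-study", "https://example.org/ro/v1.zip")
resolver.publish("ro:example-study", "https://example.org/ro/v2.zip")
print(resolver.resolve("ro:example-study").version)
print(resolver.resolve("ro:example-study", version=1).location)
```

A citation that pins a version keeps pointing at what was actually reviewed, while an unpinned lookup follows the object as it evolves.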
means
ends
driver
Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
raquo Data Hugging amp Flirting
raquo Reciprocity norms
raquo Hans W request
raquo Dowry phenomenon
raquo Private installations
raquo Private spaces on shared installations
raquo Safe havens
Too ugly to show anyone else
Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it
The code is too sophisticated for most readersrefereesI didnt work out all the details
I didnt actually write the code -- my student did
My competitors would be unfair to me
Its valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011
Drivers
love money
fame duty
fear timeeffort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Interface Framingraquo Limited scheduled sharing choices
rsaquo Never say never
raquo ldquoCitablerdquo not ldquoSharedrdquo
raquo Feedback
rsaquo Guilt tripping
rsaquo Outlier finger pointing
[Garzia]
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipResearch Currencies
ldquoResearchBitCoinrdquo
Citation Semantics
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
Instrument Artisans
[Shapin 84]
Make Software Visible[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review
87 software findable78 credit37 formal citation 5 actual version
90 Bio articles24 journals had citation policy
BUThelliphellip
two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated
Inspired by Bob Harrison
bull Incremental shift for infrastructure providers
bull Moderate shift for policy makers and stewards
bull Paradigm shift for researchers and their institutions
The RO amp Reproducibility Challenge
All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group
httpwwwresearchobjectorg
httpwwwwf4ever-projectorg
httpwwwfair-domorg
httpseek4scienceorg
httprightfieldorguk
httpwwwsoftwareacuk
httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza
Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence
Contacthellip
Professor Carole Goble
The University of Manchester UK
carolegoblemanchesteracuk
httpssitesgooglecomsitecarolegoble
CaroleAnneGoble
ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo
David Donoho ldquoWavelab and Reproducible Researchrdquo 1995
Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware
Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012
50 papers randomly chosen from 378
manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads
31 no sw version parameters exact
version of genomic reference sequence
26 no access to primary data sets
Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)
Broken software Broken science
raquo Geoffrey Chang Scripps Institute
raquo Homemade data-analysis program inherited from another lab
raquo Flipped two columns of data inverting the electron-density map used to derive protein structure
raquo Retract 3 Science papers and 2 papers in other journals
raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
ldquoAs a general rule
researchers do not
test or document their
programs rigorously
and they rarely
release their codes
making it almost
impossible to
reproduce and verify
published results
generated by
scientific softwarerdquo
2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8
republic of science
regulation of science
institution cores libraries
Mertonrsquos four norms of scientific behaviour (1942)
public services
Tools StandardsMachine actionableFormats Reporting Policies Practices
Record and AutomateEverything
Potential Trace Heaven Folks
recomputationorg
sciencecodemanifestoorg
Honest Error Science is messy
Inherent
ReinhartRogoff Austerity economicsThomas Herndon
Nature Oct rsquo12
Zoeuml Corbyn
Fraud
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts Subject specific General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
Process at ScaleMore on Models
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design protocols samples software modelshellip
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
Pop-Up Start UpsLittle Science within Big Science
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed: updated resources
Uncertainty
BioSTIF
Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay: materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
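The four parts named on this slide (methods, materials, instruments, laboratory) can be sketched as a small record, so that a rerun can report which part has drifted since publication. This is a minimal illustration under stated assumptions: the ExperimentRecord class, the changed_parts helper and every example value are hypothetical and are not taken from any Research Object tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentRecord:
    """Hypothetical record of the four parts named on the slide."""
    methods: tuple      # techniques, algorithms, spec of the steps
    materials: tuple    # datasets, parameters, algorithm seeds
    instruments: tuple  # codes, services, scripts, underlying libraries
    laboratory: tuple   # software and hardware infrastructure


def changed_parts(original, rerun):
    """Return the names of the parts that differ between two records."""
    return [
        f.name
        for f in fields(ExperimentRecord)
        if getattr(original, f.name) != getattr(rerun, f.name)
    ]


# Illustrative values only.
published = ExperimentRecord(
    methods=("align reads", "call variants"),
    materials=("cohort.csv", "seed=42"),
    instruments=("aligner 1.0", "numpy 1.9"),
    laboratory=("linux x86_64",),
)
rerun = ExperimentRecord(
    methods=published.methods,
    materials=published.materials,
    instruments=("aligner 1.0", "numpy 1.26"),  # a library was updated
    laboratory=published.laboratory,
)

print(changed_parts(published, rerun))  # ['instruments']
```

Keeping the parts separate is what lets a rerun say "the instrument changed, the method did not", rather than only "the result differs".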
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency: dependencies, steps, features, provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergman et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
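The "Container plus Manifest" idea can be illustrated with a short sketch: a ZIP archive that carries the resources together with a manifest listing what it aggregates. This is a simplified illustration only and makes no claim to conform to the Research Object Bundle or OMEX specifications; the manifest layout, the build_bundle and verify_bundle helpers, and the example file names are all assumptions made for the example.

```python
import hashlib
import io
import json
import zipfile


def build_bundle(resources, creator):
    """Pack named byte strings plus a manifest into an in-memory ZIP."""
    manifest = {
        "createdBy": creator,
        "aggregates": [
            {"path": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(resources.items())
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in sorted(resources.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


def verify_bundle(bundle_bytes):
    """Return the paths whose content no longer matches the manifest."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return [
            entry["path"]
            for entry in manifest["aggregates"]
            if hashlib.sha256(archive.read(entry["path"])).hexdigest()
            != entry["sha256"]
        ]


# Hypothetical resources for illustration.
resources = {
    "data/input.csv": b"a,b\n1,2\n",
    "scripts/analyse.py": b"print('analysis')\n",
}
bundle = build_bundle(resources, creator="A. Researcher (hypothetical)")
print(verify_bundle(bundle))  # []
```

The manifest is what turns a plain archive into something a platform can inspect without unpacking every file; checksums are one simple way to let a recipient check that the aggregated parts are the ones that were released.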
Retro-Fitted ROs using off-the-shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share, but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible [1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review
87 software findable; 78 credit; 37 formal citation; 5 actual version
90 Bio articles; 24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Broken software, Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retract 3 Science papers and 2 papers in other journals
» One paper cited by 364
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
Software making practices
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Tools, Standards, Machine actionable, Formats, Reporting, Policies, Practices
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error
Science is messy
Inherent
Reinhart/Rogoff Austerity economics, Thomas Herndon
Nature Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard
Resource intensive
Unrecognised
Trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts, Subject specific, General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It’s all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case
Research Objects
Compound, Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
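Two of the resolution modes listed here, "defend it (snapshot)" and "locate it (most recent)", can be sketched as a tiny in-memory registry in which every release gets a versioned reference while the bare identifier always resolves to the latest release. This is an assumption-laden illustration: the ResearchObjectRegistry class, the "@v" reference syntax and the example identifier are hypothetical and are not part of any Research Object specification.

```python
class ResearchObjectRegistry:
    """Hypothetical registry mapping an identifier to its released versions."""

    def __init__(self):
        self._versions = {}  # identifier -> list of released snapshots

    def release(self, identifier, content):
        """Store a new snapshot and return a citable, versioned reference."""
        versions = self._versions.setdefault(identifier, [])
        versions.append(content)
        return f"{identifier}@v{len(versions)}"

    def resolve(self, reference):
        """Resolve 'id@vN' to that snapshot, or a bare 'id' to the latest."""
        identifier, _, version = reference.partition("@v")
        versions = self._versions[identifier]
        return versions[int(version) - 1] if version else versions[-1]


registry = ResearchObjectRegistry()
first_ref = registry.release("ro:demo-study", "analysis with 2014 data")
registry.release("ro:demo-study", "analysis with revised 2015 data")

print(first_ref)                          # ro:demo-study@v1
print(registry.resolve(first_ref))        # analysis with 2014 data
print(registry.resolve("ro:demo-study"))  # analysis with revised 2015 data
```

A paper would cite the versioned reference so the evidence behind a claim stays fixed, while a reader looking for current work follows the bare identifier.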
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UniProt): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
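The container-plus-manifest idea can be sketched as a ZIP file that carries its own inventory. The sketch below is loosely modelled on the Research Object Bundle layout (a `mimetype` entry and a `.ro/manifest.json` listing aggregated resources); the media type string and context URL are given as best recalled and should be checked against the specification, and `build_bundle`, `list_aggregates` and the example file names are hypothetical.

```python
import io
import json
import zipfile

# Media type and JSON-LD context as recalled from the RO Bundle specification;
# treat both strings as assumptions to be verified.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
CONTEXT = "https://w3id.org/bundle/context"


def build_bundle(files):
    """Pack files (name -> bytes) into an in-memory ZIP with a manifest."""
    manifest = {
        "@context": [CONTEXT],
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The container declares its type in an uncompressed first entry.
        bundle.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name in sorted(files):
            bundle.writestr(name, files[name])
    return buffer.getvalue()


def list_aggregates(bundle_bytes):
    """Read the manifest back and return the URIs it aggregates."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        manifest = json.loads(bundle.read(".ro/manifest.json"))
    return [entry["uri"] for entry in manifest["aggregates"]]
```

Because the manifest travels inside the archive, a platform that has never seen the bundle can still discover what it contains without unpacking every file.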
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share, but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration - complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn’t work out all the details
I didn’t actually write the code - my student did
My competitors would be unfair to me
It’s valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles, 24 journals had citation policy
BUT……
two years’ time, when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
republic of science
regulation of science
institution cores libraries
Merton’s four norms of scientific behaviour (1942)
public services
Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error
Science is messy
Inherent
Reinhart/Rogoff austerity economics
Thomas Herndon
Nature, Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources, results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, Resource intensive, Unrecognised, Trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts, Subject specific, General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It’s all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam, Just in Time, not Just in Case
Research Objects
Compound Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments
Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
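The difference between defending a snapshot, locating the most recent version and crediting contributors can be sketched with a toy resolver. Everything here is an assumption made for illustration: the `Registry` class, the `ro:demo` identifier scheme and the citation format are hypothetical and are not part of any Research Object platform.

```python
import hashlib


class Registry:
    """Toy in-memory registry: identifier -> ordered list of snapshots."""

    def __init__(self):
        self._versions = {}

    def release(self, ro_id, content, contributors):
        """Store an immutable snapshot and return its version number."""
        snapshot = {
            "content": content,
            "sha256": hashlib.sha256(content).hexdigest(),
            "contributors": tuple(contributors),
        }
        self._versions.setdefault(ro_id, []).append(snapshot)
        return len(self._versions[ro_id])

    def resolve(self, ro_id, version=None):
        """Locate it: latest by default. Defend it: a pinned version."""
        versions = self._versions[ro_id]
        return versions[-1] if version is None else versions[version - 1]

    def cite(self, ro_id, version):
        """Credit it: a citation string pinned to one snapshot."""
        snap = self.resolve(ro_id, version)
        names = ", ".join(snap["contributors"])
        return f"{names}. {ro_id} (version {version}, sha256:{snap['sha256'][:12]})"
```

A citation that carries the version and a checksum prefix lets a reader confirm that the object they resolved is the one the paper relied on, while the unversioned identifier keeps pointing at the evolving record.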
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
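Treating a Research Object as a release makes comparison between versions a routine operation. A minimal sketch, assuming each release is described by a mapping from resource name to checksum (the function name, resource names and checksum strings are hypothetical):

```python
def compare_releases(old, new):
    """Compare two releases given as {resource name: checksum} mappings."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


# Hypothetical releases of the same Research Object.
release_1 = {"data.csv": "aa11", "analysis.R": "bb22", "figure1.png": "cc33"}
release_2 = {"data.csv": "dd44", "analysis.R": "bb22", "model.xml": "ee55"}

print(compare_releases(release_1, release_2))
```

The same comparison applies to forks: two descendants of one release can each be compared against their common ancestor before a merge is attempted.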
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (software and hardware infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
raquo Data Hugging amp Flirting
raquo Reciprocity norms
raquo Hans W request
raquo Dowry phenomenon
raquo Private installations
raquo Private spaces on shared installations
raquo Safe havens
Too ugly to show anyone else
Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it
The code is too sophisticated for most readersrefereesI didnt work out all the details
I didnt actually write the code -- my student did
My competitors would be unfair to me
Its valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011
Drivers
love money
fame duty
fear timeeffort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Interface Framingraquo Limited scheduled sharing choices
rsaquo Never say never
raquo ldquoCitablerdquo not ldquoSharedrdquo
raquo Feedback
rsaquo Guilt tripping
rsaquo Outlier finger pointing
[Garzia]
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipResearch Currencies
ldquoResearchBitCoinrdquo
Citation Semantics
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
Instrument Artisans
[Shapin 84]
Make Software Visible[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review
87 software findable78 credit37 formal citation 5 actual version
90 Bio articles24 journals had citation policy
BUThelliphellip
two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Record and Automate Everything
Potential Trace Heaven Folks
recomputation.org
sciencecodemanifesto.org
Honest Error: Science is messy
Inherent
Reinhart/Rogoff: Austerity economics, Thomas Herndon
Nature Oct ’12
Zoë Corbyn
Fraud
“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor reporting
» Unavailable resources: results, data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard, resource intensive
Unrecognised, trolled
Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts: subject specific, general resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups: Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It's all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case
Research Objects
Compound, Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources and related experiments. Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
  › Defend it (snapshot)
  › Locate it (most recent)
  › Reuse it (a version, a component)
  › Credit it (contributory authorship)
  › Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UniProt): evolving
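The resolution and citation needs listed on this slide (defend a snapshot, locate the most recent version, reuse a version, credit contributors, cross link) can be illustrated with a small sketch. This is not how any Research Object service is implemented; the `Snapshot` and `Registry` names, the `ro:` identifiers and the citation string format are all hypothetical, chosen only to show the distinction between resolving a fixed snapshot and resolving the latest release.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable, citable release of a research object (hypothetical model)."""
    ro_id: str
    version: int
    components: Dict[str, str]   # component name -> content identifier
    contributors: List[str]      # contributory authorship, used for credit
    links: List[str] = field(default_factory=list)  # cross links to other ROs


class Registry:
    """Toy in-memory resolver; a real service would use persistent identifiers."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Snapshot]] = {}

    def release(self, ro_id: str, components: Dict[str, str],
                contributors: List[str], links: Optional[List[str]] = None) -> Snapshot:
        # Each release appends a new snapshot; earlier ones are never changed.
        history = self._history.setdefault(ro_id, [])
        snap = Snapshot(ro_id, len(history) + 1, dict(components),
                        list(contributors), list(links or []))
        history.append(snap)
        return snap

    def resolve(self, ro_id: str, version: Optional[int] = None) -> Snapshot:
        # "Locate it": no version means the most recent release.
        # "Defend it": an explicit version returns that fixed snapshot.
        history = self._history[ro_id]
        return history[-1] if version is None else history[version - 1]

    def cite(self, ro_id: str, version: Optional[int] = None) -> str:
        # "Credit it": the citation names the contributors of that release.
        snap = self.resolve(ro_id, version)
        return f"{', '.join(snap.contributors)}. {snap.ro_id} (version {snap.version})"
```

A paper would cite `resolve(ro_id, version=1)` so that the evidence it relies on stays fixed, while a reader looking for the current state of the study would call `resolve(ro_id)`.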
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency: Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility? Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
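The four boxes on this slide (input data, software, output data, config/parameters) suggest what a minimal run record might hold. The sketch below is one possible reading, not a Research Object format: `run_and_record`, the field names in the returned dictionary, and the toy `count_lines` method are all hypothetical. It fingerprints inputs and outputs with SHA-256 so that a later rerun can be compared against the recorded one.

```python
import hashlib
import platform


def digest(data: bytes) -> str:
    """Content fingerprint used to compare inputs and outputs across runs."""
    return hashlib.sha256(data).hexdigest()


def run_and_record(method, input_data: bytes, params: dict) -> dict:
    """Run a method and record the four elements named on the slide."""
    output = method(input_data, **params)
    return {
        "input_sha256": digest(input_data),            # Input Data
        "software": {                                  # Software
            "python": platform.python_version(),
            "method": method.__name__,
        },
        "parameters": dict(params),                    # Config, Parameters
        "output_sha256": digest(output),               # Output Data
    }


def count_lines(data: bytes, skip_blank: bool = False) -> bytes:
    """Toy 'method': count lines, optionally ignoring blank ones."""
    lines = data.splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return str(len(lines)).encode()
```

Two records with equal `input_sha256`, `parameters` and `output_sha256` but different `software` entries would indicate that the result survived a change of environment, which is one of the robustness questions the talk raises.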
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed: updated resources
Uncertainty
BioSTIF
Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay: materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
Reproducibility Framework
transparency
dependencies, steps, features
provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
submit article and move on…
Reporting
Documentation
Provenance - Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
Reproduce by Reading: Archived Record. Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument. Retain the bits
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
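The "Container" plus "Manifest" pairing on this slide can be sketched as a ZIP file that carries its own table of contents. The code below is a minimal illustration using only the Python standard library, not an implementation of the Research Object bundle or OMEX specifications; the `.ro/manifest.json` path and the `id` and `aggregates` keys are assumptions made for the example and should be checked against the bundle specification before being relied on. The helper names `build_bundle` and `list_aggregates` are hypothetical.

```python
import io
import json
import zipfile


def build_bundle(files: dict) -> bytes:
    """Pack files and a manifest that lists them into an in-memory ZIP container."""
    manifest = {
        "id": "/",
        # One entry per aggregated resource, sorted for a stable manifest.
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumed manifest location; verify against the specification.
        archive.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def list_aggregates(bundle: bytes) -> list:
    """Read the manifest back and return the URIs of the aggregated resources."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read(".ro/manifest.json"))
    return [entry["uri"] for entry in manifest["aggregates"]]
```

Keeping the manifest inside the container means a recipient can find out what the bundle claims to hold without unpacking every file, which is the "one file to share all information" idea in the COMBINE archive citation.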
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration - complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code - my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo
Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program
When research goes ldquowrongrdquo
raquo Tainted resources
raquo Black boxes
raquo Poor Reporting
raquo Unavailable resources results data software
raquo Bad maths
raquo Sins of omission
raquo Poor training sloppiness
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013
Scientific method
Social environmentraquo Impact factor mania
raquo Pressure to publish
raquo Broken peer review
raquo Research never reported
raquo Disorganisation
raquo Time pressures
raquo Prep amp curate costs
When research goes ldquowrongrdquo
httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study No thanks Not FAIR
Hard Resource intensiveUnrecognised TrolledJust gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts Subject specific General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826
Process at ScaleMore on Models
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design protocols samples software modelshellip
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
Pop-Up Start UpsLittle Science within Big Science
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
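A living figure of this kind is "simply data + code": each contribution is time-stamped and the figure is recomputed from everything received so far. The sketch below illustrates the idea only; `LivingFigure`, its methods and the choice of a pooled mean as the "figure" are assumptions for the example, not how F1000Research implements it.

```python
from datetime import datetime, timezone
from statistics import mean


class LivingFigure:
    """A figure defined as code over an append-only list of contributions."""

    def __init__(self):
        self.contributions = []

    def contribute(self, lab, values, when=None):
        """Add one lab's data with a timestamp; return the new version number."""
        stamp = when or datetime.now(timezone.utc)
        self.contributions.append(
            {"lab": lab, "values": list(values), "timestamp": stamp}
        )
        return len(self.contributions)

    def render(self):
        """Recompute the figure's summary from all data contributed so far."""
        pooled = [v for c in self.contributions for v in c["values"]]
        return {
            "version": len(self.contributions),
            "labs": [c["lab"] for c in self.contributions],
            "n": len(pooled),
            "mean": mean(pooled) if pooled else None,
        }
```

Because nothing is overwritten, any earlier version of the figure can be regenerated by replaying the contributions up to that point.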
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness, tolerance
verification, compliance
validation, assurance
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
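One way to make the split between materials, instruments and output concrete is to fingerprint each part of a run, so that a later rerun can report which part changed. The following is a sketch under assumptions: the function names are invented for illustration, and it presumes that materials and outputs can be serialised as JSON.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 fingerprint of any JSON-serialisable value."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_and_record(method, materials, instruments):
    """Run the method on the materials and fingerprint each part of the setup.

    method      -- a callable standing in for the algorithm
    materials   -- dict of datasets, parameters and seeds passed to the method
    instruments -- dict describing codes, services and library versions
    """
    output = method(**materials)
    return {
        "materials": digest(materials),
        "instruments": digest(instruments),
        "output": digest(output),
    }


def compare(original, rerun):
    """Return the names of the parts whose fingerprints differ between two runs."""
    return sorted(k for k in original if original[k] != rerun[k])
```

An identical output with changed instruments suggests robustness to the lab changing; a changed output with identical materials and instruments points at hidden state the record did not capture.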
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency dependencies
steps features provenance trace
portability
robustness
preservation
access available
description intelligible
standards common APIs
licensing
standards common
metadata
change management versioning
packaging
Machine actionable
Reproducibility Framework
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96
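Distilling a thick trace into a short report can be illustrated by walking the trace backwards from one result and keeping only the steps and source data that contributed to it. This is a generic lineage walk written for this note, not the LabelFlow method; the event fields `step`, `used` and `generated` are assumed names.

```python
def lineage(trace, item):
    """Summarise which steps and source items a data item depends on.

    trace -- list of events, each {"step": str, "used": [...], "generated": [...]}
    item  -- name of the data item to explain
    """
    # Index the trace by what each event generated.
    produced_by = {out: event for event in trace for out in event["generated"]}
    steps, sources, seen = set(), set(), set()
    todo = [item]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        event = produced_by.get(current)
        if event is None:
            # Nothing in the trace generated it, so it is an original input.
            sources.add(current)
        else:
            steps.add(event["step"])
            todo.extend(event["used"])
    return {"item": item, "sources": sorted(sources), "steps": sorted(steps)}
```

Events that never feed the chosen result, such as logging steps, simply drop out of the summary.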
Reproduce by Reading: Archived Record. Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument. Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Instrument
NISO-JATS
Instrument
J Zhao, G Klyne, M Gamble, CA Goble. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proceedings of the Third Linked Science Workshop, 2013
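A checklist-based assessment can be pictured as a list of requirements, each with a level and a test, evaluated against a description of the Research Object. The sketch below is an illustration only: the MUST/SHOULD levels, the dictionary layout and the function name are assumptions for the example and do not reproduce the vocabulary used in the cited paper.

```python
def assess(research_object, checklist):
    """Evaluate a Research Object description against a checklist.

    research_object -- dict describing what the RO contains
    checklist       -- list of (level, label, check) where level is
                       "MUST" or "SHOULD" and check is a predicate on the dict
    Returns the labels of failed items per level, plus an overall verdict.
    """
    report = {"MUST": [], "SHOULD": []}
    for level, label, check in checklist:
        if not check(research_object):
            report[level].append(label)
    # The RO passes when no mandatory item failed; SHOULD items are advisory.
    report["satisfied"] = not report["MUST"]
    return report
```

Different platform profiles can then share one Research Object and apply different checklists to it.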
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369
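The container-plus-manifest idea can be sketched with a ZIP archive that carries a manifest listing what it aggregates. The media type, the `.ro/manifest.json` path and the `@context` value below are written from memory of the Research Object Bundle specification linked above and should be checked against it; the function names are invented for this example.

```python
import io
import json
import zipfile

# Assumed from the RO Bundle specification; verify before relying on it.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(files):
    """Pack files (path -> text) into an in-memory bundle with a manifest."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    # ZipFile defaults to ZIP_STORED, so entries are written uncompressed.
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The media type entry is written first so tools can sniff the format.
        bundle.writestr("mimetype", MIMETYPE)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path, content in files.items():
            bundle.writestr(path, content)
    return buffer.getvalue()


def list_aggregates(bundle_bytes):
    """Read the manifest back and return the aggregated resource URIs."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        manifest = json.loads(bundle.read(MANIFEST_PATH))
    return [entry["uri"] for entry in manifest["aggregates"]]
```

The manifest is what makes the archive "bigger on the inside": it can describe and link resources that are referred to rather than embedded.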
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation: span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology. In review
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
When research goes “wrong”
» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources/results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, SCIENCE 340, May 2013
Scientific method
Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Morrison
Do a Replication Study? No thanks. Not FAIR
Hard. Resource intensive. Unrecognised. Trolled. Just gathering the bits together
Cross-Institutional e-Laboratory Fragmentation
Scattered parts. Subject specific. General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups: Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It's all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible [1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology. In review.
87 software findable; 78 credit; 37 formal citation; 5 actual version
90 Bio articles; 24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Cross-Institutional e-Laboratory Fragmentation
Scattered parts Subject specific General resources
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Process at Scale
More on Models
https://doi.org/10.15490/seek.1.investigation.56
[Snoep 2015]
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups: Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust.
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It's all about The Trust.
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case.
Research Objects
Compound Interconnected Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources and related experiments. Metadata Objects that carry Research Context.
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
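The difference between resolving a snapshot and resolving the most recent version can be sketched as below. This is only an illustration under stated assumptions: the class name ResearchObjectHistory, its methods, and the "identifier/vN" citation pattern are hypothetical, and do not describe how any existing RO platform mints or resolves identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectHistory:
    """Toy version history for one research object (hypothetical API)."""

    identifier: str
    snapshots: list = field(default_factory=list)

    def release(self, content):
        """Freeze the given content as a new snapshot and return a citable id."""
        self.snapshots.append(content)
        return f"{self.identifier}/v{len(self.snapshots)}"

    def resolve(self, version=None):
        """Return a specific snapshot, or the most recent one if none is named."""
        if not self.snapshots:
            raise LookupError("no releases yet")
        if version is None:
            # "Locate it": the unversioned identifier points at the latest release.
            return self.snapshots[-1]
        if not 1 <= version <= len(self.snapshots):
            raise LookupError(f"unknown version {version}")
        # "Defend it": a versioned identifier always returns the same snapshot.
        return self.snapshots[version - 1]
```

A paper would cite the versioned identifier so the evidence stays fixed, while a reader looking for current work would follow the unversioned one.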
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UniProt): evolving
Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. IC3K 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure, and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab.
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
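The Methods, Materials, Instruments and Laboratory breakdown above could be recorded as a simple structure, so that a rerun can report which parts of the setup changed. This is a rough sketch; the class ExperimentSetup, the function changed_components, and the use of plain strings as version labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentSetup:
    """One label per component of a computational experiment (hypothetical)."""

    methods: str      # techniques, algorithms, spec of the steps
    materials: str    # datasets, parameters, algorithm seeds
    instruments: str  # codes, services, scripts, underlying libraries
    laboratory: str   # software and hardware infrastructure, platforms


def changed_components(original, rerun):
    """Return the names of the components that differ between two setups."""
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(rerun, f.name)
    ]
```

An empty result corresponds to repeating the experiment in the same lab, while a non-empty one shows what was varied, which is the information needed to say what kind of "re-" was actually performed.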
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps
features
provenance trace
portability
robustness
preservation
access
available
description
intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96
Reproduce by Reading: Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config
Components
Plug-ins
Versions
Workflow System Instrument
Software package
Workflow Runs
Data and configs
Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
httpsdoiorg1015490seek1investigation56
[Snoep 2015]
httpsdoiorg1015490seek1investigation56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design protocols samples software modelshellip
httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg
Pop-Up Start UpsLittle Science within Big Science
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay: materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config, Parameters
Experiment
  Methods (techniques, algorithms, spec of the steps)
  Materials (datasets, parameters, algorithm seeds)
Setup
  Instruments (codes, services, scripts, underlying libraries)
  Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
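The Experiment / Setup split on this slide can be sketched as a small data model. This is only an illustration of the four facets named above; the class names, field names and the `missing_parts` helper are assumptions made for the example and are not part of any Research Object specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    """What was done: methods plus the materials they were applied to."""
    methods: List[str] = field(default_factory=list)    # techniques, algorithms, spec of the steps
    materials: List[str] = field(default_factory=list)  # datasets, parameters, algorithm seeds


@dataclass
class Setup:
    """What it was done with: instruments plus the laboratory around them."""
    instruments: List[str] = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: List[str] = field(default_factory=list)   # sw/hw infrastructure, platforms


@dataclass
class ResearchObjectSketch:
    """Hypothetical container pairing an experiment with its setup."""
    experiment: Experiment
    setup: Setup

    def missing_parts(self) -> List[str]:
        """Return the names of the facets that have not been described yet."""
        facets = {
            "methods": self.experiment.methods,
            "materials": self.experiment.materials,
            "instruments": self.setup.instruments,
            "laboratory": self.setup.laboratory,
        }
        return [name for name, items in facets.items() if not items]
```

A check like `missing_parts()` is one way to make "what would someone need to rerun this?" a question a machine can ask before the article is submitted.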
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency, dependencies
steps, features, provenance trace
portability
robustness
preservation
access, available
description, intelligible
standards, common APIs
licensing
standards, common metadata
change management, versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
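Going from a thick trace to a distilled report can be illustrated with a toy example. The sketch below walks a step-level trace backwards from one output and keeps only the steps and original sources that contributed to it. The trace shape (step name, inputs, outputs) and the function name `distil` are assumptions for illustration; this is not the LabelFlow method cited above.

```python
from typing import Dict, List, Tuple

# One trace record: (step name, input artefacts, output artefacts).
TraceRecord = Tuple[str, List[str], List[str]]


def distil(trace: List[TraceRecord], target: str) -> Dict[str, List[str]]:
    """Summarise which steps and which original sources led to `target`."""
    # Index each artefact by the step that produced it.
    produced_by = {}
    for step, inputs, outputs in trace:
        for artefact in outputs:
            produced_by[artefact] = (step, inputs)

    steps, sources, seen = set(), set(), set()
    pending = [target]
    while pending:
        artefact = pending.pop()
        if artefact in seen:
            continue
        seen.add(artefact)
        if artefact in produced_by:
            step, inputs = produced_by[artefact]
            steps.add(step)
            pending.extend(inputs)
        else:
            # Nothing in the trace produced it, so treat it as an original source.
            sources.add(artefact)
    return {"steps": sorted(steps), "sources": sorted(sources)}
```

Records that did not feed the target (an unrelated plot, say) drop out of the summary, which is the sense in which the report is "distilled".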
Reproduce by Reading: Archived Record. Retain the Process, Code
The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though (docker.com)
Reproduce by Running: Active Instrument. Retain the bits
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
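The container-plus-manifest idea on this slide can be sketched in a few lines. The example packs some resources into a ZIP container together with a manifest that lists each resource and its checksum, then verifies the container later. The file name `manifest.json`, the `aggregates` key and the `sha256` field are illustrative assumptions; this is neither the RO Bundle layout nor the OMEX format.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def build_bundle(resources: Dict[str, bytes]) -> bytes:
    """Pack resources plus a manifest into an in-memory ZIP container."""
    manifest = {
        "aggregates": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(resources.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, data in sorted(resources.items()):
            archive.writestr(path, data)
        # The manifest travels inside the same container as the resources.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_bundle(bundle: bytes) -> bool:
    """Check that every listed resource still matches its recorded checksum."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(entry["path"])).hexdigest() == entry["sha256"]
            for entry in manifest["aggregates"]
        )
```

Keeping the manifest inside the container means the "one file to share" still carries enough description for a receiving platform to check what it was given.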
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share, but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested, find it useful, be able to use it
The code is too sophisticated for most readers, referees
I didn’t work out all the details
I didn’t actually write the code -- my student did
My competitors would be unfair to me
It’s valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love, money
fame, duty
fear, time/effort
shame, duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation: span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible [1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles, 24 journals had citation policy
BUT……
two years’ time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms, codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
https://doi.org/10.15490/seek.1.investigation.56
Personal Data
Local Stores
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups: Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It’s all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case
Research Objects
Compound, Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
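The difference between citing a snapshot and locating the most recent release can be sketched with a toy registry. Everything here is hypothetical: the `ro_id@vN` citation form, the class names and the `credit` helper are assumptions chosen for the example, not an existing Research Object identifier scheme.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable release of a research object."""
    version: int
    contributors: Tuple[str, ...]
    content: str


class ResearchObjectRegistry:
    """Toy in-memory registry keeping every release of every research object."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Snapshot]] = {}

    def release(self, ro_id: str, content: str, contributors: Iterable[str]) -> str:
        """Store a new snapshot and return a citation that pins it."""
        versions = self._history.setdefault(ro_id, [])
        snapshot = Snapshot(len(versions) + 1, tuple(contributors), content)
        versions.append(snapshot)
        return f"{ro_id}@v{snapshot.version}"

    def resolve(self, citation: str) -> Snapshot:
        """'id@vN' defends a fixed snapshot; a bare 'id' locates the most recent."""
        ro_id, _, tag = citation.partition("@v")
        versions = self._history[ro_id]
        return versions[int(tag) - 1] if tag else versions[-1]

    def credit(self, ro_id: str) -> List[str]:
        """List everyone who contributed to any release, in order of first appearance."""
        names: List[str] = []
        for snapshot in self._history[ro_id]:
            for name in snapshot.contributors:
                if name not in names:
                    names.append(name)
        return names
```

Pinned citations never change what they resolve to, while the bare identifier follows the object as it evolves, which covers both "defend it" and "locate it" with one identifier family.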
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency: Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Aggregated Commons Infrastructure
Consistent Comparative Reporting
Design, protocols, samples, software, models…
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
Pop-Up Start Ups
Little Science within Big Science
How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together? Socially. It’s all about The Trust
Jam today, Jam tomorrow, Jam for all. Just enough Jam, Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations, Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens (data, software, methods): id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments
Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. IC3K 2013
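The resolution behaviours listed above (defend a snapshot, locate the most recent release, reuse a particular version) can be sketched as a small resolver. This is a minimal illustration only: the `RORegistry` class, its method names and the `name/vN` identifier scheme are hypothetical and are not taken from any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class RORegistry:
    """Hypothetical in-memory registry: RO name -> ordered list of releases."""
    releases: dict = field(default_factory=dict)

    def release(self, name, content):
        """Record a new release and return its versioned identifier."""
        versions = self.releases.setdefault(name, [])
        versions.append(content)
        return f"{name}/v{len(versions)}"

    def resolve(self, identifier):
        """Resolve 'name' to the latest release, or 'name/vN' to a fixed snapshot."""
        name, _, version = identifier.partition("/v")
        versions = self.releases[name]
        if not version:
            return versions[-1]  # "locate it": most recent release
        number = int(version)
        if not 1 <= number <= len(versions):
            raise KeyError(identifier)
        return versions[number - 1]  # "defend it": cited snapshot


registry = RORegistry()
snapshot_id = registry.release("asthma-study", "cohort codes, first analysis")
registry.release("asthma-study", "cohort codes, revised analysis")
print(registry.resolve(snapshot_id))      # prints "cohort codes, first analysis"
print(registry.resolve("asthma-study"))   # prints "cohort codes, revised analysis"
```

Keeping the snapshot identifier separate from the bare name is what lets a citation stay fixed while the object itself keeps evolving.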
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify [Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break: Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
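One way to make the Methods / Materials / Instruments / Laboratory decomposition above concrete is to record the four facets for each run and report which ones differ between an original experiment and a later attempt. This is a minimal sketch, assuming each facet can be summarised as a single string; `ExperimentRecord` and `changed_facets` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentRecord:
    methods: str      # techniques, algorithms, spec of the steps
    materials: str    # datasets, parameters, algorithm seeds
    instruments: str  # codes, services, scripts, underlying libraries
    laboratory: str   # software and hardware infrastructure, platforms


def changed_facets(original, attempt):
    """Return the names of the facets that differ between two records."""
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(attempt, f.name)
    ]


original = ExperimentRecord(
    methods="alignment then clustering",
    materials="dataset v1, seed 42",
    instruments="script v1.0, library 2.3",
    laboratory="lab cluster, 2013 image",
)
attempt = ExperimentRecord(
    methods="alignment then clustering",
    materials="dataset v1, seed 42",
    instruments="script v1.0, library 2.5",
    laboratory="cloud VM, 2015 image",
)
print(changed_facets(original, attempt))  # prints ['instruments', 'laboratory']
```

Listing the changed facets makes explicit whether an attempt held the method and materials fixed while the instruments or the laboratory moved underneath it.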
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency: dependencies, steps, features, provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Reproducibility Framework
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config, Components, Plug-ins, Versions
Workflow System Instrument
Software package
Workflow Runs, Data and configs, Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J Zhao, G Klyne, M Gamble, CA Goble. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369
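The container-plus-manifest idea above can be sketched with the Python standard library: a ZIP archive that carries the aggregated files together with a small JSON manifest describing them. This is a simplified illustration rather than a conformant implementation of the Research Object Bundle or OMEX formats. The media type string and the `.ro/manifest.json` location follow the bundle specification as best recalled and should be checked against it; the manifest fields are cut down, and `build_bundle` and `list_aggregates` are hypothetical helper names.

```python
import io
import json
import zipfile

# Assumed media type for a Research Object bundle; verify against the spec.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files, created_by):
    """Pack files (name -> bytes) and a simplified manifest into ZIP bytes."""
    manifest = {
        "id": "/",
        "createdBy": {"name": created_by},
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # UCF-style containers put an uncompressed mimetype entry first.
        archive.writestr("mimetype", BUNDLE_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()


def list_aggregates(bundle_bytes):
    """Read the manifest back and return the aggregated resource paths."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        manifest = json.loads(archive.read(".ro/manifest.json"))
    return [entry["uri"] for entry in manifest["aggregates"]]


bundle = build_bundle(
    {"scripts/analyse.py": b"print('ok')\n", "data/input.csv": b"a,b\n1,2\n"},
    created_by="Example Author",
)
print(list_aggregates(bundle))  # prints ['/data/input.csv', '/scripts/analyse.py']
```

Because the manifest travels inside the archive, a receiving platform can discover what the bundle aggregates without guessing from file names.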
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust
Extrinsic Driver
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-variousProductsPlatformsResources
Units of exchange commons contextual metadata
httpwwwresearchobjectorg
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
raquo Data Hugging amp Flirting
raquo Reciprocity norms
raquo Hans W request
raquo Dowry phenomenon
raquo Private installations
raquo Private spaces on shared installations
raquo Safe havens
Too ugly to show anyone else
Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it
The code is too sophisticated for most readersrefereesI didnt work out all the details
I didnt actually write the code -- my student did
My competitors would be unfair to me
Its valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011
Drivers
love money
fame duty
fear timeeffort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
How do you get Scientists and Developers to work together Socially Its all about The Trust
Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case
Research Objects
Compound Interconnected Investigations Research Products
Multi-various Products, Platforms, Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
First class citizens - data, software, methods - id, manage, credit, track, profile, focus
A Framework to Bundle and Link (scattered) resources, related experiments
Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
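The resolution and citation verbs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ResearchObject` and `Snapshot` classes, the `ro:example-study` identifier and the citation string format are all hypothetical and are not taken from any Research Object specification or platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable release of a Research Object (hypothetical model)."""
    version: int
    components: tuple      # names of the aggregated resources
    contributors: tuple    # people credited for this release


@dataclass
class ResearchObject:
    """A citable identifier that resolves to a history of snapshots."""
    identifier: str
    snapshots: list = field(default_factory=list)

    def release(self, components, contributors):
        # Each release appends a new snapshot; earlier ones never change.
        snap = Snapshot(len(self.snapshots) + 1, tuple(components), tuple(contributors))
        self.snapshots.append(snap)
        return snap

    def resolve(self, version=None):
        # "Locate it": no version means the most recent snapshot.
        # "Defend it": an explicit version returns that fixed snapshot.
        if not self.snapshots:
            raise LookupError(f"{self.identifier} has no releases")
        if version is None:
            return self.snapshots[-1]
        for snap in self.snapshots:
            if snap.version == version:
                return snap
        raise LookupError(f"{self.identifier} has no version {version}")

    def cite(self, version=None):
        # "Credit it": the citation names every contributor of the snapshot.
        snap = self.resolve(version)
        return f"{', '.join(snap.contributors)}. {self.identifier} (v{snap.version})"


ro = ResearchObject("ro:example-study")  # made-up identifier
ro.release(["workflow.t2flow", "input.csv"], ["A. Modeller"])
ro.release(["workflow.t2flow", "input.csv", "figure.png"],
           ["A. Modeller", "B. Experimentalist"])
print(ro.cite(1))  # A. Modeller. ro:example-study (v1)
print(ro.cite())   # A. Modeller, B. Experimentalist. ro:example-study (v2)
```

Keeping snapshots immutable is what lets one identifier serve both the "defend" case (a fixed version) and the "locate" case (whatever is newest).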
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure, and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm
Not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config, Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
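One way to read the Methods, Materials, Instruments and Laboratory breakdown is as a checklist for describing a computational experiment. The sketch below is a hypothetical illustration of that reading, not part of any Research Object tooling; the class name, field names and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentDescription:
    """Hypothetical record of what a computational experiment depends on."""
    methods: list = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list = field(default_factory=list)  # codes, services, scripts, underlying libraries
    laboratory: list = field(default_factory=list)   # sw and hw infrastructure, systems software, platforms


def missing_parts(description):
    """Return the names of the categories that have nothing recorded."""
    return [name for name, items in vars(description).items() if not items]


study = ExperimentDescription(
    methods=["normalise, then cluster"],
    materials=["expression.csv", "seed=42"],
    instruments=["cluster.py"],
)
print(missing_parts(study))  # ['laboratory']
```

In this example the description says what was run and on what data, but nothing about the environment it ran in, so only the Experiment half is covered and the Setup half is incomplete.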
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies, steps, features
provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
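The step from thick trace data to distilled reporting can be illustrated with a toy summariser. This is a minimal sketch under stated assumptions: the trace format (a list of dictionaries with `step`, `inputs` and `outputs`) and the `distil` function are invented for the example, and this is not the LabelFlow approach cited above.

```python
from collections import Counter


def distil(trace):
    """Summarise a thick trace: runs per step, plus outputs that no later step consumed."""
    runs_per_step = Counter(event["step"] for event in trace)
    consumed = {item for event in trace for item in event["inputs"]}
    produced = {item for event in trace for item in event["outputs"]}
    return {
        "runs_per_step": dict(runs_per_step),
        "final_outputs": sorted(produced - consumed),
    }


# A made-up four-event trace: every intermediate file is recorded.
trace = [
    {"step": "fetch", "inputs": [], "outputs": ["raw.csv"]},
    {"step": "clean", "inputs": ["raw.csv"], "outputs": ["clean.csv"]},
    {"step": "plot", "inputs": ["clean.csv"], "outputs": ["fig1.png"]},
    {"step": "plot", "inputs": ["clean.csv"], "outputs": ["fig2.png"]},
]
report = distil(trace)
print(report["runs_per_step"])  # {'fetch': 1, 'clean': 1, 'plot': 2}
print(report["final_outputs"])  # ['fig1.png', 'fig2.png']
```

The distilled report drops the intermediate files and keeps only what a reader of the article would likely need: which steps ran, how often, and what came out at the end.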
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config
Components
Plug-ins
Versions
Workflow System Instrument
Software package
Workflow Runs
Data and configs
Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards
Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Instrument
NISO-JATS
Instrument
J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle/
Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
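The container-plus-manifest idea can be sketched with the Python standard library alone. The layout below is a deliberately simplified assumption: a ZIP file with a `manifest.json` at its root that lists each aggregated file with a SHA-256 checksum. It does not follow the actual RO Bundle or OMEX layouts, which define their own manifest locations and vocabularies, and the function names and file paths are hypothetical.

```python
import hashlib
import io
import json
import zipfile


def build_bundle(files):
    """Pack files (path -> bytes) plus a manifest listing each one with its checksum."""
    manifest = {
        "aggregates": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(files.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical manifest location; the real specifications differ.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


def verify_bundle(blob):
    """Return the paths whose contents no longer match the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return [
            entry["path"]
            for entry in manifest["aggregates"]
            if hashlib.sha256(archive.read(entry["path"])).hexdigest() != entry["sha256"]
        ]


blob = build_bundle({
    "data/input.csv": b"a,b\n1,2\n",
    "scripts/run.py": b"print('ok')\n",
})
print(verify_bundle(blob))  # [] while every file still matches its checksum
```

Carrying checksums in the manifest gives a cheap first check against bit rot: a bundle that no longer verifies signals that something needs repair before anyone tries to rerun it.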
Retro-Fitted ROs
using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
httpwwwresearchobjectorg
First class citizens - data software methods - id manage credit track profile focus
A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context
Research Objects
Bigger on the inside than the outside
Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward
Contributionsbull multi ndashtyped stewarded
sited authoredbull span research researchers
platforms timebull cite resolve steward
Identity + Minimal Provenance
RO Resolution and Citation
rsaquo Defend it (snapshot)
rsaquo Locate it (most recent)
rsaquo Reuse it (a version a component)
rsaquo Credit it (contributory authorship)
rsaquo Cross link it (connections)
Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested/find it useful/be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles, 24 journals had citation policy
BUT...
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact...
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Bigger on the inside than the outside
Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward
Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward
Identity + Minimal Provenance
RO Resolution and Citation
› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving
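One way to picture "defend a snapshot" versus "locate the most recent" is a tiny in-memory registry in which an identifier resolves either to a fixed release or to the latest one. The sketch below is an assumption-laden illustration, not an existing Research Object service: `Release`, `Registry`, `release` and `resolve` are hypothetical names, and real identifier schemes and resolvers are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """An immutable snapshot of a Research Object (hypothetical model)."""
    ro_id: str
    version: int
    contributors: tuple  # who gets credit for this release
    links: tuple = ()    # identifiers of related Research Objects


class Registry:
    """Keeps every release so old snapshots stay citable as new ones appear."""

    def __init__(self):
        self._releases = {}

    def release(self, ro_id, contributors, links=()):
        """Record a new release; earlier releases are never overwritten."""
        history = self._releases.setdefault(ro_id, [])
        snapshot = Release(ro_id, len(history) + 1, tuple(contributors), tuple(links))
        history.append(snapshot)
        return snapshot

    def resolve(self, ro_id, version=None):
        """Return a specific version (defend it) or the latest (locate it)."""
        history = self._releases[ro_id]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{ro_id} has no version {version}")
        return history[version - 1]
```

Here `resolve("ro:study", 1)` returns the snapshot a paper cited, while `resolve("ro:study")` follows the object as it evolves. The `contributors` and `links` fields stand in for crediting and cross-linking.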
Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. IC3K 2013
means
ends
driver
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures...
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config/Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break: Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config/Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
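The split into methods and materials (the experiment) and instruments and laboratory (the setup) can be written down as a small record, which makes it possible to say which part changed between an original run and a rerun. This is a rough sketch under assumptions: the class `RunDescription`, the function `changed_parts`, and the plain-string contents are all hypothetical, chosen only to mirror the four labels on the slide.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunDescription:
    """What a computational run depended on, grouped as on the slide."""
    methods: tuple      # techniques, algorithms, spec of the steps
    materials: tuple    # datasets, parameters, algorithm seeds
    instruments: tuple  # codes, services, scripts, underlying libraries
    laboratory: tuple   # sw and hw infrastructure, systems software, platforms


def changed_parts(original, rerun):
    """Name the parts that differ between two runs, in declaration order."""
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(rerun, f.name)
    ]
```

If only a library version differs between two descriptions, `changed_parts` reports `["instruments"]`: the experiment was repeated on a changed setup, which is a different claim from repeating it on the identical one.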
Research Environment
submit article and move on...
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps, features
provenance trace
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on...
Reporting
Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013
means
ends
driver
Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons
STELAR Asthma e-Lab Study Team for Early Life Asthma Research
Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object CurrencyCohort Studies
Focus on methods models workflows scripts software data figureshellip
Research Object Pivots and Profiles
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
» Gangs share, but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration - complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority.
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
CaroleAnneGoble
Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data
Farr Research Object Commons
STELAR Asthma e-Lab: Study Team for Early Life Asthma Research
Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system
STELAR e-Lab
Platform 1
Platform 2
Platform 3
A multi-site collaboration to support safe use of patient and research data for medical research
Research Object Currency
Cohort Studies
Focus on methods, models, workflows, scripts, software, data, figures…
Research Object Pivots and Profiles
Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation
R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of a figure and ultimately the journal article
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config
Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab.
“The questions don’t change but the answers do”
Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl. Conf. e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config
Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
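The four-part breakdown on this slide (Methods and Materials as the Experiment, Instruments and Laboratory as the Setup) can be written down as a small record type. The sketch below is an assumption-laden illustration, not part of any Research Object specification: the class `ExperimentRecord` and the function `changed_parts` are hypothetical names, and the example values are invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentRecord:
    # Experiment
    methods: tuple[str, ...]      # techniques, algorithms, spec of the steps
    materials: tuple[str, ...]    # datasets, parameters, algorithm seeds
    # Setup
    instruments: tuple[str, ...]  # codes, services, scripts, underlying libraries
    laboratory: tuple[str, ...]   # sw and hw infrastructure, systems software


def changed_parts(original: ExperimentRecord, later: ExperimentRecord) -> list[str]:
    """Name the parts that differ between an original run and a later one."""
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(later, f.name)
    ]


# Invented example: same experiment, but one library was upgraded in between.
first = ExperimentRecord(
    methods=("align reads", "call variants"),
    materials=("sample-A.fastq", "seed=42"),
    instruments=("aligner 1.0", "caller 2.3"),
    laboratory=("Linux VM",),
)
second = ExperimentRecord(
    methods=first.methods,
    materials=first.materials,
    instruments=("aligner 1.1", "caller 2.3"),
    laboratory=first.laboratory,
)
```

Recording which of the four parts changed is one way to state precisely what a "re-do" actually held constant and what it varied.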
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps, features
provenance trace
portability
robustness
preservation
access
available
description
intelligible
standards
common APIs
licensing
standards
common metadata
change management
versioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance - Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P. et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96
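Going from thick trace data to a distilled report can be pictured as collapsing many low-level events into one summary row per workflow step. The sketch below is a generic illustration of that idea and is not the LabelFlow method. The event fields `step`, `inputs` and `outputs` and the function `distil_trace` are hypothetical.

```python
from collections import defaultdict


def distil_trace(events: list[dict]) -> dict[str, dict]:
    """Collapse a verbose event trace into one summary row per workflow step."""
    summary: dict[str, dict] = defaultdict(
        lambda: {"runs": 0, "inputs": set(), "outputs": set()}
    )
    for event in events:
        row = summary[event["step"]]
        row["runs"] += 1
        row["inputs"].update(event.get("inputs", []))
        row["outputs"].update(event.get("outputs", []))
    # Convert sets to sorted lists so the report is stable and printable.
    return {
        step: {
            "runs": row["runs"],
            "inputs": sorted(row["inputs"]),
            "outputs": sorted(row["outputs"]),
        }
        for step, row in summary.items()
    }
```

A report built this way loses the per-event detail on purpose, so the full trace should still be retained alongside it.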
Reproduce by Reading: Archived Record
Retain the Process/Code
Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation
R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482
Simply data + code
Can change the definition of
a figure and ultimately the
journal article
Colomb J and Brembs B
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1
ref status indexed httpf1000res3is]
F1000Research 2014 3176
Other labs can replicate the study or
contribute their data to a meta-
analysis or disease model - figure
automatically updates
Data updates time-stamped
New conclusions added via versions
Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012
Software-like Release paradigm Not a static document paradigm
Reproduce looks backwards -gt Release looks forwards
raquo Science methods data change -gt agile evolution
raquo Comparisons versions forks amp merges dependencies
raquo Id amp Citations
raquo Interlinked ROs
[McEntyre]
Retrospective Release Research Object
The ROs Meme
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
raquo Data Hugging amp Flirting
raquo Reciprocity norms
raquo Hans W request
raquo Dowry phenomenon
raquo Private installations
raquo Private spaces on shared installations
raquo Safe havens
Too ugly to show anyone else
Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it
The code is too sophisticated for most readersrefereesI didnt work out all the details
I didnt actually write the code -- my student did
My competitors would be unfair to me
Its valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011
Drivers
love money
fame duty
fear timeeffort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Interface Framingraquo Limited scheduled sharing choices
rsaquo Never say never
raquo ldquoCitablerdquo not ldquoSharedrdquo
raquo Feedback
rsaquo Guilt tripping
rsaquo Outlier finger pointing
[Garzia]
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipResearch Currencies
ldquoResearchBitCoinrdquo
Citation Semantics
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J. Assoc. for Info. Science and Technology, in review
87 software findable; 78 credit; 37 formal citation; 5 actual version
90 Bio articles; 24 journals had citation policy
BUT…
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm, not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs
[McEntyre]
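The release paradigm above (versions, forks, dependencies, citable ids, interlinked ROs) might be modelled roughly as follows. This is only a sketch: the `Release` class, its field names and the `ro:` identifiers are hypothetical illustrations, not part of any Research Object specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """One released version of a research object (hypothetical model)."""
    identifier: str                      # citable id for this exact release
    version: str                         # release label such as "1.1.0"
    parent: Optional["Release"] = None   # release this one evolved or forked from
    depends_on: tuple = ()               # ids of interlinked research objects

    def lineage(self):
        """Walk back through parents, newest first."""
        node, chain = self, []
        while node is not None:
            chain.append(node.identifier)
            node = node.parent
        return chain


# Made-up identifiers: an initial release, a successor, and a fork by another lab
v1 = Release("ro:example/1.0.0", "1.0.0")
v2 = Release("ro:example/1.1.0", "1.1.0", parent=v1, depends_on=("ro:dataset/2.3",))
fork = Release("ro:other-lab/1.0.1", "1.0.1", parent=v1)
```

Because each release keeps a pointer to its parent, a citation can name one exact version while the lineage still shows how it relates to earlier releases and forks.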
Retrospective Release Research Object
The ROs Meme
recompute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
What IS reproducibility?
Re-: “do again”, “return to original state”
“show A is true by doing B”
verify but not falsify
[Yong, Nature 485, 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
Config Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
1. Science Changes. So does the Lab.
“The questions don’t change but the answers do” Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
Config Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
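The split into Methods, Materials, Instruments and Laboratory can be read as a record with four layers, which makes it possible to say which layer moved between an original run and a later rerun. The sketch below assumes that reading; the `Experiment` class, the `changed_layers` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class Experiment:
    methods: tuple       # techniques, algorithms, spec of the steps
    materials: tuple     # datasets, parameters, algorithm seeds
    instruments: tuple   # codes, services, scripts, underlying libraries
    laboratory: tuple    # sw and hw infrastructure, systems software, platforms


def changed_layers(original, rerun):
    """Return the names of the layers that differ between two runs."""
    before, after = asdict(original), asdict(rerun)
    return [name for name in before if before[name] != after[name]]


# Made-up example: the same experiment rerun after a library upgrade
original = Experiment(
    methods=("align reads", "call variants"),
    materials=("reads.fastq", "seed=42"),
    instruments=("aligner 1.0",),
    laboratory=("linux cluster",),
)
rerun = replace(original, instruments=("aligner 1.1",))
print(changed_layers(original, rerun))
```

If only the instruments or laboratory layers differ, the rerun repeats the same experiment on a changed setup; a change in methods or materials means a different experiment was run.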
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency, dependencies
steps, features, provenance trace
portability
robustness
preservation
access, available
description, intelligible
standards, common APIs
licensing
standards, common metadata
change management, versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
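As a rough illustration of moving from thick trace data to a distilled report, the sketch below collapses per-invocation records into one summary line per step. It is not the LabelFlow approach cited above; the record layout, step names and `distil` function are assumptions made for the example.

```python
from collections import Counter

# Hypothetical thick trace: one record per step invocation
trace = [
    {"step": "fetch", "status": "ok"},
    {"step": "fetch", "status": "ok"},
    {"step": "align", "status": "ok"},
    {"step": "align", "status": "failed"},
    {"step": "plot", "status": "ok"},
]


def distil(records):
    """Collapse per-invocation records into one summary entry per step."""
    runs = Counter(r["step"] for r in records)
    failures = Counter(r["step"] for r in records if r["status"] != "ok")
    # A Counter returns 0 for steps that never failed
    return {step: {"runs": n, "failures": failures[step]} for step, n in runs.items()}


summary = distil(trace)
for step, stats in summary.items():
    print(step, stats)
```

The full trace stays available for anyone who needs to re-examine a run, while the summary is the part short enough to report alongside a paper.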
Reproduce by Reading: Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running: Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows / Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows / makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and config
Components
Plug-ins
Versions
Workflow System Instrument
Software package
Workflow Runs
Data and configs
Provenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
standards
Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
NISO-JATS
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
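The container-plus-manifest idea can be sketched as a ZIP file that carries the aggregated files together with a manifest listing them. The layout below is loosely modelled on the RO Bundle approach; the media type string, the `.ro/manifest.json` path and the manifest fields are assumptions to check against the specification. The `build_bundle` helper and the example file names are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"  # assumed media type string


def build_bundle(files):
    """Pack files plus a manifest into an in-memory ZIP container."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The media type entry goes first and is stored uncompressed
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name in sorted(files):
            bundle.writestr(name, files[name])
    return buffer.getvalue()


# Made-up contents: one data file and one script
data = build_bundle({
    "data/input.csv": "a,b\n1,2\n",
    "workflow/run.py": "print('hi')\n",
})
```

Keeping the manifest inside the container means the description of what was aggregated travels with the files themselves.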
Retro-Fitted ROs using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid, transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
recompute
replicatererun
repeat
re-examine
repurpose
recreate
reuse
restorereconstruct review
regeneraterevise
recycle
redo
What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo
ldquoshow A is true by doing Brdquo
verify but not falsify[Yong Nature 485 2012]
robustness tolerance
verification compliance
validation assurance
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
1 Science Changes So does the Lab
ldquoThe questions donrsquot
change but the
answers dordquoDan Reed
The lab is not fixedUpdated resources
UncertaintyBioSTIF
Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012
2 Instruments Break Labs Decaymaterials become unavailable technicians leave
Reproducibility Window
raquo Bit rot Black boxes
raquo Proprietary Licenses
raquo Clown services
raquo Partial replication
raquo Prepare to Repair
rsaquo form or function
rsaquo preserve or sustain
Jason Scott
RO as Instrument Materials Method
Input Data
Software
Output Data
ConfigParameters
Methods(techniques algorithms
spec of the steps)
Materials(datasets parameters
algorithm seeds)
Experiment
Instruments(codes services scripts
underlying libraries)
Laboratory(sw and hw infrastructure
systems software
integrative platforms)
Setup
Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227
Research Environment
submit articleand move onhellip
publish articlePublication Environment
Research Environment
publish articlePublication Environment
submit articleand move onhellip
[Adapted Freire 2013]
transparencydependencies
steps featuresprovenance trace
portability
robustness
preservation
accessavailable
descriptionintelligible
standardscommon APIs
licensing
standardscommon
metadata
change managementversioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit articleand move onhellip
Reporting
Documentation
Provenance ndashThick Trace Data
to Distilled Reporting
Distillation and
Summarisation
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > "my gang" management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love, money
fame, duty
fear, time/effort
shame, duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
Instrumentation: span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» "Citable" not "Shared"
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
"ResearchBitCoin"
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review
87 software findable; 78 credit; 37 formal citation; 5 actual version
90 Bio articles; 24 journals had citation policy
BUT...
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact...
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
CaroleAnneGoble
1. Science Changes. So does the Lab
"The questions don't change but the answers do"
Dan Reed
The lab is not fixed
Updated resources
Uncertainty
BioSTIF
Zhao et al. Why workflows break: Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science, 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function
› preserve or sustain
Jason Scott
RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config/Parameters
Methods (techniques, algorithms, spec of the steps)
Materials (datasets, parameters, algorithm seeds)
Experiment
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
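The four categories above (Methods, Materials, Instruments, Laboratory) can be read as a simple completeness checklist for a Research Object. The sketch below is only an illustration of that reading: the `ResearchObject` class, its field names and the `missing` helper are hypothetical and do not come from any Research Object library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchObject:
    """Hypothetical container mirroring the four categories on the slide."""
    methods: List[str] = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: List[str] = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: List[str] = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: List[str] = field(default_factory=list)   # sw/hw infrastructure, platforms

    def parts(self) -> Dict[str, List[str]]:
        """Return the categories in a fixed order."""
        return {
            "methods": self.methods,
            "materials": self.materials,
            "instruments": self.instruments,
            "laboratory": self.laboratory,
        }

    def missing(self) -> List[str]:
        """List the categories that have nothing recorded yet."""
        return [name for name, items in self.parts().items() if not items]
```

For example, an object holding only a method description and its input data reports `["instruments", "laboratory"]` as missing, which is the gap a reader would hit when trying to rerun the work.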
Research Environment
submit article and move on...
publish article
Publication Environment
[Adapted Freire 2013]
transparency: dependencies, steps, features, provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Reproducibility Framework
submit article and move on...
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
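The step from thick trace data to distilled reporting can be illustrated with a small summarisation routine. This is a sketch under stated assumptions and not the LabelFlow method cited below: the trace format (a list of dictionaries with `step`, `inputs` and `outputs` keys) and the `distill` function are hypothetical.

```python
def distill(trace):
    """Collapse per-invocation trace events into one summary row per step.

    Each event is assumed to be a dict with a "step" name and optional
    "inputs" and "outputs" lists. Steps keep their first-seen order.
    """
    summary = {}
    for event in trace:
        entry = summary.setdefault(
            event["step"], {"runs": 0, "inputs": set(), "outputs": set()}
        )
        entry["runs"] += 1
        entry["inputs"].update(event.get("inputs", []))
        entry["outputs"].update(event.get("outputs", []))
    return [
        {
            "step": step,
            "runs": entry["runs"],
            "inputs": sorted(entry["inputs"]),
            "outputs": sorted(entry["outputs"]),
        }
        for step, entry in summary.items()
    ]
```

The full trace stays available for anyone who needs to audit a run, while the distilled rows are short enough to include in a methods section.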
Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96
Reproduce by ReadingArchived Record Retain the ProcessCode
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box thoughdockercom
Reproduce by Running Active InstrumentRetain the bits
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflowsmakefiles
serviceScience as a Service
Integrative frameworks
Open Source
WorkflowsScripts
Virtual Machines
Portable Packaging
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky: reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents, covering a representative range of funders, disciplines and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology. In review
87 software findable, 78 credit, 37 formal citation, 5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza,
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Research Environment
submit article and move on…
publish article
Publication Environment
[Adapted Freire 2013]
transparency
dependencies
steps, features
provenance trace
portability
robustness
preservation
access: available
description: intelligible
standards: common APIs
licensing
standards: common metadata
change management: versioning
packaging
Machine actionable
Machine actionable
Reproducibility Framework
submit article and move on…
Reporting
Documentation
Provenance – Thick Trace Data to Distilled Reporting
Distillation and Summarisation
Alper P et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014: 84-96
Reproduce by Reading
Archived Record
Retain the Process/Code
The IT Crowd Series 3 Episode 4
The eLab Virtual Machine (or Docker Image)
a black box though
docker.com
Reproduce by Running
Active Instrument
Retain the bits
service
Science as a Service
Integrative frameworks
Open Source
Workflows, Scripts
Virtual Machines
Portable Packaging
Portability
Transparency
ReproZip
Workflows, makefiles
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
Shared Repository
Personal Notebook
Community Registry
Publishing Resource
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share, but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
› Never say never
» “Citable” not “Shared”
» Feedback
› Guilt tripping
› Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.
87% software findable, 78% credit, 37% formal citation, 5% actual version
90 Bio articles
24 journals had citation policy
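One low-friction way to make the exact software versions behind a result visible is to record them automatically at run time. The sketch below is an illustration, not something shown in the talk: `software_record` is a hypothetical helper, and the package names passed to it are placeholders for whatever an analysis actually imports.

```python
import json
import platform
from importlib import metadata


def software_record(packages: list[str]) -> dict:
    """Collect interpreter, platform and package versions for a provenance note."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list the ones your analysis depends on.
    print(json.dumps(software_record(["numpy", "pandas"]), indent=2))
```

Writing this record next to each set of results means the version information exists before anyone has to remember to cite it.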
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble
Fifty Shades of Research Object
Workflow Instrument
Example data and configComponentsPlug-ins Versions
Workflow System Instrument
Software package
Workflow RunsData and configsProvenance logs
Study
standardsAdobe
UCFORE PROVODF
formats
api
Instrument
httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf
Instrument
NISO-JATS
Instrument
J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
httpsresearchobjectgithubiospecificationsbundle
Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369
Retro-Fitted ROsusing off the shelf
platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -gt Born
Native RO platforms
RARE amp FAIR Knowledge Turns Means Research Objects
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
raquo Data Hugging amp Flirting
raquo Reciprocity norms
raquo Hans W request
raquo Dowry phenomenon
raquo Private installations
raquo Private spaces on shared installations
raquo Safe havens
Too ugly to show anyone else
Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it
The code is too sophisticated for most readersrefereesI didnt work out all the details
I didnt actually write the code -- my student did
My competitors would be unfair to me
Its valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011
Drivers
love money
fame duty
fear timeeffort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneakyreduce the friction
instrumentationspan RARE and FAIR
Optimising The Neylon Equation
Interface Framingraquo Limited scheduled sharing choices
rsaquo Never say never
raquo ldquoCitablerdquo not ldquoSharedrdquo
raquo Feedback
rsaquo Guilt tripping
rsaquo Outlier finger pointing
[Garzia]
Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ne AuthorshipResearch Currencies
ldquoResearchBitCoinrdquo
Citation Semantics
Training
56Of UK researchers develop their own research software or scripts
73 Of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority
httpwwwrseacuk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.
87 software findable
78 credit
37 formal citation
5 actual version
90 Bio articles
24 journals had citation policy
BUT……
two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester's Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
CaroleAnneGoble
standards: Adobe UCF, ORE, PROV, ODF
formats
api
Instrument
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
Instrument
NISO-JATS
Instrument
J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
Platform profiles
NISO-JATS
Instrument
Container
Manifest
OMEX archive
https://researchobject.github.io/specifications/bundle
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
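The Container and Manifest labels above can be illustrated with a small sketch. It is not the Research Object Bundle implementation; it only shows the general shape of a ZIP container carrying a JSON manifest that lists what it aggregates. The media type string, the `.ro/manifest.json` path, the context URL and the function names `build_bundle` and `list_aggregates` are assumptions made for illustration and should be checked against the bundle specification before use.

```python
import io
import json
import zipfile

# Assumed values, recalled from the RO Bundle specification; verify before relying on them.
RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"
CONTEXT_URL = "https://w3id.org/bundle/context"


def build_bundle(resources):
    """Pack resources (archive path -> bytes) into an in-memory ZIP with a manifest."""
    manifest = {
        "@context": [CONTEXT_URL],
        "id": "/",
        # One entry per aggregated resource, in a stable (sorted) order.
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Container convention: an uncompressed "mimetype" entry comes first.
        archive.writestr("mimetype", RO_BUNDLE_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        for path in sorted(resources):
            archive.writestr(path, resources[path])
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return buffer.getvalue()


def list_aggregates(bundle_bytes):
    """Read the manifest back and return the URIs it aggregates."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
    return [entry["uri"] for entry in manifest["aggregates"]]


if __name__ == "__main__":
    bundle = build_bundle({
        "model.xml": b"<sbml/>",
        "data/results.csv": b"t,x\n0,1\n",
    })
    print(list_aggregates(bundle))
```

The point of the design is that the manifest travels inside the same file as the data and the model, so a reader can discover what the archive claims to contain without any external registry.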
Retro-Fitted ROs
using off the shelf platforms
Method Matters
Reproducibility Smarts
Commons not Repository
Research Tardis
Retro-fit ROs
Do As Little As Possible
Make -> Born
Native RO platforms
RARE & FAIR Knowledge Turns Means Research Objects
http://doctorwhosite1.weebly.com/sonic-screwdrivers.html
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
Tribal Behaviour
» Gangs share but not with the public
» Tribal behaviours
› Modellers share more than Experimentalists
› Experimentalists reuse models more than Modellers
» Trading behaviours
› Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles
[Garza 2014]
» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
me
ME
my team
close colleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
httpdoctorwhosite1weeblycomsonic-screwdrivershtml
Researchers
Silver bullet tools
Psychic paper
httpbowjamesbowca20080608shhhhhhh-silencshtml
PI Team
RARE Research Reality Check
RARE Research Reality Check
Tribal Behaviour
raquo Gangs share but not with the public
raquo Tribal behaviours rsaquo Modellers share more than Experimentalists
rsaquo Experimentalists reuse models more than Modellers
raquo Trading behavioursrsaquo Collaboration ndash complementarity
correlations
raquo Structured consortia less likely to publicly share than individuals
raquo Post-hoc rationalised DataModel Cycles
[Garza 2014]
raquo Fluid transient collaborations gt ldquomy gangrdquo management
raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity
raquo Class captains (prefects)
raquo Get the cool kids on board
raquo Head teacher leadership
[Garza 2014]
Playground Rules
Trace Data
27032015 74
me
ME
my team
closecolleagues
peers
The Research Release Creep Spiral
» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens
Too ugly to show anyone else
Readers who have access will want user support
No-one else would be interested / find it useful / be able to use it
The code is too sophisticated for most readers/referees
I didn't work out all the details
I didn't actually write the code -- my student did
My competitors would be unfair to me
It's valuable intellectual property
It would make papers much longer
Referees would never agree to check the code
My code invokes other code with unpublished (proprietary) code
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011
Drivers
love money
fame duty
fear time/effort
shame duty
[Apologies to Resnick and Malone]
Stealthy not Sneaky
reduce the friction
instrumentation
span RARE and FAIR
Optimising The Neylon Equation
Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing
[Garza]
Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08
ELNs and Authoring Platforms
Sweave
Credit ≠ Authorship
Research Currencies
“ResearchBitCoin”
Citation Semantics
Training
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority
http://www.rse.ac.uk
Instrument Artisans
[Shapin 84]
Make Software Visible
[1960s Boeing 747-100 Software Configuration]
Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review
87% software findable
78% credit
37% formal citation
5% actual version
90 Bio articles
24 journals had citation policy
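The low rate of "actual version" mentions suggests one low-friction habit: record the exact versions of the software behind a result at the time it is produced. The sketch below is only an illustration, not something from the talk. The function names `software_manifest` and `citation_lines` are hypothetical, and it assumes a Python environment where packages were installed with standard packaging metadata.

```python
import platform
from importlib import metadata


def software_manifest(packages):
    """Return {name: version} for the interpreter and each named package.

    Hypothetical helper: packages that are not installed map to None
    instead of raising, so the gap is visible in the record.
    """
    manifest = {"python": platform.python_version()}
    for name in packages:
        try:
            manifest[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            manifest[name] = None
    return manifest


def citation_lines(manifest):
    """Format each entry as 'name version', sorted by name."""
    return [
        f"{name} {version or 'version unknown'}"
        for name, version in sorted(manifest.items())
    ]


if __name__ == "__main__":
    # Example package names; substitute whatever the analysis imports.
    for line in citation_lines(software_manifest(["numpy", "pandas"])):
        print(line)
```

Saving this output next to the figures or data it describes gives a later reader the version information that most software mentions leave out.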
BUT……
two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated
Inspired by Bob Harrison
• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions
The RO & Reproducibility Challenge
All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza,
Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence
Contact…
Professor Carole Goble
The University of Manchester UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
@CaroleAnneGoble