Software Sustainability Institute
www.software.ac.uk
Software – a different kind of research object?
http://dx.doi.org/10.6084/m9.figshare.5459542
3rd October 2017, Lancaster Data Conversations, LancasterNeil Chue Hong (@npch), Software Sustainability InstituteORCID: 0000-0002-8876-7606 | [email protected]
Slides licensed underCC-BY where indicated:
Supported by Project funding from
Software Sustainability Institute
www.software.ac.ukWhat’s software got to do with my research?
The research community
relies on software
Do you use research
software?
What would happen to your
research without software
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014.
406 respondents covering representative range of funders, discipline and seniority.
56%Develop their
own software
71%Have no formal
software training
Software Sustainability Institute
www.software.ac.uk
Software in Nature
Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf
Software Sustainability Institute
www.software.ac.uk
Raise standards for preclinical cancer research
47 out of 53 “landmark” publications
could not be replicated Be
gley
, Elli
s. N
atu
re, 4
83
, 20
12
do
i:10
.10
38
/48
35
31
a
Software Sustainability Institute
www.software.ac.uk
Repeatability of published microarray gene expression
analyses56% of analyses could not be repeated,
of which 30% were because of software issues. 50% did not state software version, 39% did not provide raw data.
Only 11% could be reproduced satisfactorily.
Ioannidis et al. Nature Genetics, 41, 2010doi:10.1038/ng.295
Software Sustainability Institute
www.software.ac.uk
Repeatability in Computer Science
Of 401 papers in ACM Computer Science journals and proceedings, only 85 provided a link to software.For 176 the software could not be obtained.
Collberg, Proebsting, Warren, University of Arizona TR 14-04, 2015 http://reproducibility.cs.arizona.edu/v2/RepeatabilityTR.pdf
Software Sustainability Institute
www.software.ac.uk
Errors due to bioinformatics pipeline
The results presented in the Report “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent“ were affected by a bioinformatics error – identified because of open science
Llorente et al. Science, 350, 6262doi:10.1126/science.aad2879
Software Sustainability Institute
www.software.ac.uk
T
Software Sustainability Institute
www.software.ac.ukIsn’t software just a typeof data?
Software Sustainability Institute
www.software.ac.uk
Authorship Lifecycle
IdentifyCite
Reuse
Research
Index
Papers, data, software all research outputs ofa continuous cycle.
With software, technologymakes it easier to track, but not reward.
We cannot separatepapers, data and softwarewhen we release research.
http://openresearchsoftware.metajnl.com
Software Sustainability Institute
www.software.ac.uk
The current process
Startresearch
Writesoftware
Usesoftware
Produceresults
Publishresearch
paper
Releasedata
Releasesoftware
Which mentions software and data
This process is simple but does not reward production orreuse of good software and data.
It also has a long contribution cycle.
Software Sustainability Institute
www.software.ac.uk
Writesoftware
A better process?
Startresearch
Identifyexisting
software
Usesoftware
Produceresults
Publishresearch
paper
Adapt/extend
software
Releasedata
Releasesoftware
Publishsoftware
paper Publishdata
paper
Wh
ich referen
ces so
ftware an
d d
ata pap
ers
Software and data papers are needed as proxies for rewarding reuse.
But it enables a shorter contribution cycle for data and software.
Software Sustainability Institute
www.software.ac.uk
What do we choose to identify:- Workflow?- Software that runs workflow?- Software referenced by workflow?- Software dependencies? What’s the minimum citable part?
Boundary
http://dx.doi.org/10.6084/m9.figshare.1497930
Software Sustainability Institute
www.software.ac.uk
Algorithm
Function
Pro
gram
Library / Su
ite / Package
…
Granularity
http://dx.doi.org/10.6084/m9.figshare.1497930
Software Sustainability Institute
www.software.ac.ukVersioning
Personalv1
Personalv2
Personalv3
Personal v2a
Public v1
Personal v3a
Personal v2a
Public v2
Public v3
Why do we version?- To indicate a change- To allow sharing- To confer special status
http://dx.doi.org/10.6084/m9.figshare.1497930
Software Sustainability Institute
www.software.ac.uk
AuthorshipAuthorship• Which authors have had what impact on each version of the software?• Who had the largest contribution to the scientific results in a paper?
http://beyond-impact.org/?p=175
OGSA-DAI projects statistics from Ohloh
http://dx.doi.org/10.6084/m9.figshare.1497930
Software Sustainability Institute
www.software.ac.ukIf software is so important, why is most of it hard to reuse?
Software Sustainability Institute
www.software.ac.uk
The Software Sustainability Institute
A national facility for cultivating better, more sustainable, research software to enable world-class research• Software reaches boundaries in its
development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools tosupport the community developing andusing research software Supported by EPSRC Grant EP/H043160/1
+ EPSRC/ESRC/BBSRC grant EP/N006410/1
Software Sustainability Institute
www.software.ac.uk
, it’
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/,
Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
Software Sustainability Institute
www.software.ac.uk
T
Research Culture Needs Changing
“This particular project was something I wrote a couple years ago to help me out with a workflow… I’d put it up on Github, so that others could potentially use it or use the code. So I went to see what people were saying about this project. It seemed liked I’d done something fundamentally wrong, so stupid that it flabbergasts someone... So of course I start sobbing. Then I see these people’s follower count, and I sob harder. I can’t help but think of potential future employers that are no longer potential.”
http://www.software.ac.uk/blog/2013-01-25-haters-gonna-hate-why-you-shouldnt-be-ashamed-releasing-your-code
Software Sustainability Institute
www.software.ac.uk
T
Research Culture Needs Changing
Our research culture presents barriers but few incentives to sharing code
• There is a fear of being “found out” for poor code, but no encouragement or resources to improve software engineering skills
• There is no reward for publishing code in the current system of metrics. Researchers fear being “scooped” or losing ability to publish.
• Many organisations do not understand how to exploit open source licenses
Software Sustainability Institute
www.software.ac.ukNever be ashamed of making your software available
Software Sustainability Institute
www.software.ac.uk
T
Vandewalle (2012) DOI: 10.1109/MCSE.2012.63
Software Sustainability Institute
www.software.ac.uk
Research Software Workflow
develop share preserve
Developed and versioned using code repository
Published via code repositoryor website
Deposited in digital repositorywith paper / for preservation
Software Sustainability Institute
www.software.ac.uk
Good Enough Practices To Please Your Future Self
• Data:
Save and backup raw data
Create analysis-friendly data
Record your processing steps
Anticipate the need to use multiple tables, and use a unique identifier for each record
Submit data to a repository and get a DOI
Software Sustainability Institute
www.software.ac.uk
Good Enough Practices To Please Your Future Self
• Software: Document for your future self:
• Brief descriptive comment at the start of your code • Provide a simple example or test data set• Give functions and variables meaningful names• Make dependencies and requirements explicit
Learn to be modular• Break programs into functions• Don’t duplicate functionality• Search for well-maintained libraries that do what you need
Make it accessible in the future• Make the license explicit• Keep track of changes• Submit code to a reputable DOI-issuing repository
Good Enough Practices in Scientific Computing: https://doi.org/10.1371/journal.pcbi.1005510
Software Sustainability Institute
www.software.ac.uk
What you can do now
• Make sure you’re using version control
• Write a README file that describes how you can get your code up and running, and give it to a colleague to try out
What it does, requirements / dependencies, simple example of use and input + output data
• Ask a collaborator to contribute a new piece of functionality, and get feedback on the process
• Talk to your library / IT services about the services they offer
Software Sustainability Institute
www.software.ac.uk
Get some training
Teach basic lab skillsfor scientific computing
so that researchers can do more in less time and with less pain.
Teach basic concepts, skills and tools for working more effectively with data. Workshops are designed for people with little to no prior computational experience.
Open source learning, that can be tailored to disciplines.“Train the trainers”: building a capable base of instructors.
Software Sustainability Institute
www.software.ac.uk
Three months from now, you will thank yourself!
Software Sustainability Institute
www.software.ac.uk
Interested in more?
• Publish a software paper
http://bit.ly/softwarejournals
• Easily archive your GitHub Code and make ircitable
GitHub to Zenodo
GitHub to FigShare
• Software Citation Implementation WG
https://www.force11.org/group/software-citation-implementation-working-group
Software Sustainability Institute
www.software.ac.uk
Literate Programming
• Traditional papers are just advertisements
A literate computing document is the research
• The technology is out there
Jupyter notebooks
Mathematica
R Markdown
knitR
MATLAB Live scripts
Software Sustainability Institute
www.software.ac.uk
• LIGO Paper:
http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.061102
• LIGO Notebook:
https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.ipynb
LIGO Example
Software Sustainability Institute
www.software.ac.uk
SSI Fellows 2018 / CW18
• SSI Fellowships
Deadline: 9th October 2017
£3000 bursary to be a research software advocate
Join a network of great people working to improve
• Collaborations Workshop 2018
Cardiff, 26-28th March 2018
Theme: “Culture Change and Productivity”
The un-conference that most participants would recommend to their colleagues
Software Sustainability Institute
www.software.ac.uk
T
Without data it’s difficult to validate results. But without code, we waste the opportunity to advance science.
These slides: http://dx.doi.org/10.6084/m9.figshare.5459542
“The only way to publish software in a scientifically robust manner is to share source code, and that means publishing via the internet in an open-access/open-source fashion. —Warren Lyford DeLano, Creator of PyMOL, 2005
Software Sustainability Institute
www.software.ac.uk
The Software Sustainability Institute
A national facility for cultivating better, more sustainable, research software to enable world-class research• Software reaches boundaries in its
development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools tosupport the community developing andusing research software Supported by EPSRC Grant EP/H043160/1
+ EPSRC/ESRC/BBSRC grant EP/N006410/1
Software
Policy
Training
Community
Outreach
Delivering essential software
skills to researchers via CDTs,
institutions & doctoral schools
Helping the community to
develop software that meets the
needs of reliable, reproducible,
and reusable research
Collecting evidence
on the community’s
software use & sharing
with stakeholders
Bringing together
the right people to
understand and address
topical issues
Exploiting our platform to
enable engagement,
delivery & uptake
Website & blog
Campaigns
Advice
Guides
Courses
Workshops
Fellowship
Research
Software
Policy
Training
Community
Consultancy50+ projects
130+ evaluations
4 surgeries
35+ UK SWC
workshops
1000+ learners
80+ guides
50,000 readers
61 domain
ambassadors
20+ workshops organised
740 researchers
50,000 grants
analysed
150+ contributed articles
20,000 unique visitors per month
3,000 Twitter followers
300+ RSEs engaged 2100 signatures 13 issues highlighted
Outreach
Software Sustainability Institute
www.software.ac.uk
Find out more about the SSI
• Community Engagement (Lead: Shoaib Sufi) Fellowship Programme Events and Workshops
• Consultancy (Lead: Steve Crouch) Open Call for Projects / Collaborations Software Evaluation
• Policy and Publicity (Lead: Simon Hettrick) Case Studies / Policy Campaigns Software and Research Blog
• Training (Lead: Aleksandra Nenadic) Software Carpentry and Data Carpentry (300+ students/year) Guides and Top Tips
• Journal of Open Research Software (Editor: Neil Chue Hong)
• Collaboration between universities of Edinburgh, Manchester, Oxford and SouthamptonSupported by EPSRC Grant EP/H043160/1 + EPSRC/ESRC/BBSRC grant EP/N006410/1
Software Sustainability Institute
www.software.ac.uk
Software Sustainability Institute
www.software.ac.uk
T
Research Culture Needs Changing
But there’s still a lot to be done• Software Assessment
• Software Management Plans
• Group Identifiers Software project teams – encompassing
contributors
Software products – across versions
• Machine readable references Software papers solve the credit problem
The reference problem is still hard• Where is software mentioned and can we find it
Software Sustainability Institute
www.software.ac.uk
T
Research Culture Needs Changing
Mechanisms are becoming available• Roles
Project Credit http://credit.casrai.org/
Transitive Credit http://doi.org/10.5334/jors.be
• Mechanisms Software papers http://bit.ly/softwarejournals
Software citation https://doi.org/10.7717/peerj-cs.86
• Tools Researcher Identifiers e.g. ORCID http://orcid.org/
Alt-Metrics e.g. ImpactStory http://impactstory.org/
• Metadata CodeMeta http://codemeta.github.io/
Software Sustainability Institute
www.software.ac.uk
T
Research Culture Needs Changing
Software Referencing needs • Where is software referenced in publications?
• How can we understand its influence?
• How can we choose between software?
Howison, Bullard 2015. DOI: 10.1002/asi.23538