Doing Science Properly in the Digital Age - Rutgers Seminar
Seminar given at Rutgers University on 2nd October 2012.
Software Sustainability Institute
www.software.ac.uk
Doing Science Properly in the Digital Age
2 October 2012, Rutgers University
Neil Chue Hong (@npch) [email protected]
Four Paradigms of Research
Empirical
Theoretical
Computational
Data Exploration
Software is pervasive in research
Just the nature of the problem?
Maintenance is not fun
Hacking new stuff is fun
Statistics courtesy of Jo Hannay et al., “How Do Scientists Develop and Use Scientific Software?”
Published online 13 October 2010 | Nature 467, 775-777 (2010) doi:10.1038/467775a
The Software Sustainability Institute
A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1
UK Research Computing Ecosystem
(Diagram labels: People, Computing, Communities, …, Network/Collaboration, Instruments, Software, Data Centres)
SSI Organisation
• Community Engagement (Shoaib Sufi): Fellowship Programme
• Consultancy (Steve Crouch): Open Call for Projects, Software Evaluation
• Policy (Simon Hettrick): Guides and Case Studies
• Training (Mike Jackson): Software Carpentry, Software Surgeries
• Collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton
Case Study: Ligand Binding
• Centre for Computational Chemistry, Bristol
  – New methods for rapid MC sampling of biomolecular systems modelled using QM/MM
  – Developed two codes: ProtoMS (F77) and Sire (C++)
  – Water-Swap Reaction Coordinate method to calculate absolute protein-ligand binding free energies
• SSI’s work is helping to scale development
  – ProtoMS and Sire are both single-developer codes
  – The ASPIRE/ACQUIRE framework has multiple developers
• Split architecture between ASPIRE (adaptive multiresolution hybrid MD simulation) and ACQUIRE (WorkPacket scheduling system with optimisation for time to result vs “green-ness”)
• http://www.siremol.org/adaptive_dynamics
Case Study: Brain Imaging
• Brain Research Imaging Centre, Edinburgh
  – Develop PrivacyGuard software, a DICOM image de-identification toolkit
  – Created software to support a new multispectral colouring modulation and variance identification technique (“MCMxxxVI”) to identify white matter lesions that are indicative of declining cognitive ability
  – BRIC are not principally software developers, but do provide software to other researchers
• SSI’s work means the software has been reviewed and refactored; looked at exploitation
• Usability review, naming/trademark review
• Made it easier for BRIC staff to maintain and develop
• Move to standard repositories, testing and documentation processes
• Examination of licensing for MCMxxxVI
• Extraction and refactoring to create standalone tools
• http://www.software.ac.uk/who-do-we-work/brain-research-imaging-centre-edinburgh
• http://www.bric.ed.ac.uk/
Case Study: Climate Policy Modelling
• CIAS team at Tyndall Centre for Climate Change Research, University of East Anglia
  – Develop linked climate and economic models for detailed analysis
  – Their software was not ready to be used by other groups
• One researcher/developer at UEA, several users
• SSI’s work means the software is robust enough that it can be installed and used by others
  – Enabled use of the software by the WWFN’s Climascope project and James Cook University
• Documented software to allow extensions by contributors
• Made it easier to maintain and back up
• Added job scheduling to improve modelling throughput
• New modelling framework enables new models, i.e. new science
• http://www.tyndall.ac.uk/research/cias
Case Study: Textual Studies
• TextVRE team at CeRCH, King’s College London
  – Developed an environment that integrates various tools used in the e-Humanities textual studies lifecycle
  – Builds on the German TextGrid project, and many other existing tools
• SSI’s work means the software can be run “out of the box”, an important requirement for the researchers
  – Developed a VM image containing the TextVRE installation
• Improve installation instructions
• Develop tests to check each installed component
• Improve modularisation to allow others to contribute and maintain
  – Feeding back work to TextGrid
• http://textvre.cerch.kcl.ac.uk
The modern researcher…
• … worries about:
  – Data management and analysis
  – Reproducible research
  – Scalable simulations
  – Integration of models and workflows
  – Collaboration
Where do they learn how to do this?
Picture of Otto Stern courtesy of Emilio Segrè Visual Archives
Observation 1: Software is pervasive across research
Corollary: software is both bleeding edge and long-tail. Demanding users are coming from the arts and humanities, economics and social science as well as the sciences
Observation 2: A culture of re-use rather than re-invention is not widespread
Corollary: we have wasted effort and increased siloing
Observation 3: Many people are “embarrassed” about software
Corollary: something is broken in the way we regard, recognise and reward software
SSI Drivers and Themes
• Two key drivers cause people to seek the SSI’s advice:
  – They want to be more productive in their research
  – They don’t want to be embarrassed by appearing worse than their peers
• Broadly, our work falls into a few key themes:
  – The role and reward of software in research
  – Recognition of software career paths
  – Developing the scientific computing / software development skill base
The Foundations of Digital Research
(Diagram: three blocks labelled Software, with the labels Re-usable and Re-producible)
Gap 1: Software Skills Training
(Diagram: training options placed on two axes, from basic to advanced and from programming-focused (tools) to research-focused (methods). Options shown: Software Carpentry, Programming 101, Summer Schools, Advanced HPC Training, HPC Short Courses, Doctoral Training, MSc in HPC / scientific computing, Programming 201)
Who fills this gap?
Software philosophy as part of the process
• Foundations of scientific computing in undergraduate courses, like presentation skills
• Methods of scientific computing in postgraduate courses, like statistics and ethics
• Show the benefits of the knowledge and methods of digital research, not just Programming 101
Best Practices for Scientific Computing
1. Write programs for people, not computers
2. Automate repetitive tasks
3. Use the computer to record history
4. Make incremental changes
5. Use version control
6. Don’t repeat yourself (or others)
7. Plan for mistakes
8. Optimise software only after it works correctly
9. Document the design and purpose of the code, rather than its mechanics
10. Conduct code reviews
Paper (including the evidence) being submitted to arXiv and PNAS: http://arxiv.org/abs/1210.0530
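As a small, hypothetical illustration of practices 2, 6 and 7 (automate repetitive tasks, don’t repeat yourself, plan for mistakes), the Python sketch below defines one shared mean() helper, rejects bad input loudly, and bundles an automated check. The function names mean, normalise and test_normalise are invented for this example and are not taken from the paper.

```python
"""Hypothetical sketch of practices 2, 6 and 7 from the list above."""


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Plan for mistakes: fail loudly on bad input instead of returning nonsense.
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def normalise(values):
    """Scale values so that their mean is 1.0."""
    # Don't repeat yourself: reuse the single definition of mean().
    m = mean(values)
    if m == 0:
        raise ValueError("cannot normalise values whose mean is zero")
    return [v / m for v in values]


def test_normalise():
    # Automate repetitive tasks: a check that can be re-run after every change.
    assert normalise([2.0, 4.0, 6.0]) == [0.5, 1.0, 1.5]
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("mean([]) should raise ValueError")


if __name__ == "__main__":
    test_normalise()
    print("all checks passed")
```

Keeping the check in a test function means it can also be picked up by a test runner such as pytest rather than being run by hand.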
Gap 2: Lack of recognition and reward
• Is there an anachronism in the way we conduct and recognise research?
  – The REF (Research Excellence Framework) references software as an output, but it is still not easy to get recognition: peer review fails
• Software careers:
  – Researchers who use software
  – Researcher-Developers
  – Research Software Engineers
  – Research Software Support
  – Research Systems Providers
No recognition without reward, no reward without reproducibility?
• How do we reward people for important software contributions?
• Traditionally: publish a research paper that happens to mention the software
  – Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto: http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
  – NB: authorship is hard
• It works for data!
  – C.f. Heather Piwowar’s work: http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308
Software Metapapers
• Create a complete scholarly record including the “standard” publication, method, dataset and models, and software (e.g. for modelling and simulation, or statistical analysis)
  – Enable replay, reproduction and reuse
• A pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
  – This is a software metapaper
  – Peer-review the metadata, not the software
• Journal of Open Research Software: http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/ and the work by B. Matthews et al.: The Significant Properties of Software: A Study
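A minimal sketch of what “peer-review the metadata, not the software” might look like, assuming a simple record whose field names (title, authors, version, licence, repository_url, identifier, related_papers) are invented for illustration; this is not the Journal of Open Research Software’s actual schema. The hypothetical missing_fields() method lists required fields left empty, the kind of completeness check a metadata reviewer could make without running the software itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareMetadata:
    """Hypothetical metadata record for a software metapaper (field names are illustrative)."""

    title: str
    authors: List[str]
    version: str
    licence: str          # e.g. the name of an OSI-approved licence
    repository_url: str   # where the archived copy of the software is stored
    identifier: str = ""  # persistent identifier, if one has been assigned
    related_papers: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List required fields that are empty, as a metadata reviewer might check."""
        required = {
            "title": self.title,
            "authors": self.authors,
            "version": self.version,
            "licence": self.licence,
            "repository_url": self.repository_url,
            "identifier": self.identifier,
        }
        return [name for name, value in required.items() if not value]


if __name__ == "__main__":
    record = SoftwareMetadata(
        title="ExampleTool",
        authors=["A. Researcher"],
        version="1.0",
        licence="BSD-3-Clause",
        repository_url="https://example.org/exampletool",
    )
    print(record.missing_fields())  # ['identifier']
```

The record only points at the stored copy of the software via repository_url, mirroring the idea of linking metadata to software held in separate storage infrastructure.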
Gap 3: Lack of support infrastructure
• For example, there is no digital repository which satisfies all of these criteria:
  – Open to anyone in the UK to archive software
  – Software associated with an OSI license
  – Provides a unique, permanent identifier
  – Publishes a preservation/curation/sustainability plan
• And this is just deposit, not even preservation or sustainability
5 Stars of Software?
• Do we need a 5-star scheme for software?
  – Existence: there is accurate metadata that defines the software
  – Availability: you can access and run the software
  – Openness: the software has an open license permitting reuse
  – Assured: the software provides ways of assuring its correctness
  – Linked: the related data, dependencies and papers are indicated
C.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
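One way to make the proposed rating concrete is the Python sketch below. It assumes, as in the 5 Stars of Linked Data, that the stars are cumulative (each star requires all the ones before it); the slide does not say this, and the criterion keys are hypothetical labels for the five criteria above.

```python
# Hypothetical keys for the five criteria on the slide, in the order listed.
CRITERIA = ("existence", "availability", "openness", "assured", "linked")


def star_rating(checks):
    """Return 0-5 stars for a dict mapping criterion name -> bool.

    Assumption: stars are cumulative, so counting stops at the first
    criterion that is not met (as in the 5 Stars of Linked Data).
    """
    stars = 0
    for criterion in CRITERIA:
        if not checks.get(criterion, False):
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # Openness is not met, so "linked" does not count even though it is True.
    print(star_rating({"existence": True, "availability": True,
                       "openness": False, "linked": True}))  # 2
```

If the criteria were instead meant to be independent, replacing the early break with a plain count of True values would give a simple score out of five.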
Gap 4: Software Maturity and Management
(Diagram: software proliferation plotted against time, with phases labelled Customisation, Innovation and Consolidation)
Not all software should make it to the next stage
Management changes through time, requiring planning
A More Manageable Ecosystem
• Discourage duplicative software development in research grants by rewarding reuse and long-term development
  – Need to change perceptions so that software is seen as valuable
  – But understand when it should not proceed to the next stage
• Different stages should be managed and funded separately
  – Maintenance vs. research vs. development
• A skilled researcher base is the key in the digital age
  – Create a larger proportion of enabled researchers and provide the ramps to go from desktop to high-end infrastructure
  – Allow and encourage specialism and collaboration
Take home points
1) Researchers are developing more software than ever, and trying to do it better
2) We are not adequately providing the training, recognition and reward, and career paths to enable a step-change improvement in research software
3) This is hindering digital research
4) The only people who can change this situation are people like you!
A national facility for cultivating world-class research through software
Become our next collaborators!
Website: www.software.ac.uk
Email: [email protected]
Twitter: twitter.com/SoftwareSaved
Some current collaborations