A Recipe for Sustainable Software
DESCRIPTION
Presented at the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE), part of Supercomputing 2013 (SC13) in Denver, Colorado.
TRANSCRIPT
WSSSPE 1
A Recipe for Sustainable Software
Philip E. Bourne
University of California San Diego
11/17/13
http://www.carlmason-liebenberg.com/raw-chocolate-mousse-recipe/
WSSSPE 2
Outline
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 3
My Perspective/Bias
• Basic scientist in the biomedical sciences
• Not coded anything for years
• Built computing infrastructure
• Manage software project teams of ~10 people
• Formed 4 software-based companies
• 15 years with a community resource – PDB
• Helped to establish communities – PLOS, FORCE11, DELSA, NIF
• University Administrator
• Journal co-founder
WSSSPE 4
Motivation – The Good News
• Those iconic DNA and protein representations were drawn by hand
• Molecular graphics emerged to automate this process
• Today cell contents are drawn by hand
• Automating that conceptualization is just one next step
We are at the beginning of what software will bring to the life sciences
WSSSPE 5
Motivation – The Bad News
WSSSPE 6
Thinking on Software back in 2008..
• Costs too much
• Is located in silos
• Does not foster reproducibility
• Is poorly maintained – is unsustainable
• Does not meet the needs of 21st-century biology
• Is a major time waster
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008. 4(7): e1000136
WSSSPE 7
What Got Me Thinking More
• Software development in science has improved thanks to open source, GitHub, etc., but for the most part remains arcane
• Software (and data) atrophy is a problem
• There is much we can learn from the app model
– Consistent user interface (intuitive)
– Common calling interface
– App store (ratings, commentary, etc.)
WSSSPE 8
The Protein Data Bank (PDB)
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 9
The Protein Data Bank (PDB)
• The single community-owned, worldwide repository containing structures of publicly accessible biological macromolecules
• A resource used by ~300,000 individuals per month
• A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month
• A bicoastal resource
• 1 TB
WSSSPE 10
PDB: Looking Back Over the Past 15 Years – In General
• Everything was harder and took longer than we thought
• There are a lot of politics associated with data and software
• Emphasis has shifted from archive, to archive + analytical tool, to archive + analytical tool + educational tool
• Consequently outreach is our most important yet least understood activity today
• Staff needed to change accordingly
• It has become a worldwide enterprise
• Prorated, our budget has decreased
WSSSPE 11
PDB: Looking Back Over the Past 15 Years – Infrastructure
• It took about 5 years to achieve and subsequently sustain 99.99% uptime
• We have gone through 3 distinct code refreshes; another is needed
– Object model / Perl CGI
– Enterprise Java
– Code rewrite (Enterprise Java)
Bluhm et al. 2011 Quality Assurance doi: 10.1093/database/bar003
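To put the 99.99% uptime figure in perspective, the short calculation below converts an availability target into the downtime it allows per year. It is only an illustrative sketch: the function name and the 365-day year are assumptions made here, not something taken from the PDB's own monitoring.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves roughly 53 minutes of downtime per year.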
WSSSPE 12
PDB: Looking Back Over the Past 15 Years – Open Source
• Only considered in the past 7 years or so
• Had “PDB in a Box” but abandoned that
• Now new components are made available through BioJava and GitHub
• Don’t really use community contributions enough
WSSSPE 13
PDB: Trends Today
• Constant demand for better performance
• Use of Web services increasing
• Widgets have not taken off
• Mobile use is increasing fast
• PDB 2.0 services are in demand
WSSSPE 14
PDBMobile
• Fast, low bandwidth data access
• iPhone app in production, ~10,000 users
• Android app in beta
• HTML5-based web application
• Client-side database stores data for offline access
• Tight integration with MyPDB
Objective: PDB Data Access On-The-Go
WSSSPE 15
PDB Sustainability
• It’s easier when the data are seen as vital to the scientific enterprise
• Quality breeds trust, which breeds support
• The community must be involved in every major decision
• Different people/skills are needed at different time points
• The Google bus is inevitable – make allowances for it
WSSSPE 16
Sustainability Through the Private Sector
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 17
Founded 4 Companies
• ViSoft Inc.
• Protein Vision Inc.
• Film Frontiers
• SciVee Inc.
WSSSPE 18
Sustainability Through Companies
• Making a business from scientific software alone is very rare – founders tend to overvalue everything; customers undervalue
• Be at the right place on the technology adoption curve
• Need to provide value add – either through content (again rare for science) or services – increasingly likely but needs a special skill set
• Technology transfer offices (TTOs) do not understand the value (or lack thereof) of scientific software – be prepared
WSSSPE 19
Journals & Sustainability
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 20
The Role of Journals
• Journals can help elevate the value of software and software developers
• However, this propagates a broken reward system
• Provide quality control through peer review
• Provide copy of record
WSSSPE 21
Example: PLOS Computational Biology Software Articles - Requirements
• Outstanding open source software of exceptional importance that has been shown to provide new biological insights, either as a part of the software article, or published elsewhere.
• The software must already be widely adopted, or have the promise of wide adoption by a broad community of users.
• Enhancements to existing software are not published
• The software must be downloadable anonymously in source code form and licensed under an OSI-approved license
• Must be documented and testable
• Presubmission inquiry determines suitability
WSSSPE 22
The PLOS/Mozilla Experiment
WSSSPE 23
The PLOS/Mozilla Experiment
• How much scientific software can be reviewed by non-specialists, and how often is domain expertise required?
• How much effort does this take compared to reviews of other kinds of software, and to reviews of papers themselves?
• How useful do scientists find these reviews?
WSSSPE 24
Institutions Can Sustain Developers and Software
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 25
University 2.0 Is Yet to Happen – Demand Appears to be There
WSSSPE 26
Institutions Underrate Software as Scholarship, But There is a Glimmer of Hope – But You Must Do Your Bit
PLoS Comp. Biol. 7(1) e1002001
WSSSPE 27
Your Responsibility for Software as Scholarship
• Make it easy for software developers to quantify the use and perceived value of software
• Explain the impact you have had to reviewers who do not understand the value of software
• Software is frequently more valuable than a research article – don’t hide that
• Make clear the costs and sustainability issues to institutions
WSSSPE 28
The Academic Institution’s Responsibility for Software as Scholarship
• Accept alternative metrics
• Encourage individual departments to put forward promotion files that reflect the value of software to that domain
• Educate the committee on academic promotions
WSSSPE 29
Funders & Sustainability
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 30
NIH As An Example
http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf
WSSSPE 31
NIH As An Example
WSSSPE 32
WSSSPE 33
Features of the Software Catalog (Maybe)
• Driven by the community
• Registration service
• Rating service
• Discovery service
• Long-term sustainability?
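As a rough illustration of how the registration, rating, and discovery services listed above might fit together, here is a minimal in-memory sketch. Everything in it is hypothetical: the class names, method names, the 1 to 5 star scale, and the keyword search are assumptions made for this example and do not describe any actual NIH catalog design.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class SoftwareEntry:
    """One registered package (hypothetical record layout)."""
    name: str
    description: str
    ratings: List[int] = field(default_factory=list)

    def average_rating(self) -> Optional[float]:
        # None signals that nobody has rated the package yet.
        return mean(self.ratings) if self.ratings else None


class SoftwareCatalog:
    """In-memory stand-in for registration, rating, and discovery services."""

    def __init__(self) -> None:
        self._entries: Dict[str, SoftwareEntry] = {}

    def register(self, name: str, description: str) -> SoftwareEntry:
        # Registration service: one entry per unique name.
        if name in self._entries:
            raise ValueError(f"{name!r} is already registered")
        entry = SoftwareEntry(name, description)
        self._entries[name] = entry
        return entry

    def rate(self, name: str, stars: int) -> None:
        # Rating service: assumed 1-5 star scale; unknown names raise KeyError.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._entries[name].ratings.append(stars)

    def discover(self, keyword: str) -> List[SoftwareEntry]:
        # Discovery service: case-insensitive keyword match, with rated
        # entries first (best average first) and unrated entries last.
        needle = keyword.lower()
        matches = [
            e for e in self._entries.values()
            if needle in e.name.lower() or needle in e.description.lower()
        ]
        return sorted(
            matches,
            key=lambda e: (e.average_rating() is None,
                           -(e.average_rating() or 0.0),
                           e.name),
        )
```

A persistent store and curation workflow would be needed in practice; those are exactly the parts the "long-term sustainability?" bullet leaves open.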
WSSSPE 34
The Role of Funders
• There needs to be more agency cross-talk – both national and international
• Funders can help train institutions, not just individuals
• Better specification of the software enterprise
• Less “build it and they will come”, more grass-roots, application-driven support, but managed
WSSSPE 35
The 3D Virtual Cell & FORCE11 Communities
• My Perspective/Bias
• Motivation
• Experiences driving ingredients to the recipe
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
WSSSPE 36
WSSSPE 37
Sustainability Lessons from the 3D Virtual Cell
• There remains a minimal requirement for funding even with a vibrant community – How?
• Communities still need champions & a vision
• Self organization is not an option
• Members must like each other – face to face is needed
WSSSPE 38
Acknowledgements
• Stephanie Hagstrom
• The PDB Team
• The FORCE11 Team
• The PLOS Team
• The 3DVC Community