Open sharing and maintenance of scientific code
Jordan S Read; Luke A Winslow
2013-08-20
Background
• Who I am
  – USGS-CIDA
  – 2012 PhD in physical limnology (UW-Madison)
  – Civil Engineer
• My experience with code and model development
  – Lake Analyzer
  – CLM
  – rGDP; rGLM
  – Numerous collaborations
Background
My philosophy on science code:
“Code created for the pursuit of science questions should be open, accessible, and designed to enable others to build from”
• Kind of like your scientific publications, right?
• That means I shouldn’t be able to build my scientific livelihood around a piece of “black-box” code
Background
• My responsibility as a member of the science community:
“Methods used to obtain published results should be clear, transparent and repeatable”
• My responsibility as a federal employee:
“Provide public access to all elements of publicly funded research”
Road map
Part I
• My experiences with science code development
• Motivation to open up your scientific code
Part II
• Maintaining and modifying code
• Code collaboration
Lake Analyzer
• GLEON background
  – Hanson & Hamilton collaboration and student exchange
  – Physics & Climate working group
• Requirements
  – Easy to use
  – Provide access to complex physical derivatives
  – Handle dataset irregularities
    • Errors, gaps, intermittent sampling frequencies, etc.
  – Rapid processing of large datasets
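As a rough illustration of the "handle dataset irregularities" requirement, the sketch below flags gaps in a series of sample times. It is not Lake Analyzer's actual method: the function name `find_gaps`, the `tolerance` parameter, and the hourly example record are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import median


def find_gaps(timestamps, tolerance=1.5):
    """Return (start, end) pairs where the spacing between consecutive
    samples exceeds `tolerance` times the median sampling interval."""
    if len(timestamps) < 3:
        return []  # too few samples to estimate a typical interval
    ordered = sorted(timestamps)
    steps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical = median(steps)
    return [
        (earlier, later)
        for earlier, later, step in zip(ordered, ordered[1:], steps)
        if step > tolerance * typical
    ]


# Hypothetical hourly record with the 03:00 and 04:00 samples missing.
start = datetime(2013, 8, 20)
record = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6, 7)]
gaps = find_gaps(record)
```

Using the median interval as the reference spacing is one simple choice; it keeps a few long gaps from distorting the estimate of the normal sampling frequency.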
Lake Analyzer
• I took on the role of primary coder
  – Why? GLEON had paid my travel to two meetings…including NZ!
• I did the work in MATLAB, because that is what I was most familiar with
• Side project during grad school
• Built from feedback from GLEON physics & climate group
Lake Analyzer
• Repeatable – .lke file ~ metadata
• Visualizations (plotting options for outputs)
• Easy to use
Lake Analyzer
• Software publication
• Open codebase
• Platform/language independence
• Useful and citable
  – 19 citations in ~20 months
Opening up scientific code
• Publishing your code
  – Would a simple paper of physical derivations be cited at this rate?
  – Would a methods paper be as popular if the code wasn’t available/open?
  – Additional motivation for creation of code
• Writing open code
  – More use
  – Ease of collaboration
  – Integrity/transparency
Opening up scientific code
• Reasons many choose not to open code
  – Too much work
  – Code is too messy
  – Potential for criticism
  – Code as scientific livelihood
  – Has known errors…
  – Others?
Opening up scientific code
• When to put in the effort
  – Collaborations
  – When you are doing it “right”
  – When you will use it in the future
  – When you are publishing something
  – When you have to
  – Others?
Part II: Maintaining code
So…the code works, what’s next?
• How do I take risks with code?
  – i.e., changing the way a function works
  – What if I make a mistake? (undo+undo+undo…?)
• How do multiple people collaborate on a single set of scripts?
  – In serial?
  – Google Docs vs. Word for writing a paper
Maintaining code
• Risky modifications
  – Metabolism_modelv28.R?
  – Metabolism_model_NEW.R?
  – Metabolism_model_NEWsecondTRY.R?
  – Metabolism_model_NEWEST.R?
Maintaining code
• When we publish, we use track changes
  – Can we do the same for code?
• Version management
  – AKA: version control, revision control, source control
  – How it works
  – Why you should know what it means
  – Benefits to using version management
    • Historical record of code evolution
    • Easy to “roll back” to previous working version
    • The code has only one home
Maintaining code
How it works
  – Creates a “life history of code”
“Hey, nice sweater”
“Thanks. I travel a lot.”
“Want to start a project?”
“Sure! I have some modeling code”
“So do I! Let’s combine our efforts”
Maintaining code
Revision timeline (each step adds one revision to the history):
1 → 2: “Here is a new set of methods”
1 → 2 → 3: “I made some improvements”
1 → 2 → 3 → 4: “Whoops! Fixed a bug”
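The "life history of code" idea can be sketched as a toy, in-memory revision history for a single file. This only illustrates the concept. Real version management tools store and track changes quite differently. The `History` class, its method names, and the file contents are hypothetical; the commit messages follow the slides.

```python
class History:
    """Toy in-memory revision history for one file (illustration only)."""

    def __init__(self, initial_text):
        # Revision 1 is the starting point of the project.
        self._revisions = [("Initial version", initial_text)]

    def commit(self, message, text):
        """Record a new revision and return its 1-based number."""
        self._revisions.append((message, text))
        return len(self._revisions)

    def log(self):
        """Return (number, message) pairs, oldest first."""
        return [(n, msg) for n, (msg, _) in enumerate(self._revisions, start=1)]

    def checkout(self, number):
        """Return the file contents as of the given revision."""
        if not 1 <= number <= len(self._revisions):
            raise IndexError(f"no such revision: {number}")
        return self._revisions[number - 1][1]

    def roll_back(self, number):
        """Restore an older revision by committing it again, so no history is lost."""
        return self.commit(f"Roll back to revision {number}", self.checkout(number))


# Hypothetical file contents; one file name, many recorded revisions.
history = History("combined modeling code")
history.commit("Here is a new set of methods", "combined code + new methods")
history.commit("I made some improvements", "improved code (with a bug)")
history.commit("Whoops! Fixed a bug", "improved code")
```

Calling `history.roll_back(2)` would add a fifth revision whose contents match revision 2, which is the "undo" that a pile of files like Metabolism_model_NEWEST.R cannot offer reliably.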
Conclusions
• Code as if it will be seen and used by others
  – You may be that “other” in 3 years
• Decide if creating publicly usable code makes sense for your research
• Make your code accessible to collaborators
• Consider the concepts embedded in version management
Jordan S Read
USGS Center for Integrated Data Analytics
608-821-3922 | [email protected]
Questions?
Thanks GLEON FP & TLS!