Why Do Computational Scientists Trust Their Software?

Uploaded by: jpipitone

Posted on: 23-Jun-2015


DESCRIPTION

A very informal talk I gave to Hausi Muller's group at UVic in June 2009. I have included, without permission, slides from Daniel Hook's excellent presentation at SE-CSE 2009 (http://www.cs.ua.edu/~SECSE09/schedule.htm).

TRANSCRIPT

1. Why do climate modellers trust their software?

Jon Pipitone, Advisor: Steve Easterbrook, University of Toronto. @UVic, June 2009

2. This presentation

  • Quick and dirty
  • I'd prefer discussion over me just blabbering

  • Tell a good story

3. Get feedback from you

  • Approach is good?

4. What am I missing?

5. "If we knew what we were doing, it wouldn't be called research, would it?" - attributed to Albert Einstein

6. What is climate modelling? A kind of computational science

7. What is computational science?

  • A scientific computing approach to gain understanding, mainly through the analysis of mathematical models implemented on computers

(Source: Wikipedia, "Computational science")

8. What is computational science?

  • Computers and software are the lab equipment. Virtual laboratories.

9. Program outputs are the results of the experiment.

10. Scientific software development (my focus)

11. What is climate modelling?

  • Climatologists build computer models of the climate to try to understand climate processes.

12. What is climate modelling? (Source: Easterbrook, CUSEC'09; IPCC AR4, 2007)

13. What is climate modelling? (Source: Easterbrook, CUSEC'09; IPCC AR4, 2007)

14. General Circulation Models (Source: Easterbrook, CUSEC'09; Crown Copyright)

15. Scientific software development

16. Scientific software development

17. Verification and Validation

  • Desk checking
  • Informal unit tests, some use of debuggers

Science Review and Code Review

  • Science review by project managers

18. Code review by designated code owners

Continuous testing as Science Experiments

  • Automated test harness on main trunk
  • JP: physical constraints
  • Bit reproducibility (strong constraint); both checks are sketched below
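As a rough illustration of the two kinds of checks listed above, here is a minimal Python sketch of what automated harness tests for physical constraints and bit reproducibility could look like. The function and variable names (air_mass_series, run_a, and so on) are illustrative assumptions, not taken from the Hadley Centre's actual test harness.

    # Hypothetical harness checks; names and thresholds are illustrative only.
    import numpy as np

    def check_physical_constraint(air_mass_series, tolerance=1e-6):
        """A physical-constraint check: total air mass should be conserved
        over the run (no spurious sources or sinks)."""
        start, end = air_mass_series[0].sum(), air_mass_series[-1].sum()
        return abs(end - start) / start < tolerance

    def check_bit_reproducibility(run_a, run_b):
        """The strong constraint: two runs of the same configuration on the
        main trunk must agree bit-for-bit, field by field."""
        return run_a.keys() == run_b.keys() and all(
            np.array_equal(run_a[name], run_b[name]) for name in run_a
        )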

19. Model intercomparisons (Source: Easterbrook, CUSEC'09)

20. Basic Validation Steps

  • Simulate stable climate (with no forcings)
  • Models can produce climates with tiny changes in mean temperature but with seasonal and regional changes that mimic real weather.

Reproduce past climate change

  • When 20th-century forcings are added, the model should match observations.

Reproduce pre-historic climates

  • Can model the last ice age and the advance of the Sahara desert.

(Source: Easterbrook, CUSEC'09)

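As a sketch of the first validation step above, an unforced control run should show only a tiny drift in global-mean temperature, and one might automate that check roughly as follows. The data layout and the 0.1 K/century threshold are assumptions for illustration, not values from any modelling centre.

    # Hypothetical drift check for an unforced control run; threshold is illustrative.
    import numpy as np

    def control_run_is_stable(monthly_global_mean_temp, max_drift_k_per_century=0.1):
        """monthly_global_mean_temp: 1-D array of global-mean temperature (K) from a
        control run with no forcings. The long-term trend should be negligible, even
        though seasonal and regional variability remains."""
        months = np.arange(monthly_global_mean_temp.size)
        slope_per_month, _intercept = np.polyfit(months, monthly_global_mean_temp, 1)
        drift_per_century = slope_per_month * 12 * 100
        return abs(drift_per_century) < max_drift_k_per_century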
21. Validation Notes (Source: Easterbrook, CUSEC'09; Crown Copyright)

22. Core problems with V&V

23. Validation notes; bit reproducibility; core problems with V&V

24. In other words,

  • This is science. It's difficult to specify concrete requirements beforehand.

25. What does this mean for software quality?

26. (Note: we're not talking about model quality!)

27. How do we judge quality in scientific software?

28. Software Quality

  • Software quality is a big concept with many facets, or "-ilities"
  • e.g. reliability, modularity, customisability, ...

We're used to thinking of the quality of software as how well it is designed and how well it matches our requirements.

29. Measuring Quality: Defect Density

  • Defect Density = # defects / LOC

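As a small worked example of this measure, the per-KLOC figures quoted in this deck correspond to a calculation along these lines (the numbers are the Hadley Unified Model figures quoted a couple of slides later):

    # Defect density, expressed per KLOC as in the comparisons that follow.
    # Figures are the Hadley Unified Model numbers quoted on a later slide.
    bug_fixes_per_release = 24        # average bug fixes per release
    lines_of_code = 830_000           # approximate size of the current version

    defects_per_loc = bug_fixes_per_release / lines_of_code
    defects_per_kloc = defects_per_loc * 1000
    print(f"{defects_per_kloc:.2f} defects/KLOC")   # ~0.03, as on the Hadley slide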
30. Can we benchmark quality using defect density?

    • (It is the most common rough quality measure from what I've seen.)

Preliminary observation: defect density for climate models is lower than that of comparably-sized industrial projects.

31. Hadley defect rates

Some comparisons:

  • NASA Space Shuttle: 0.1 failures/KLOC
  • Best military systems: 5 faults/KLOC
  • Worst military systems: 55 faults/KLOC
  • Apache: 0.5 faults/KLOC
  • XP: 1.4 faults/KLOC

Hadley's Unified Model:

  • avg. of 24 bug fixes per release
  • avg. of 50,000 lines edited per release
  • 2 defects/KLOC make it through to released code
  • expected defect density in current version: 24 / 830,000 lines ≈ 0.03 faults/KLOC

(Source: Easterbrook, CUSEC'09)

32. Few Defects Post-release

  • Obvious errors:
  • Model won't compile / won't run

33. Model crashes during a run

34. Model runs, but variables drift out of tolerance

35. Runs don't bit-compare (when they should)

Subtle errors (model runs appear valid):

  • Model does not simulate the physical processes as intended (e.g. some equations / parameters not correct)

36. The right results for the wrong reasons (e.g. over-tuning)

37. Expected improvement not achieved

(Source: Easterbrook, CUSEC'09)

38. Measuring Quality: Defect Density

So, is climate modelling software really that good?

39. On Benchmarking

  • Comparing defect rates is very subjective:
    • Ultimately depends on testing strategy

40. When are we counting: pre- or post-release?

41. How do we factor in severity?

  • Pareto law: 20% of the bugs cause 80% of the problems

Bug type: a "bug" is not the same thing across projects.

42. No standards in the literature

43. What are we measuring?

  • Absolute values suck; we don't know what they mean
    • Guy from Software Quality workshop at ICSE
  • We don't have an underlying theory of software quality yet
    • i.e. how do all these -ilities relate and correspond to the world?

44. We could ask ...

  • What are the important aspects of quality for computational scientists?

45. We could ask ...

  • What makes a piece of software good?

46. What makes a piece of software bad?

47. How do you know when you're done?

48. How do you train newcomers?

49. or...

50. When have you had to delay releasing due to a bug? Why?

51. Tell me the story behind these and other bugs...

52. Questions to ask...

  • SW Quality Measurement: A Framework for counting problems and defects (Florac, SEI: TR22.92)

  • Finding Activity: What activity discovered the problem or defect?
  • Finding Mode: How was the problem or defect found?
  • Problem Type: What is the nature of the problem? If a defect, what kind?
  • Criticality: How critical or severe is the problem or defect?
  • Related Changes: What are the prerequisite changes?
  • ...
  • Why did the bug go unnoticed? Why is it important to have fixed this bug then? How was the bug fixed? Why is the fix appropriate?
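To make the framework above concrete, a defect record built around those attributes might look like the following sketch. The field names are a paraphrase of Florac's categories, not the framework's own schema.

    # Hypothetical defect record following the attributes listed above
    # (field names paraphrase Florac's categories; they are not an official schema).
    from dataclasses import dataclass

    @dataclass
    class DefectRecord:
        finding_activity: str   # what activity discovered the problem or defect
        finding_mode: str       # how the problem or defect was found
        problem_type: str       # nature of the problem; if a defect, what kind
        criticality: str        # how critical or severe the problem is
        related_changes: str    # prerequisite changes
        story: str = ""         # why it went unnoticed, how and why it was fixed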

53. Why study climate modellers?

  • Socially relevant

54. Already have connections with CM groups

55. Preliminary data suggesting the quality of their models is high:

    • What can we learn from them?

56. What can we teach them?

  • A good example of well-established computational science.

57. My study:

  • Why do climate modellers trust their code?

58. What do climate modellers do when coding to guarantee correctness?

59. What are their notions of quality with respect to code?

60. How can we benchmark computational scientists' code quality?

61. My study:

  • Detailed analysis of defect density
  • Pre- and post-release defect counts

62. Discover through bug reports and version control comments

  • e.g. check-in comments containing "fixed", "bug #", etc. (a small mining sketch follows below)

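A minimal sketch of that discovery step, assuming commit messages follow a "fixed" / "bug #" convention like the one mentioned above; the keyword pattern and example messages are illustrative, not drawn from any real modelling centre's repository:

    # Hypothetical scan of version-control log messages for bug-fix markers.
    import re

    FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug\s*#?\d+|defect)\b", re.IGNORECASE)

    def count_bug_fix_commits(log_messages):
        """Count commits whose message suggests a defect was fixed."""
        return sum(1 for msg in log_messages if FIX_PATTERN.search(msg))

    # Illustrative usage with made-up messages:
    messages = [
        "Merge branch for new aerosol scheme",
        "Fixed bug #1042: wrong sign in longwave radiation term",
        "Update documentation",
    ]
    print(count_bug_fix_commits(messages))   # -> 1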
Defect density over releases (trends)

63. Breakout by defect types (but, what are they?)

64. Maybe: static fault density using an automated tool

65. Examine several climate models (>3?)

66. My study:

  • Qualitative investigation
  • Semi-structured interview of climate modellers

67. Use the questions given previously to guide conversation

68. Investigate the story of a defect and the judgement calls that were made.

69. ~5 defect stories per climate modelling centre

70. Cross-case analysis

71. Outcomes

  • Towards a theory of code quality for climate modelling (computational science?) software
  • Empirical basis

72. Future: relevant quality benchmark

Benchmarking statistics for climate modelling code

  • Useful for climate modelling groups

Learn from CS; where can we help?

73. Questions?

  • How well did I present the background to the study?

74. How well did I present the objective of the study?

75. Issues with the study itself?

  • no direct investigation of code quality, only problems
  • To some extent, through fault analysis

76. What would I look for? Others?