REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 1

Reproducible Research in Computational Science

IPOL, a Research Journal for Image Processing Algorithms and Software

Facultad de Ciencias Fisicas y Matematicas, Universidad de Chile, Santiago, CL

April 15th, 2013

Nicolas Limare

CMLA, ENS Cachan, FR/JP

Image Processing On Line – IPOL

http://www.ipol.im/

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 3

A Researcher's Story

Let's do research on... maté

© Alejo2083@wikipedia, ZooFari@wikipedia

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 4

Road to HPMDS (High-Perf Maté Dynamics Simulation)

● Review past research and state of the art theories and methods

● Create new measurement tools, models and simulation software

● Compare with existing works

● Present, publish

● Drink high-performance maté

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 5

Computational Science?

You don't compute an article

© Marcin Wichary

→?

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 6

Tip of the Iceberg

article

© Uwe Kils

data, software, parameters, filters, visualization, ...

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 8

Ask the Author?

● not available: secret, lost, unknown version

● not usable: binary-only, not portable, local tools

● not compilable, and won't debug thousands of obscure lines

● not readable

Of course! It was never meant to be shared.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 9

Rewrite?

● Might take some time, days, weeks, months...

● You won't get much credit for this work

● The article doesn't provide all the information

● No way to verify that the implementation is correct

“[...] software is the specification for how the software is supposed to work. Anything less [...] doesn’t really tell you anything about how it’s ultimately going to behave. And that just makes software really, really hard.”

Douglas Crockford

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 10

No Software

● Can not verify

● Can not reproduce

● Can not compare

● Can not reuse

● Can not extend

● Can not do science

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 11

Beyond Maté Simulations

Sometimes it is more than missing code:

● Misleading performance reports and figures

● Clinical trials based on wrong assumptions

● Public policies based on wrong expectations

● David Bailey, “Twelve ways to fool the masses when giving performance results on parallel computers”, Supercomputing Review (1991).

● Nicholas Wade, “It May Look Authentic; Here's How to Tell It Isn't”, The New York Times (2006).

● Ferric C. Fang, R. Grant Steen, and Arturo Casadevall, “Misconduct accounts for the majority of retracted scientific publications”, PNAS (2012). http://dx.doi.org/10.1073/pnas.1212247109

● Kevin R. Coombes, Jing Wang, and Keith A. Baggerly, “Irreproducibility of NCI60 Predictors of Chemotherapy”, http://bioinformatics.mdanderson.org/Supplements/ReproRsch-Chemo/

● Bill Chameides, “Climategate Redux”, Scientific American (2010).

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 12

But...

We can make better science.

We are trying with IPOL.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 14

Scientific Method

1200 ~ 1800

Roger Bacon, Francis Bacon, Galileo Galilei, Robert Boyle, René Descartes, …

Science needs to be reproduced.

Research is reproducible if other researchers can independently obtain the same results from the published material.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 15

Reproducible Research?

● Theoretical scientists share demonstrations

● Experimental scientists share procedures

● Computational scientists (usually) share … no software, no full description, no data

cf. Claerbout 1992, Donoho 1995, Stodden 201X, Vandewalle 201X

© Sfoster83@wikipedia, Madprime@wikipedia

Ø

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 16

Reproducible (Computational) Research

1990 ~ …

Jon Claerbout, David Donoho, Sergey Fomel, Randy LeVeque, David Bailey, Victoria Stodden, Juliana Freire, +++…

The science is in the software, data and process.

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”

D. Donoho
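
One way to make the "complete set of instructions" concrete is to record, next to each figure, exactly what went into producing it. The sketch below is only an illustration and is not part of IPOL or of the cited work: the function names `build_manifest` and `same_experiment`, the manifest fields, and the example parameters `sigma` and `iterations` are all assumptions made for this example.

```python
import hashlib
import json
import platform


def build_manifest(data: bytes, parameters: dict, code_version: str) -> dict:
    """Describe one computational run so that it can be repeated later.

    All field names here are hypothetical; adapt them to your own project.
    """
    return {
        # Fingerprint of the exact input data used for the run.
        "data_sha256": hashlib.sha256(data).hexdigest(),
        # Parameters stored in sorted key order so manifests compare cleanly.
        "parameters": dict(sorted(parameters.items())),
        # Identifier of the software version (for example a release tag).
        "code_version": code_version,
        # Interpreter version, recorded for information only.
        "python_version": platform.python_version(),
    }


def same_experiment(a: dict, b: dict) -> bool:
    """Return True if two manifests share data, parameters and code version."""
    keys = ("data_sha256", "parameters", "code_version")
    return all(a[k] == b[k] for k in keys)


if __name__ == "__main__":
    # Placeholder input bytes and parameters, purely for demonstration.
    manifest = build_manifest(
        b"example input", {"sigma": 1.5, "iterations": 10}, "v1.0"
    )
    print(json.dumps(manifest, indent=2))
```

Publishing such a manifest with the article does not replace sharing the code and data, but it lets a reader check that they are running the same experiment as the author.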

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 17

Tip of the Iceberg, Reloaded

article = communication

© Uwe Kils

data, software, parameters, filters, visualization, ... = science

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 18

Computation Everywhere

● particle physics

● fluid dynamics

● econometrics

● signal processing

● quantum chemistry

● LIDAR archeology

● MRI analysis

● climate & weather

● geophysics

● …

© CERN, rreis@flickr, rafael grompone, info-nftk@flickr, mohapj@flickr, mario stefanutti, argonne@flickr

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 19

ScienceCodeManifesto.org

● Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

● Copyright: The copyright ownership and license of any released source code must be clearly stated.

● Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.

● Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.

● Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 20

ScienceCodeManifesto.org

● Code: available

● Copyright: clearly stated

● Citation: credit the creators

● Credit: recognition

● Curation: remain available

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 21

Why isn't the Code Available?

● Editors and reviewers don't ask for it

● Journals and conferences can't handle it

● Code not ready for public view

● No time or motivation to clean up and document

● Prevent incorrect use

● Competitive advantage

● Copyright/patent jungle

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 23

IPOL

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 26

Publication Dynamics

rare picture of a utopian community sharing their research code

© freeclipartnow.com

researcher's motivation

vs

impact factor

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 28

IPOL

“IPOL is a research journal of image processing and image analysis. Each article contains a text describing an algorithm and source code, with an online demonstration facility and an archive of online experiments. The text and source code are peer-reviewed and the demonstration is controlled. IPOL follows the Open Access and Reproducible Research models.”

article = manuscript + software (+ demo + archive)

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 32

Publishing Software

For every article, the code

● is reviewed and published under an open license

● can be tested online in real time on free data → experimentation and verification

© jmcknight@flickr

CODE DEMO

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 33

Reviewing Software

Code is reviewed like a manuscript

● manually, by selected reviewers

● must match the description of the algorithm

● should follow quality guidelines → documented, portable, readable

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 34

Editorial Policies

● IPOL wants to provide reference implementations

● IPOL publishes algorithms, not programs: code is here to allow full study of an algorithm

● Articles can describe classic algorithms

● Partnership with SIAM journal for article pairs


REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 37

IPOL Article: Manuscript + Software + Demo + Archive

● Manuscript: description and study of an algorithm

● Software: complete and documented implementation

● Demo: universal www interface, test and explore

● Archive: shared test data

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 38

Factoids

● Not a prototype, publishing since 2010

● Research journal, self-published: ISSN, DOI, editorial policy and int'l board, indexed

● 40 articles published with code and demo since 2011; 25 articles under review, 10+ public preprints

● 100+ citations (cf. Google Scholar)

In 2012:

● 125,000 visits

● 13,000 code/data downloads

● 50,000 demo runs, 30,000 on original data (archived)

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 39

Results

● Reference implementations of algorithms

● Verifiable claims on performance and results

● Algorithms described and analyzed

● Algorithms improved by review and open tests

● Code improved by review

● Useful for the community

● More than reproducible. Reusable and open.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 40

Challenges

● Substantial effort to prepare good code → templates? software carpentry sessions?

● Non-trivial demo management → team up with web and visualization experts?

● Small community → more communication and spin-off to other research areas (next: audio)

● Reusable is more complex than reproducible → software project derived from IPOL

● Conservative community habits → must learn to cite software, article ≠ PDF

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 41

Reproducible Research Ecosystem

Journals

● Science requires that all data and code be available to any reader

● Math Programming Computation requires the code

● Biostatistics stamps reproducible articles

● JMLR publishes software

● Geophysics has some software guidelines

● Source Code for Biology and Medicine publishes software, Journal of Open Research Software will too

● Computing in Science and Engineering reviews software

● MetaJournals publish articles about software and data

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 42

Reproducible Research Ecosystem

Publishers

● SIAM updated its supp. material policies to include software

● ACM reformed its supp. material copyright policy

● Elsevier experiments with “executable papers” and “post-PDF”

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 43

Reproducible Research Ecosystem

Tools and Services

● RunMyCode hosts executable research software

● FLOSShub, mloss/mldata host software

● DataDryad, Figshare host data

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 44

Reproducible Research Ecosystem

Conferences and Workshops

● ICERM Workshop Dec. 2012

● SINTEF Winter School Jan. 2013

● SIAM CiSE13 Conference track Feb. 2013

● NYU Workshops May 2013

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 45

Reproducible Research Ecosystem

Open <anything>

● Open Access

● Open Data, Open Science

● Open Source

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 46

Collaboration

Work funded by and in collaboration with

IPOL wouldn't be possible without the support and trust of the authors, reviewers and editors who contributed.

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 48

Join the Dance

new research groups in the project

new tools, policies and projects on similar issues

new users/readers, authors, reviewers, editors

new ideas

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 49

Follow-up to...

http://www.ipol.im/

[email protected]@list.ipol.im

@IPOL_journal

http://nicolas.limare.net/

[email protected]

@NicolasLimare

links & references:

- http://stodden.net/

- http://reproducibleresearch.net/