

REPRODUCIBLE RESEARCH IN COMPUTATIONAL SCIENCE — 1

Reproducible Research in Computational Science
IPOL, a Research Journal for Image Processing Algorithms and Software

Facultad de Ingeniería, Universidad de la República
Montevideo, UY, April 11th, 2013

Nicolas Limare
CMLA, ENS Cachan, FR
Image Processing On Line – IPOL

http://www.ipol.im/


A Researcher's Story

Let's do research on... maté

© Alejo2083@wikipedia, ZooFari@wikipedia


Road to HPMDS (High-Perf Maté Dynamics Simulation)

● Review past research and state of the art theories and methods

● Create new measurement tools, models and simulation software

● Compare with existing works

● Present, publish

● Drink high-performance maté


Error 404

How do you compute?

© Marcin Wichary



Ask the author?

● Code not available
    ● secret, lost

● Code not usable
    ● binary-only, not for your OS, obsolete

● Code not compilable
    ● won't debug 2000 obscure lines

● Code not meant to be read by others

● Not the exact version used for the article
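
The last point, not knowing the exact version used for the article, is one that can be handled mechanically. The following is a minimal sketch in Python, not something taken from the talk or from IPOL's tooling: the function names (`file_digest`, `fingerprint_sources`, `overall_fingerprint`) and the list of file suffixes are hypothetical choices for illustration. It hashes every source file in a directory so that an article can quote one identifier for the code that produced its results.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fingerprint_sources(root: Path, suffixes=(".c", ".h", ".py")) -> dict:
    """Map each source file under root (relative POSIX path) to its digest.

    The suffix list is an assumption; adapt it to the project at hand.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            manifest[path.relative_to(root).as_posix()] = file_digest(path)
    return manifest


def overall_fingerprint(manifest: dict) -> str:
    """Combine the per-file digests into one identifier for the whole tree."""
    combined = hashlib.sha256()
    for name in sorted(manifest):
        combined.update(name.encode("utf-8"))
        combined.update(b"\0")
        combined.update(manifest[name].encode("ascii"))
        combined.update(b"\n")
    return combined.hexdigest()
```

Calling `overall_fingerprint(fingerprint_sources(Path("src")))` (with `src` standing in for wherever the code lives) gives a string that changes whenever any listed source file changes, so a reader who downloads the code can check that it is the version the article refers to. A revision identifier from a version control system would serve the same purpose.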


Rewrite?

● Might take some time, days, weeks, months...

● You won't get much credit for this work

● Not everything is explained in the article

● No way to verify that the implementation is correct

“[...] software is the specification for how the software is supposed to work. Anything less [...] doesn’t really tell you anything about how it’s ultimately going to behave. And that just makes software really, really hard.”

Douglas Crockford


No Software

● Cannot verify

● Cannot reproduce

● Cannot compare

● Cannot reuse

● Cannot extend

● Cannot do science


Beyond Maté Simulations

Sometimes it is more than missing code:

● Misleading performance reports

● Manipulated figures

● 2000 retractions in biomedical research, 43% for fraud

● Clinical trials based on wrong assumptions

● Climategate

● Public policies based on wrong expectations

● David Bailey, “Twelve ways to fool the masses when giving performance results on parallel computers”, Supercomputing Review (1991).
● Nicholas Wade, “It May Look Authentic; Here's How to Tell It Isn't”, The New York Times (2006).
● Ferric C. Fang, R. Grant Steen, and Arturo Casadevall, “Misconduct accounts for the majority of retracted scientific publications”, PNAS (2012). http://dx.doi.org/10.1073/pnas.1212247109
● Kevin R. Coombes, Jing Wang, and Keith A. Baggerly, “Irreproducibility of NCI60 Predictors of Chemotherapy”, http://bioinformatics.mdanderson.org/Supplements/ReproRsch-Chemo/
● Bill Chameides, “Climategate Redux”, Scientific American (2010).


But...

We can make better science.

We are trying with IPOL.


Scientific Method

1200 ~ 1800
Roger Bacon, Francis Bacon, Galileo Galilei, Robert Boyle, René Descartes, …

Science needs to be reproduced.


Reproducible Research?

Research is reproducible if other researchers can independently obtain the same results from the published material.

● Theoretical scientists share demonstrations

● Experimental scientists share procedures

● Computational scientists (usually) share… no software, no full description, no data

cf. Claerbout 1992, Donoho 1995, Stodden 201X, Vandewalle 201X

© Sfoster83@wikipedia, Madprime@wikipedia

Ø


Reproducible (Computational) Research

1990 ~ …
Jon Claerbout, David Donoho, Serguei Fomel, Randy Leveque, David Bailey, Victoria Stodden, Juliana Freire, …

The science is in the software, data and process.

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”

D. Donoho


Computation Everywhere

● particle physics
● fluid dynamics
● econometrics
● signal processing
● quantum chemistry
● LIDAR archeology
● MRI analysis
● climate & weather
● geophysics
● …

© CERN, rreis@flickr, rafael grompone, info-nftk@flickr, mohapj@flickr, mario stefanutti, argonne@flickr


ScienceCodeManifesto.org

● Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

● Copyright: The copyright ownership and license of any released source code must be clearly stated.

● Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.

● Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.

● Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.


Why not Share the Code?

● Code not ready for public view
    no time/motivation to clean up, simplify and document

● Prevent incorrect use
    documentation and explanations again

● Keep competitive advantage
    … better not publish at all?


Revisit Objectives of Publishing Articles

vs

Impact Factor

Rare picture of a utopian community in the act of sharing their research code

KEY: lure researchers into sharing their code

© freeclipartnow.com


Revisit Objectives of Publishing Articles

Step 1: make the code a publication by itself

[Diagram: two parallel cycles between Researcher and Community. Traditional research articles: the researcher publishes an article, the community cites it. Source code: the researcher publishes code, the community cites it.]


Revisit Objectives of Publishing Articles

Step 2: guide the community to use and cite the code

[Diagram: two parallel cycles between Researcher and Community. Traditional research articles: the researcher publishes an article, the community cites it. Source code: the researcher publishes code, the community cites it.]


IPOL

“IPOL is a research journal of image processing and image analysis. Each article contains a text describing an algorithm and source code, with an online demonstration facility and an archive of online experiments. The text and source code are peer-reviewed and the demonstration is controlled. IPOL follows the Open Access and Reproducible Research models.”

article = manuscript + software (+ demo + archive)


Publishing Software

IPOL wants to provide reference implementations of image processing algorithms.

For every article, the implementation

● is reviewed and published under GPL/BSD license

● can be tested online in real time on free data

Everything is online, free, reusable.

http://ipol.im/


Reviewing Software

Software is reviewed like a manuscript:

● manually, by selected reviewers

● must match the description of the algorithm

● follows editorial guidelines for correctness, portability, documentation, style

This is already a lot to ask of image processing researchers.


IPOL

● Not a prototype, publishing since 2010

● Research journal, self-published
    ISSN, DOI, editorial policy and int'l board, indexed

● Partnership with SIAM journal for dual articles

+

● IPOL publishes algorithms, not software; code is here to provide all details to study the algorithm

● IPOL exists because we need it and no other journal did it


Reproducible Research Initiatives

Journals

● Science requires that all data and code are available to any reader

● Math Programming Computation requires the code

● Biostatistics stamps reproducible articles

● JMLR publishes software

● Geophysics has some software guidelines

● Source Code for Biology and Medicine publishes software, Journal of Open Research Software will too

● Computing in Science and Engineering reviews software

● MetaJournals publish articles about software and data


More Reproducible Research Initiatives

Publishers

● SIAM updated its supp. material policies to include software

● ACM reformed its supp. material copyright policy

● Elsevier experiments with “executable papers” and “post-PDF”

Tools and Services

● RunMyCode hosts executable research software

● FLOSShub, mloss/mldata host software

● DataDryad, Figshare host data

Conferences and Workshops

● ICERM Workshop Dec. 2012

● SINTEF Winter School Jan. 2013

● SIAM CiSE13 Conference track Feb. 2013

● NYU Workshops May 2013

IPOL Article: Manuscript + Software + Demo + Archive

● Manuscript: description and study of an algorithm

● Software: complete and documented implementation

● Demo: universal www interface, test and explore

● Archive: shared test data


Activity

● 40 articles published with code and demo since 2011; 25 articles under review, 10+ public preprints

● 100+ citations (cf. Google Scholar)

In 2012:

● 125,000 visits

● 13,000 code/data downloads

● 50,000 demo runs, 30,000 on original data (archived)


Results

● Reference implementations of algorithms

● Verifiable claims on performance and results

● Algorithms described and analyzed

● Algorithms improved by mass-testing

● Implementations improved by review

● More than reproducible. Reusable and open.


Challenges

● Still the work of a small community
    → join and spread the word
    → spin-off to other research areas (next: audio)

● Competition from less stringent journals and conferences
    → they can evolve by peer-review pressure

● Reusable is more complex than reproducible
    → software project derived from IPOL

● Conservative community habits
    → must learn to cite software, article ≠ PDF

● Substantial effort to prepare good code
    → computation at the center of the computational sciences curriculum
    → templates? other ideas?


Collaboration

Work funded by and in collaboration with

New participants are welcome!


Follow-up to...

http://ipol.im/
[email protected]
@list.ipol.im
@IPOL_journal

and also…
► http://stodden.net/
► http://reproducibleresearch.net/
► http://www.runmycode.org/

interested in
► more authors, editors, reviewers, readers and users
► productive relations with new researchers
► assistance to and collaboration with new similar projects