Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects


Upload: gigascience-bgi-hong-kong

Post on 03-Aug-2015

TRANSCRIPT

Page 1: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

0000-0001-6444-1436

@SCEdmunds

[email protected]

Scott Edmunds

Beyond Dead Trees: publishing digital research objects

Page 2: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Need to move beyond 350-year-old incentive systems

Buckheit & Donoho: Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.

Page 3: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

JIFBAIT Network

more

GWASGWAS

JIFBAIT NEWS

Arsenic Life forms, will they take over the planet?

By Melba Ketchum, PhD

Which Overhyped, Unreproducible Experiment Are You?

Want rapid citations for 2 years only? Take this quiz.

You got: STAP Cells

Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get hundreds of citations in two years.

Page 4: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

The end result: attempts to “game the peer-review system on an industrial scale”

1. dx.doi.org/10.1087/20110203
2. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
3. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research

Companies offering authorship of papers made to order by “paper mills”. Meta-analyses, network analysis & more.

Guaranteed publication in JIF journal, often using fake referees, ID theft, etc.

Page 5: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Consequences: increasing number of retractions

>15X increase in the last decade

At the current rate of increase, by 2045 as many papers will be retracted as published

1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

Page 6: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

STAP paper demonstrates problems:

Nature Editorial, 2nd July 2014:

“We have concluded that we and the referees could not have detected the problems that fatally undermined the papers. The referees’ rigorous reports quite rightly took on trust what was presented in the papers.”

http://www.nature.com/news/stap-retracted-1.15488

Page 7: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

STAP paper demonstrates problems:

Need:

• to publish protocols BEFORE analysis
• better access to supporting data
• more transparent & accountable review
• to publish replication studies

Page 8: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

New incentives/credit

• Review
• Data
• Software
• Models
• Pipelines
• Re-use…

= Credit

Credit where credit is overdue: “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.” Nature Biotechnology 27, 579 (2009)

Page 9: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

GigaSolution: deconstructing the paper

www.gigadb.org
www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

Combines and integrates (with DOIs):

• Open-access journal
• Data Publishing Platform
• Data Analysis Platform
• Open Review Platform

Page 10: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Things we need to reward

Page 11: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

1. Reward Open Data

Page 12: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Data Publishing: nothing new…

+ Area of Interest/Question

Data & Metadata Collection/Experiments

Analysis/Hypothesis/Analysis

Conclusions

1839 to 1859: 20 Yrs.

Page 13: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Data Publishing: Can be Life or Death

Climate change, global hunger, pollution, cancer, disease outbreaks…

http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966

Page 14: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Our first DOI:

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.

Page 18: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Downstream consequences:

“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”

1. Citations (~300)
2. Therapeutics (primers, antimicrobials)
3. Platform Comparisons
4. Example for faster & more open science

Page 19: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

Page 20: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

second

1st Nanopore MinION E. coli genome released via GigaDB 10th September 2014 (>125GB)

Data Note peer reviewed & published 20th October [1]

Immediately used for teaching materials [2] & real-time tools [3]

1. http://www.gigasciencejournal.com/content/3/1/22
2. http://minotour.nottingham.ac.uk/
3. https://github.com/lexnederbragt/INF-BIOx121_fall2014_de_novo_assembly

Page 21: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

second

Real time sequencing era needs real time publication!

• First nanopore clinical amplicon sequencing paper (& data) published March 2015

• Can determine virus/bacteria strains in hours

• Already in use tackling Ebola in West Africa

• “Living internet of things”

http://www.gigasciencejournal.com/content/4/1/12

Page 22: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

IRRI GALAXY

Rice 3K project: 3,000 rice genomes, 13.4TB public data

Feed The World With (Big) Data

Page 23: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

OMERO: providing access to imaging data

Already used by JCB.

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

Need for better handling of imaging data

Page 24: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

The alternative...

...look but don't touch

Need for better handling of imaging data

Page 25: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

2. Reward Open Data

Executable

Page 26: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Rewarding the

Diagram labels: Idea; Study; Data; Metadata; Methods; software; Analysis (Pipelines); Workflows/Environments; Answer; Publication; DOI, etc.

Page 27: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Software

https://github.com/gigascience

Transparent

Open & able to build upon

Taking citeable snapshots

@jeejkang

Page 28: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

gigagalaxy.net

Workflows

Reward Sharing of Workflows

Page 29: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Visualisations & DOIs for workflows

http://www.gigasciencejournal.com/series/Galaxy

Page 30: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Facilitate reproducibility, reuse & sharing & publish outputs of: Knitr, Sweave, Jupyter/iPython Notebook, etc.

Open Documents

Reward Open/Dynamic Workbooks

Page 33: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

E.g.

http://www.gigasciencejournal.com/content/3/1/3

Reviewer (Christophe Pouzat): “It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Page 34: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

http://www.gigasciencejournal.com/content/3/1/23
http://www.gigasciencejournal.com/content/4/1/19

Virtual Machines

• Downloadable as virtual harddisk/available as Amazon Machine Image

• Experimenting with & reviewing container (Docker) submissions

Page 35: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Taking a microscope to the publication process

Page 37: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Lessons Learned

• It is possible to push button(s) & recreate a result from a paper

• Most published research findings are false. Or at least have errors

• Reproducibility is COSTLY. How much are you willing to spend?

• Much easier to do this before rather than after publication

Page 38: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

The cost of staying with the status quo?

• Ioannidis estimates that 85% of research resources are wasted.

• ~US$28B/year unnecessarily spent on preclinical research in US.

• Each retraction estimated to cost $400,000.

http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001747
http://elifesciences.org/content/3/e02956
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

Page 39: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Death to the Publication. Long live the Research Object!

Manifesto for a reproducible publisher:

The era of the 1665-style publication is over

Reward replication not advertising

Credit FAIR data, not JIF-bait narrative

Granularity ≠ salami slicing. Ingelfinger is the enemy

We need a recognizable mark/badge/score(s) for replication

Separate category in ORCID for actually usable things

?

Page 40: Scott Edmunds, ReCon 2015: Beyond Dead Trees, Publishing Digital Research Objects

Thanks to:

Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

Peter Li, Chris Hunter, Jesse Si Zhe, Rob Davidson, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

Our collaborators: team: (Case study)

CBIIT

Funding from:

@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/

www.gigadb.org
gigagalaxy.net
www.gigasciencejournal.com
