infrastructure for communicating data-intensive science


Upload: brian-bot

Post on 23-Jan-2017



brian m. bot | senior scientist | community manager | sage bionetworks

clearScience

Sage Bionetworks

a non-profit organization that pilots a variety of components necessary to build a scientific research “commons”

why?

“We must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex”

- Dwight D. Eisenhower, 1961

[slide annotation: “Military” is replaced with “Medical”]

not conducive to a ‘commons’:

‣ institutional incrementalism
‣ individual tenure
‣ proprietary shortsighted solutions

enabling a commons:

‣ open data
‣ accessible platform
‣ clear communication

open data

“The problem is that right now, it’s not easy to donate your data to health research.”

“The goal of Consent to Research is to play a part in the transformation of health from something we experience passively to something we experience actively.”

- John Wilbanks, Chief Commons Officer | http://weconsent.us


accessible platform

a collaborative compute space that allows scientists to share and analyze data together


clear communication

Deception at Duke

research scandals represent merely the extreme of a continuum in the culture of academic research

the status quo tolerates poor communication of findings

[Figure: breakdown of reproducibility outcomes. Segments: 6%, 21%, 8%, 11%, and 54% (cannot reproduce). Remaining categories: can reproduce in principle; can reproduce w/discrepancies; can reproduce from processed data w/discrepancies; can reproduce partially.]

Ioannidis J.P.A. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149-155 (2009) | doi:10.1038/ng.295

208,294,724 datapoints

124 pages supplemental material

?? lines unobtainable source code

?? version or architecture of statistical analysis program (R)

innumerable R packages and package dependencies

key R package “ClaNC” no longer available

442 citations

often, what is reproducible in principle is not reproducible in practice

unidentified publication

‣ from journal with 5-year impact factor of 28
‣ article freely available for download
‣ data freely available for download
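Several of the gaps listed for this publication (unknown program version, unknown architecture, untracked package dependencies) are cheap to record at analysis time. The following is a minimal sketch in Python rather than R, under the assumption that a small machine-readable record stored next to the results is enough; the function name `capture_environment` and the record layout are hypothetical and are not taken from the talk.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Describe the interpreter, the machine, and the versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            versions[name] = None
    return {
        "language": "python",
        "language_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "architecture": platform.machine(),
        "operating_system": platform.system(),
        "packages": versions,
    }


def environment_as_json(packages):
    """Serialize the record so it can be archived alongside the analysis."""
    return json.dumps(capture_environment(packages), indent=2, sort_keys=True)
```

Calling `environment_as_json(["numpy"])` (the package name is only an example) yields a JSON document that can be archived with the data. In R, `sessionInfo()` reports comparable information.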

how are we to move science forward if we cannot understand what was done previously?

let’s go back to basics

scientific method

1. define a question
2. gather information and resources (background research)
3. form a hypothesis
4. test hypothesis experimentally
5. analyze experimental data
6. draw conclusions based on data
7. publish results
8. retest (frequently done by other scientists)

infinite

∞...

experimentally generate data

store on local server

analyze on local machine

write a document

submit to journal

sent to reviewers as pdf

printed on paper

accepted & digitally typeset

static html representation

static pdf representation

scientific claims are being artificially uncoupled from science itself

clearScience

re-imagining scientific communication

allow consumption of content at a variety of levels of complexity and abstraction

leverage Synapse RESTful APIs
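As an illustration of what leveraging a RESTful API can look like from a client, here is a minimal Python sketch. It is not the clearScience implementation. The base URL, the `/entity/{id}` path, the bearer-token header, and the function names are assumptions made for this example and should be checked against the Synapse REST documentation before use.

```python
import json
import urllib.request

# Assumption: base URL of the Synapse repository REST service.
SYNAPSE_REPO_ENDPOINT = "https://repo-prod.prod.sagebase.org/repo/v1"


def entity_url(entity_id, endpoint=SYNAPSE_REPO_ENDPOINT):
    """Build the URL of the resource describing one entity (assumed path)."""
    if not entity_id.startswith("syn") or not entity_id[3:].isdigit():
        raise ValueError(f"not a Synapse ID: {entity_id!r}")
    return f"{endpoint.rstrip('/')}/entity/{entity_id}"


def fetch_entity(entity_id, token=None, opener=urllib.request.urlopen):
    """GET an entity's JSON metadata.

    `token` is an optional access token (assumed bearer scheme); `opener`
    can be swapped for a stub so the function is testable offline.
    """
    request = urllib.request.Request(
        entity_url(entity_id), headers={"Accept": "application/json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real identifier in place of the hypothetical `syn123`, `fetch_entity("syn123")` would return the entity's metadata as a dictionary. The same resource could back an abstract-level summary or a full drill-down into data and code, which is how one API can serve several levels of complexity.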


“hand the keys over” to the reviewers

scientific communication needs to evolve along with science

“Scientists often study the past as obsessively as historians because few other professions depend so acutely on it. Every experiment is a conversation with a prior experiment, every new theory a refutation of the old”

- Siddhartha Mukherjee, The Emperor of All Maladies

Acknowledgements

Sage Bionetworks

David Burdick - Senior Software Engineer

Stephen Friend - President and CEO

Erich S. Huang - Director of Cancer Research

Michael Kellen - Director of Technology

External Partners

Myles Axton - Nature Genetics

Phil Bourne - PLoS Computational Biology

Josh Greenberg - Alfred P. Sloan Foundation

Kelly LaMarco - Science Translational Medicine

Eric Schadt - Mount Sinai School of Medicine