TRANSCRIPT
DREAM8.5: Crowdsourcing Computational Challenges to Accelerate Medical Solutions
A FasterCures Webinar
March 26, 2014
action
FasterCures is an “action tank” driven by a singular goal – to save lives by speeding up and improving the medical research system. A center of the Milken Institute, we are a nonprofit and nonpartisan organization that works with all the sectors of the medical research and development ecosystem.
efficiency
Our work is focused on bringing efficiency to the medical R&D process by identifying and eliminating the roadblocks that slow medical research down.
innovation
• Loose affinity network of 60 nonprofit disease research foundations
• Created to tackle the challenges that cut across diseases through innovative partnerships
• Connected through TRAIN Central Station, an open-source web platform: www.train.fastercures.org
who’s logged on?
Nonprofit 51%
Biotech/Pharma 11%
Academia 11%
Government 5%
Investor 3%
Other 19%
speakers
Stephen H. Friend
President,
Co-Founder,
and Director,
Sage Bionetworks
John Wilbanks
Chief Commons Officer,
Sage Bionetworks &
Senior Fellow,
FasterCures
Gillian Parrish
Associate Director,
Communications
and Outreach,
FasterCures
MODERATOR
NextGen Research
Enabling distributed teams of experts and the engaged public to solve complex biomedical problems
Stephen H Friend MD PhD President
Sage Bionetworks Seattle WA (Non-Profit)
scale of complexity
Reality: Overlapping Pathways
(Eric Schadt)
alchemist
embracing the complexity
Biological System
Data
Analysis
Beyond Iterative Approaches: Generating, Analyzing, and Supporting New Models
“DATA DRIVEN” DATA ANALYSIS
Potential benefit of uncoupling the automatic linkage between the data generators, analyzers, and validators
1. It is now possible to generate massive amounts of human “omics” data
2. Network modeling approaches for diseases are emerging
3. IT infrastructure and cloud compute capacity allow a generative, open approach to biomedical problem solving
4. A nascent movement lets patients control their sensitive information, which allows sharing
5. Open social media allows citizens and experts to use gaming to solve problems
A HUGE OPPORTUNITY -- A HUGE RESPONSIBILITY
We focus on a world where biomedical research is about to fundamentally change. We think it will often be conducted in an open, collaborative way in which teams of teams, far beyond the current guilds of experts, will contribute to making better, faster, more relevant discoveries.
TECHNOLOGY PLATFORM
two approaches to building common scientific knowledge
Text summary of the completed project, assembled after the fact
• Every code change versioned • Every issue tracked • Every project the starting point for new work • All evolving and accessible in real time • Social Coding
github
Goal is for Synapse to function as a GitHub for Biomedical Data
• Data and code versioned • Analysis history captured in real time • Work anywhere, and share the results with anyone • Social/Interactive Science
• Every code change versioned • Every issue tracked • Every project the starting point for new work • Social/Interactive Coding
“Synapse is a private or public workspace that allows you to aggregate, describe, and share your research. It is a collection of living research projects enabling you to contribute to large-scale collaborative solutions to difficult scientific problems in real time with direct credit for work done.”
Synapse contributions to analyzing Cancers
TCGA Pan-Cancer project •Analysis of: 12 Tumor types, 6 molecular profiling platforms •Performed by: 150 researchers from 30 institutions.
Omberg et al. Enabling collaborative and transparent analysis of 12 tumor types in The Cancer Genome Atlas. Nature Genetics (published Oct 2013)
Crowd-sourcing in Computational Biology
Benefits of crowd-sourcing
1. Performance Evaluation Unbiased, consistent, and rigorous method assessment
Discover the Best Methods
Determine the solvability of a scientific question
2. Sampling of the space of methods
Understand the diversity of methodologies presently being used to solve a problem
Benefits of crowd-sourcing, cont’d
3. Acceleration of Research
The community of participants can do in 4 months what would take any single group 10 years
4. Community Building
Make high quality, well-annotated data accessible.
Foster community collaborations on fundamental research questions.
Determine robust solutions through community consensus: “The Wisdom of the Crowds.”
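The “Wisdom of the Crowds” idea can be illustrated with a small numeric sketch. It averages the predictions of several teams and compares the consensus error with the individual errors. The data, the function names, and the use of mean squared error are all illustrative assumptions, not the aggregation or scoring method of any actual DREAM Challenge.

```python
def mean_squared_error(pred, truth):
    """Average squared difference between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def consensus(predictions_by_team):
    """Average the teams' predictions item by item."""
    n_teams = len(predictions_by_team)
    return [sum(column) / n_teams for column in zip(*predictions_by_team)]


# Made-up example: three hypothetical teams predicting three quantities.
truth = [1.0, 2.0, 3.0]
teams = [
    [1.5, 2.5, 2.0],
    [0.5, 1.0, 3.5],
    [1.2, 2.4, 3.9],
]

team_errors = [mean_squared_error(p, truth) for p in teams]
crowd_error = mean_squared_error(consensus(teams), truth)

print("individual errors:", team_errors)
print("consensus error:  ", crowd_error)
```

For squared error, the consensus can never do worse than the average of the individual team errors, because the errors of different teams partly cancel out. It is not guaranteed to beat the single best team, although it does in this made-up example.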
DIALOGUE FOR REVERSE ENGINEERING ASSESSMENT AND METHODS
A crowdsourcing effort that poses questions (Challenges) about systems biology modeling and data analysis: transcriptional networks, signaling networks, predictions of response to perturbations, and translational research (DREAM7)
DREAM: What is it?
STRUCTURE OF A CHALLENGE
More than 400 team submissions, 1000 cumulative conference attendees, 40 papers written using DREAM Challenges, two edited books and a Special Collection in PLoS One
In DREAM7 we focused on translational research with
DREAM-Phil Bowen Prediction Prize4Life (12 submissions)
Sage-DREAM Breast Cancer Prognosis Challenge (47 teams)
NCI-DREAM Drug Sensitivity Prediction Challenge (55 teams)
Network Topology and Parameter Inference Challenge (12 teams)
Working to create Challenge-assisted peer review
Science Translational Medicine, Nature Biotechnology
Grants Reviews?
The DREAM PROJECT
Building communities of data experts
The 2012 Sage Bionetworks/DREAM7 Breast Cancer Prognosis Challenge
The Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge
Goal: use crowdsourcing to forge a computational model that accurately predicts breast cancer survival
• Training data set: genomic and clinical data from 2000 women diagnosed with breast cancer (Metabric data set)
• Data access and analysis tools: Synapse
• Compute resources: each participant provided with a standardized virtual machine donated by Google
• Model scoring: models submitted to Synapse for scoring on a real-time leaderboard
Unique Attributes
Open source and code-sharing:
• The computational infrastructure enables participants to use code submitted by others in their own model building
• Winning code must be reproducible
Brand new dataset for final validation of winning model:
• Derived from approx. 200 breast cancer samples
• Data generation funded by Avon
• Winning model: the one that, having been trained using Metabric data, is most accurate for survival prediction when applied to a brand new dataset
Challenge-assisted peer review:
• Overall winner can submit a pre-accepted article about his/her winning model to Science Translational Medicine
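To make “most accurate for survival prediction” concrete, here is a minimal sketch of one common way to score survival models, the concordance index. The slides do not name the metric at this point, so the choice of metric is an assumption, as are the function name, the argument names, and the toy data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times: follow-up time per patient
    events: True if death was observed, False if the patient was censored
    risk_scores: model output, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs where the shorter time is censored,
        # because the true ordering of the two patients is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # a tied prediction counts as half correct
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy validation set (made up): three patients, the last one censored.
times = [5, 10, 15]
events = [True, True, False]
print(concordance_index(times, events, [0.9, 0.5, 0.1]))  # risks in the right order
print(concordance_index(times, events, [0.1, 0.5, 0.9]))  # risks reversed
```

A score of 1.0 means every comparable pair is ordered correctly, 0.5 is what a constant or random predictor would achieve, and 0.0 means every pair is ordered the wrong way round.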
Sharing Code Leads To Better Models Faster
September 3: Challenge participant Sean Cory tops the leaderboard by combining his clinical insights with the code from the “Attractors Metagenes” team.
Sage-DREAM Challenges
Feb 2013
DREAM8 Challenges: Summer 2013
HPN DREAM Challenge: • 279 participants • 178 final submissions • Nature Methods publication partner
Toxicogenetics DREAM Challenge: • 232 participants • 182 final submissions • Nature Biotechnology publication partner
Whole Cell DREAM Challenge: • 106 participants • 9 final submissions • PLoS Computational Biology publication partner
DREAM: Team Participation
[Figure: bar chart of the number of submitting teams for DREAM2 through DREAM8; vertical axis from 0 to 300]
How Sage/DREAM Nurtures Challenge Communities
• Challenge webinars for live interaction between participants and organizers
• Community forums where participants can learn from each other
• Leaderboard to motivate continuous participation
• Incentives to code-share: evolving models never before possible (machine learning + clinical insights)
• Annual DREAM Conference to celebrate and discuss Challenge outcomes
How DREAM Challenge Recognition Can Help
Andre Falcao: Professor Andre Falcao was a participant in the recently completed DREAM8 NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge. He raised valid criticisms of the scoring metrics that were being used for a portion of the Challenge. Andre has now taken a leadership role in planning the current DREAM8.5 Rheumatoid Arthritis Responder Challenge, showing how DREAMers can transition from participants to organizers.
Alex Williams: Alex is a research technician at Brandeis University and a winner of the DREAM8 Whole Cell Parameter Estimation Challenge. Professor Markus Covert from Stanford, who co-sponsored this Challenge, was so impressed with Alex’s solutions to the Challenge that he has written Alex a recommendation for graduate school in the fall of 2014.
Wei-yi Cheng: Wei-yi was a graduate research assistant when he helped team Attractor Metagenes win the DREAM7 Breast Cancer Prognosis Challenge (BCC). Since winning the BCC, Wei-yi has been recruited to join Eric Schadt at the Mount Sinai School of Medicine (MSSM) Institute for Genomics and Multiscale Biology as a research scientist.
DREAM8.5 Challenges: Now Open!
• Predict cancer-associated mutations from whole-genome sequencing data
• Opened Nov 8, 2013 • 239 registered participants
• Predict which patients will not respond to anti-TNF therapy
• Opened Feb 10, 2014 • 332 registered participants
• Predict early AD-related cognitive decline, and the mismatch between high amyloid levels and cognitive decline
• Opening soon! • 191 registered participants
Data for Somatic Mutation Calling Challenge
In silico data: 5 Synthetic Tumour/Normal Pairs
• One released each month
• Of increasing complexity
• No ICGC data-access needed
• Incentives for top-performing teams may include free cloud-computing credits
• Data available immediately
Real human data: 10 Real Tumour/Normal Pairs
• Released November 2013
• 5 Prostate Cancers
• 5 Pancreatic Cancers
• ICGC data-access needed
• Several thousand candidates will be validated using independent techniques
The Alzheimer's Disease Big Data DREAM Challenge #1
The Global CEO Initiative on Alzheimer’s Disease has joined with Sage/DREAM to initiate an AD Challenge to identify better predictors of AD risk in pre-clinical populations
A multi-phase effort to identify early accurate predictive bio-markers that can inform the scientific, industrial and regulatory communities about the disease
The Questions Posed for AD Challenge #1:
• What model best predicts change over time in cognitive scores using all available test and adjacent data?
• What model best predicts discordance between biomarkers suggestive of amyloid perturbations and lack of cognitive impairment?
DREAM9 Challenges Coming Soon: opening in May-June 2014
Broad Gene Essentiality Challenge
Data Set: 500 cell lines with molecular characterization data (from CCLE) and gene essentiality data (from Achilles RNAi screens).
Challenge structure: Participants train predictive models of gene essentiality using the training data. They then use molecular information from the test data to predict gene essentiality scores, which are compared against a held-out dataset.
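The train, predict, and score loop described above can be sketched as follows. This is only an illustration under stated assumptions: the nearest-neighbor baseline, the Spearman rank correlation used for scoring, all function names, and the tiny made-up data are hypothetical and are not taken from the Challenge itself.

```python
import math


def rank(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def nearest_neighbor_predict(train_features, train_scores, test_features):
    """Give each test cell line the score of its closest training cell line."""
    predictions = []
    for feats in test_features:
        nearest = min(range(len(train_features)),
                      key=lambda i: math.dist(train_features[i], feats))
        predictions.append(train_scores[nearest])
    return predictions


# Made-up data: molecular features per cell line and one gene's essentiality.
train_features = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
train_scores = [-1.0, 0.2, 0.8]
test_features = [[0.1, 0.9], [1.9, 2.1], [1.1, 0.1]]
held_out_scores = [-0.9, 0.7, 0.1]  # hidden from participants in a real Challenge

predicted = nearest_neighbor_predict(train_features, train_scores, test_features)
print("predicted:", predicted)
print("score vs held-out data:", spearman(predicted, held_out_scores))
```

The point of the structure is that the held-out scores are never seen during training, so the final score measures how well a model generalizes to new cell lines rather than how well it fits the training data.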
DREAM AML Treatment Outcomes Challenge
Data Set: RPPA data on 231 antibodies and correlated patient demographic and outcomes data
Potential Challenge objectives:
• Predict AML patient overall survival and remission duration
• Predict patients who respond to therapy (CR), those who will then relapse, and those with primary resistance to therapy
DREAM9.5 Challenges opening at the end of 2014
Three potential Challenges
Histopathology
RA#2 Challenge
Disease Activity
Brain Imaging
DREAM Challenge Zoran Popovic DARPA MICCAI Physics Geeks
EMPOWER THE PUBLIC NURTURE AS FULL PARTNERS
BRIDGE PORTABLE LEGAL CONSENT
ENABLE TEAMS OF TEAMS TO EVOLVE IDEAS IN REAL TIME
SYNAPSE with PROVENANCE CONSORTIA OPEN DREAM CHALLENGES
NextGen Biomedical Research
2013
Rui Chang et al. PLoS Computational Biology
Normal State
Disease State
To avoid mismatch between State of the Institutions and States of the Technology
To avoid siloed problem solving by those gaming the system for tenure
To bring in citizens with their insights, data and funding
We need to fundamentally change the current guilds of experts approaches
Navigating between states of wellness
Sage Bionetworks • Adam Margolin • Chris Bare • Kristen Dang • Bruce Hoff • Jay Hodgson • Justin Guinney • Lara Mangravite • Solly Sieberts • Abhi Pratap • Christine Suver • Mette Peters • Arno Klein • Mike Kellen • Thea Norman • Stephen Friend
DREAM • Erhan Bilal, IBM • Federica Eduati, EBI • Gustavo Stolovitzky, IBM • Jim Costello, HMS • Julio Saez Rodriguez, EBI • Kely Norel, IBM • Laura Heiser, OHSU • Michael Menden, EBI • Pablo Meyer Rojas, IBM • Thomas Cokelaer, EBI • Kahn Rhrissorrakrai, IBM • Daniel Marbach, MIT
The Sage-Bionetworks and DREAM Team
Q&A
view an archive of this webinar www.train.fastercures.org
keep up with our blog
fastercures.tumblr.com
subscribe The latest developments in medical research delivered to you every Tuesday and Thursday to keep you current on relevant news.
connect with
@fastercures www.fastercures.org