Integrating Cancer Networks and the Value of Compute Spaces Stephen H Friend November 8, 2012 EORTC/NCI Dublin


DESCRIPTION

Stephen Friend Nov 8, 2012. 24th EORTC-NCI-AACR Symposium on Molecular Targets and Cancer Therapeutics, Dublin, Ireland

TRANSCRIPT

Page 1: Friend EORTC 2012-11-08

Integrating Cancer Networks and the Value of Compute Spaces

Stephen H Friend November 8, 2012

EORTC/NCI Dublin

Page 2: Friend EORTC 2012-11-08

[Pathway diagram: EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2; EGFRi; Proliferation, Survival]

• EGFR pathway commonly mutated/activated in cancer

• 30% of all epithelial cancers

• Blocking antibodies approved for treatment of metastatic colon cancer

• Subsequently found that RAS-mutant tumors don’t respond – “Negative Predictive Biomarker”

• However, there are still EGFR+ / RAS wild-type patients who don’t respond – need “Positive Predictive Biomarker”

• And in lung cancer it is not clear that RAS mutation status is a useful biomarker

Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function

Oncogenes only make good targets in particular molecular contexts: EGFR story

Page 3: Friend EORTC 2012-11-08

Reality: Overlapping Pathways

Page 4: Friend EORTC 2012-11-08

Preliminary Probabilistic Models - Rosetta

Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are causal for disease

Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Page 5: Friend EORTC 2012-11-08

Metabolic Disease

"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)

"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)

"Genetics of gene expression and its effect on disease." Nature. (2008)

"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)

….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc

CVD

"Identification of pathways for atherosclerosis." Circ Res. (2007)

"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)

…… Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome

Bone

"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)

"..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)

Methods

"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)

"Increasing the power to detect causal associations…" PLoS Comput Biol. (2007)

"Integrating large-scale functional genomic data ..." Nat Genet. (2008)

…… Plus 3 additional papers in PLoS Genet., BMC Genet.

Extensive Publications now Substantiating Scientific Approach

Probabilistic Causal Bionetwork Models

>80 Publications from Rosetta Genetics/ Sage Bionetworks

Page 6: Friend EORTC 2012-11-08
Page 7: Friend EORTC 2012-11-08
Page 8: Friend EORTC 2012-11-08

[Diagram: Biological System, Data, Analysis]

Iterative Networked Approaches To Generating, Analyzing and Supporting New Models

Uncouple the automatic linkage between the data generators, analyzers, and validators

Page 9: Friend EORTC 2012-11-08

An Alternative

Commons are resources that are owned in common or shared among communities.

- David Bollier

Biomedicine Information Commons

Page 10: Friend EORTC 2012-11-08

Sage Bionetworks

A non-profit organization with a vision to enable networked team approaches to building better models of disease

BIOMEDICINE INFORMATION COMMONS INCUBATOR

Better Models of Disease: INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Page 11: Friend EORTC 2012-11-08

Sage Bionetworks Collaborators

Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson

Foundations: Kauffman, CHDI, Gates Foundation

Government: NIH, LSDF, NCI

Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)

Federation: Ideker, Califano, Nolan, Schadt

Page 12: Friend EORTC 2012-11-08

IT/Data Generators

Pharma

Academic Consortia

Joint Patient/Scientist Communities

Biotech

Patient Foundations

Individual Patients

Better Models of Disease: INFORMATION COMMONS

Technology Platform

Challenges

Impactful Models

Governance

Constituencies

Page 13: Friend EORTC 2012-11-08

Background: Information Commons for Biological Functions

Page 14: Friend EORTC 2012-11-08

SYNAPSE

RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYZES/MODELS

BioMedicine Information Commons

Data Generators, Data Analysts, Experimentalists, Clinicians, Patients/Citizens

Networked Approaches

Page 15: Friend EORTC 2012-11-08

FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR

• Provide a “compute space” for hosting and sharing models (to complement data storage and tools provided by Sanger, Broad…): SYNAPSE

• Co-generate models of drivers for Cell Line/Clinical Sensitivity

• Host Challenges and other approaches that encourage as many people as possible to provide and share their insights as quickly as possible – https://synapse.sagebase.org/ - BCCOverview:0

• Engage citizens as partners in gathering information and insights and funds

Page 16: Friend EORTC 2012-11-08

Two approaches to building common scientific and technical knowledge

Text summary of the completed project

Assembled after the fact

Every code change versioned

Every issue tracked

Every project the starting point for new work

All evolving and accessible in real time

Social Coding

Page 17: Friend EORTC 2012-11-08

“Synapse is a compute platform for transparent, reproducible, and modular collaborative research.”

Page 18: Friend EORTC 2012-11-08

Synapse is GitHub for Biomedical Data

• Data and code versioned

• Analysis history captured in real time

• Work anywhere, and share the results with anyone

• Social/Interactive Science

• Every code change versioned

• Every issue tracked

• Every project the starting point for new work

• Social/Interactive Coding

Page 19: Friend EORTC 2012-11-08

Currently at 16K+ datasets and ~1M models

Page 20: Friend EORTC 2012-11-08

Demo Interaction

Download Data from Web / Programmatic Access to Data

Page 21: Friend EORTC 2012-11-08

Demo Interaction

Download Data from Web / Programmatic Access to Data

Page 22: Friend EORTC 2012-11-08

Data Repository: with versions

Points to specific version of repository

Page 23: Friend EORTC 2012-11-08

Pancancer collaborative subtype discovery

Page 24: Friend EORTC 2012-11-08

Download analysis and meta-analysis

Download another Cluster Result

Download Evaluation and view more stats

• Perform Model averaging

• Compare/contrast models

• Find consensus clusters
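One way to picture "find consensus clusters" across results shared by different groups is a co-association vote. The sketch below is only an illustration under stated assumptions: the function names, the 0.5 threshold, and the toy sample labels are hypothetical, and the slide does not say which consensus method was used.

```python
from itertools import combinations

def co_association(assignments):
    """Fraction of clustering results that put each pair of samples together.

    `assignments` is a list of dicts mapping sample id -> cluster label
    (one dict per submitted clustering result; hypothetical format).
    """
    samples = sorted(assignments[0])
    return {
        (a, b): sum(result[a] == result[b] for result in assignments) / len(assignments)
        for a, b in combinations(samples, 2)
    }

def consensus_clusters(assignments, threshold=0.5):
    """Group samples that co-cluster in more than `threshold` of the results."""
    samples = sorted(assignments[0])
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), fraction in co_association(assignments).items():
        if fraction > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

# Three hypothetical clustering results over the same four samples.
results = [
    {"s1": "A", "s2": "A", "s3": "B", "s4": "B"},
    {"s1": "x", "s2": "x", "s3": "y", "s4": "y"},
    {"s1": 1, "s2": 1, "s3": 1, "s4": 2},
]
consensus = consensus_clusters(results)
```

Cluster labels need not match between results, since only the question "are these two samples together?" is compared.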

Page 25: Friend EORTC 2012-11-08

[Figure: predictive model generation from expression, copy number, mutation and phenotype data, followed by performance assessment; prediction accuracy (R2) for 130 drugs]

Synapse infrastructure for sharing, searching, and analyzing TCGA data

• Comparison of many modeling approaches applied to the same data.

• Models transparently shared and reusable through Synapse.

• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.

• Extending pipeline to evaluate prediction of TCGA phenotypes.

• Hosting of collaborative competitions to compare models from many groups.

Page 26: Friend EORTC 2012-11-08

[Figure: predictive model generation from expression, copy number, mutation and phenotype data, followed by performance assessment]

Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison

1) Automated, standardized workflows for curation, QC and hosting of large-scale datasets (Brig Mecham).

2) Programmatic APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):

Load cell line feature and response data:

> ccleFeatureData <- getEntity(ccleFeatureDataId)
> ccleResponseData <- getEntity(ccleResponseDataId)

Load TCGA feature and phenotype data (in same format as cell line data):

> tcgaFeatureData <- getEntity(tcgaFeatureDataId)
> tcgaResponseData <- getEntity(tcgaResponseDataId)

3) Pluggable API to implement predictive modeling algorithms. User implements customTrain() and customPredict() functions. Support for all commonly used machine learning methods (for automated benchmarking against new methods).

custom model 1, custom model 2, … custom model N

4) Statistical performance assessment across models.

5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)
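The pluggable train/predict idea can be illustrated with a small harness that scores any model exposing two methods. This is a sketch in Python rather than the R client named on the slide; the class names, the `custom_train`/`custom_predict` method names, the fold scheme, and the toy data are all assumptions made for illustration, not the Synapse API.

```python
from statistics import mean

class MeanModel:
    """Baseline: always predicts the mean of the training response."""
    def custom_train(self, features, response):
        self.mu = mean(response)

    def custom_predict(self, features):
        return [self.mu for _ in features]

class OneFeatureLinearModel:
    """Least-squares line fitted on a single feature column."""
    def __init__(self, column=0):
        self.column = column

    def custom_train(self, features, response):
        x = [row[self.column] for row in features]
        mx, my = mean(x), mean(response)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, response))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx

    def custom_predict(self, features):
        return [self.intercept + self.slope * row[self.column] for row in features]

def r_squared(observed, predicted):
    """Coefficient of determination; can be negative for poor predictors."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(model, features, response, folds=3):
    """Train on all folds but one, predict the held-out fold, score all predictions."""
    n = len(response)
    predictions = [None] * n
    for k in range(folds):
        test_idx = [i for i in range(n) if i % folds == k]
        train_idx = [i for i in range(n) if i % folds != k]
        model.custom_train([features[i] for i in train_idx],
                           [response[i] for i in train_idx])
        held_out = model.custom_predict([features[i] for i in test_idx])
        for i, p in zip(test_idx, held_out):
            predictions[i] = p
    return r_squared(response, predictions)

# Toy data: the response is an exact linear function of the single feature.
features = [[float(i)] for i in range(12)]
response = [2.0 * i + 1.0 for i in range(12)]
scores = {
    name: cross_validated_r2(model, features, response)
    for name, model in [("mean", MeanModel()), ("linear", OneFeatureLinearModel())]
}
```

Because the harness only relies on the two methods, any new algorithm can be benchmarked against the existing ones on identical folds.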

Page 27: Friend EORTC 2012-11-08

Objective assessment of factors influencing model performance (>1 million predictions evaluated)

Prediction accuracy improved by…

• Not discretizing data

• Including expression data

• Elastic net regression

[Figure: cross validation prediction accuracy (R2); panels: Sanger (130 compounds), CCLE (24 compounds)]

In Sock Jang

Page 28: Friend EORTC 2012-11-08

Assessment of pathway enrichment of inferred predictive feature sets

[Figure: Pathways (KEGG, REACTOME, BIOCARTA) by Compounds; panels: Sanger, CCLE]
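A common way to test whether a predictive feature set is enriched for a pathway is an upper-tail hypergeometric test. The slide does not name the test that was used, so the sketch below is an assumption, and the function name and toy gene identifiers are hypothetical.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, selected_size, overlap):
    """Probability of at least `overlap` pathway genes among the selected genes,
    if the selected genes were drawn at random from the universe."""
    total = comb(universe_size, selected_size)
    upper = min(pathway_size, selected_size)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes measured, 5 in the pathway, 5 selected by a model,
# and every selected gene falls in the pathway.
universe = {f"g{i}" for i in range(10)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
selected = {"g0", "g1", "g2", "g3", "g4"}
p_value = enrichment_p_value(
    len(universe), len(pathway), len(selected), len(pathway & selected)
)
```

In practice the same function would be called once per pathway and per compound, with a multiple-testing correction applied afterwards.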

Page 29: Friend EORTC 2012-11-08

Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone

Page 30: Friend EORTC 2012-11-08

Why Stratifying Patients for Therapy Matters

Metastatic Colorectal Cancer (mCRC)

Responders, by RAS status and treatment:

RAS status | Share of mCRC patients | Chemotherapy | Chemotherapy + Cetuximab
RASwt | 60% | 43% | 59%
RASmut | 40% | 40% | 36%

But not all CRC patients that are RASwt respond to Cetuximab.

In other cancers for which it is efficacious, RAS status appears not to predict response (e.g. lung).

[Pathway diagram: EGFR, KRAS, BRAF, MEK1/2; Proliferation, Survival]

Page 31: Friend EORTC 2012-11-08

RAS Model using primary tumor data to predict KRAS mutation status


290 CRC samples:

• KRAS12 or KRAS13 (n=115) vs WT (n=175)

• Penalized regression model using ElasticNet and gene expression data

Robust External Validation In CRC data sets

RAS signatures derived from CRC cohort can classify mutation status in CRC

[Figure: ROC curves (false positive rate vs. true positive rate) for TCGA CRC, Khambata-Ford, Gaedcke]

Model specific to CRC: does not generalize to other KRAS-dependent cancers
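The external validation on this slide is summarized as ROC curves. As a minimal sketch of how such a curve and its area can be computed from model scores and known mutation status, assuming hypothetical function names and made-up toy scores:

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, lowering the
    decision threshold one distinct score at a time. Labels: 1 = mutant, 0 = WT."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Toy example: six samples, three mutant and three wild-type.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = area_under_curve(roc_points(toy_scores, toy_labels))
```

An area of 1.0 means every mutant sample scores above every wild-type sample; 0.5 corresponds to random ordering.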

Page 32: Friend EORTC 2012-11-08

Exploring the RASness Model in TCGA Colorectal Carcinoma

[Figure: RIS (0.0 to 1.0) by mutation; legend: kras, braf, nras, wt. KRAS: p.G12D, p.G12V, p.G13D, p.A146T, p.G12C, p.G12S, p.G12A, p.K117N, p.Q61L, p.A146V, p.E98X, p.G12R, p.G13C, p.Q22K, p.R68S. BRAF: p.V600E, p.E228V, p.F247L, p.K205Q. NRAS: p.Q61K, p.G12C, p.G12D, p.G13R, p.Q61L, p.E132K, p.G12A, p.Q61H, p.Q61R, p.R164C. WT.]

Putative novel activating KRAS mutations

Page 33: Friend EORTC 2012-11-08

Can we predict response to RAS Pathway Drugs in CRC Cell lines?

RASness Model Translates to predict response to RAS pathway drugs in CRC cell lines

Correlate RASness Score with IC50 for drugs across 21 CRC cell lines from CCLE [1] panel

[Figure: axis "P value"; pathway diagram: EGFR, ERBB2, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2; drugs PD-0325901, AZD6244; Proliferation, Survival]

Note: KRAS and/or BRAF mutation status NOT predictive of response to MEK inhibitor

1. Barretina et al. 2012 Nature 483:603. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
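Correlating a per-cell-line score with IC50 can be sketched with a rank correlation. The slide does not say which correlation statistic was used, so Spearman's rho is an assumption here, and the function names and toy values are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: a higher score goes with a lower IC50 (greater sensitivity).
toy_rasness = [0.1, 0.3, 0.5, 0.7, 0.9]
toy_ic50 = [8.0, 4.0, 2.0, 1.0, 0.5]
rho = spearman(toy_rasness, toy_ic50)
```

A rank-based statistic is a natural choice for IC50 values, which often span orders of magnitude, but a Pearson correlation on log-transformed IC50 would be an equally plausible reading of the slide.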

Page 34: Friend EORTC 2012-11-08


RASness Model Predicts response to Cetuximab in patient and xenograft data

[Figure: RIS in Non-response vs. Response groups, points labeled by mutation status (kras, braf, kras+braf, wt). Panels: tumor, n=19, p=0.023 (p>.5); xeno early, n=26, p=0.017 (p=0.03); xeno late, n=28, p=0.0034 (p>.5); tumor + xeno, n=73, p=1.4e-05 (p=0.13)]

RAS model predicts response to Cetuximab better than mutation status

54 xenograft models

115 expression arrays, xenograft and primary tumor

KRAS, BRAF, PIK3CA, APC profiled

Response to cetuximab, 5-FU, L-OHP, CPT-11 measured

Page 35: Friend EORTC 2012-11-08

Predictive models of cancer phenotypes

Panel of tumor samples

Molecular characterization: mRNA, copy number, somatic mutations, epigenetics, proteomics

Cancer phenotypes: Drug sensitivity screens, Clinical prognosis

Predictive model

Page 36: Friend EORTC 2012-11-08

Developing predictive models of genotype-specific sensitivity to compound treatment

Genetic Feature Matrix (G): Expression, copy number, somatic mutations, etc.

Cancer samples with varying degrees of response to therapy (C): Sensitive to Refractory (e.g. EC50)

Predictive Features (biomarkers)

Maximize:

$$\log \Pr(\beta \mid C, G) \sim -\lVert C - G\beta \rVert_2^2 - \lambda_1 \lVert \beta \rVert_1 - \lambda_2 \lVert \beta \rVert_2^2$$
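The "Maximize" expression on this slide reads as an elastic net objective, which matches the elastic net regression named elsewhere in the deck. As a hedged sketch, the code below minimizes the equivalent penalized least-squares loss by coordinate descent. The symbols `lam1` and `lam2`, the function names, the absence of an intercept, and the toy data are assumptions for illustration; this is not the implementation used in the talk.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net_loss(G, C, beta, lam1, lam2):
    """||C - G beta||^2 + lam1 * ||beta||_1 + lam2 * ||beta||_2^2."""
    residual_ss = sum(
        (c - sum(g * b for g, b in zip(row, beta))) ** 2 for row, c in zip(G, C)
    )
    return (residual_ss
            + lam1 * sum(abs(b) for b in beta)
            + lam2 * sum(b * b for b in beta))

def fit_elastic_net(G, C, lam1, lam2, sweeps=200):
    """Cyclic coordinate descent: update one coefficient at a time, others fixed."""
    n_features = len(G[0])
    beta = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0  # feature j dotted with the residual that excludes feature j
            z = 0.0    # squared norm of feature j
            for row, c in zip(G, C):
                partial = c - sum(row[k] * beta[k] for k in range(n_features) if k != j)
                rho += row[j] * partial
                z += row[j] ** 2
            denom = z + lam2
            beta[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
    return beta

# Toy data: the response depends only on the first of two features.
G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
C = [2.0, 0.0, 2.0, 4.0]
unpenalized = fit_elastic_net(G, C, lam1=0.0, lam2=0.0)
shrunk = fit_elastic_net(G, C, lam1=4.0, lam2=1.0)
all_zero = fit_elastic_net(G, C, lam1=100.0, lam2=0.0)
```

The L1 term drives uninformative coefficients exactly to zero, which is what makes the surviving features readable as candidate biomarkers, while the L2 term stabilizes the fit when features are correlated.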

Page 37: Friend EORTC 2012-11-08

AHR expression predicts sensitivity to MEK inhibitors in NRAS mutant cell lines

Functionally validated by AHR knockdown

[Figure legend: AHR shRNA, Control shRNA]

Novel predictions are functionally validated

[Figure panels: Prediction, Validation; BCL-xL expression vs. Doxorubicin, Triptolide, Emetine, ActD, Flavopiridol, Anisomycin, Puromycin]

BCL-xL expression predicts sensitivity to several chemotherapeutics

Functionally validated by:

• BCL-xL knockdown

• BCL-xL inhibitor drug synergy

Mouse models Clinical trials

Wei G.*, Margolin A.A.*, et al, Cancer Cell

Page 38: Friend EORTC 2012-11-08

REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge

Page 39: Friend EORTC 2012-11-08

What is the problem?

Our current models of disease biology are primitive and limit doctors’ understanding and ability to treat patients

Current incentives reward those who silo information and work in closed systems

Page 40: Friend EORTC 2012-11-08

The Solution: Competitions to crowd-source research in biology and other fields

Why competitions?

• Objective assessments

• Acceleration of progress

• Transparency

• Reproducibility

• Extensible, reusable models

Competitions in biomedical research

• CASP (protein structure)

• Foldit / EteRNA (protein / RNA structure)

• CAGI (genome annotation)

• Assemblathon / Alignathon (genome assembly / alignment)

• SBV Improver (industrial methodology benchmarking)

• DREAM (co-organizer of Sage/DREAM competition)

Generic competition platforms

• Kaggle, InnoCentive, MLComp

Page 41: Friend EORTC 2012-11-08

METABRIC

•Array-CGH

•Expression arrays

•Sequencing TP53 PIK3CA

•Amplified DNA and cDNA banks

•miRNA profiling

Anglo-Canadian collaboration

Gene sequencing (ICGC)

Page 42: Friend EORTC 2012-11-08

Sage/DREAM Challenge: Details and Timing

Phase 1: July thru end-Sep 2012

Training data: 2,000 breast cancer samples from METABRIC cohort

• Gene expression

• Copy number

• Clinical covariates

• 10 year survival

Supporting data: Other Sage-curated breast cancer datasets

• >1,000 samples from GEO

• ~800 samples from TCGA

• ~500 additional samples from Norway group

• Curated and available on Synapse, Sage’s compute platform

Data released in phases on Synapse from now through end-September

Will evaluate accuracy of models built on METABRIC data to predict survival in:

• Held out samples from METABRIC

• Other datasets

Phase 2: Oct 15 thru Nov 12, 2012

Evaluation of models in novel dataset.

Validation data: ~500 fresh frozen tumors from Norway group with:

• Clinical covariates

• 10 year survival

Page 43: Friend EORTC 2012-11-08

[Figure: predictive model generation from expression, copy number, mutation and phenotype data, followed by performance assessment]

Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison

Custom models implement train() and predict() API.

Implementation of simple clinical-only survival model used as baseline predictor.
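Ranking survival models against a clinical-only baseline needs a metric that copes with censored follow-up. A concordance index is one standard choice; treating it as the leaderboard metric is an assumption here, since the slide does not name one. The function name, the age-based baseline, and all numbers below are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ordered correctly by predicted risk.

    A pair is comparable when the patient with the shorter follow-up time had
    an observed event (events[i] == 1); censored patients only serve as the
    longer-surviving member of a pair. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort: follow-up in years; event 1 = death observed, 0 = censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]

# Hypothetical submissions: predicted risk per patient (higher = worse prognosis).
submissions = {
    "clinical_only_baseline": [70, 55, 60, 40],  # e.g. age used directly as risk
    "custom_model": [0.9, 0.7, 0.5, 0.1],
}
leaderboard = sorted(
    ((concordance_index(times, events, risk), name) for name, risk in submissions.items()),
    reverse=True,
)
```

Only the ordering of the risk scores matters, so models that output different scales can share one leaderboard.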

Page 44: Friend EORTC 2012-11-08

Trey Ideker, Janusz Dutkowski

Eric Schadt, Gaurav Pandey

Gustavo Stolovitzky, Erhan Bilal

Andrea Califano, Yishai Shimoni, Mukesh Bansal, Mariano Alvarez

Garry Nolan

In Sock Jang, Ben Sauerwine

Stephen Friend, Justin Guinney, Marc Vidal, Adam Margolin, Ben Logsdon

Federation modeling competition

Models submitted and evaluated in real-time leaderboard

>200 models tested within 3 months

Page 45: Friend EORTC 2012-11-08

Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together

154 participants; 27 countries

268 participants; 32 countries

290 models posted to Leaderboard

breast cancer data

Challenge Launch: July 17

August 17 Status

Page 46: Friend EORTC 2012-11-08

Summary of Breast Cancer Challenge #1 https://synapse.sagebase.org/ - BCCOverview:0

Transparency, reproducibility

Validation in novel dataset

Publication in Science Translational Medicine

Donation of Google-scale compute space.

For the goal of promoting democratization of medicine… Registration starting NOW…

sign up at: synapse.sagebase.org

[Figure: predictive model generation from expression, copy number, mutation and phenotype data, followed by performance assessment]

Page 47: Friend EORTC 2012-11-08

FOUR PILOTS IN THE SAGE BIONETWORKS COMMONS INCUBATOR

• Provide a “compute space” for hosting and sharing models (to complement data storage and tools provided by Sanger, Broad…): SYNAPSE

• Co-generate models of drivers for Cell Line/Clinical Sensitivity

• Host Challenges and other approaches that encourage as many people as possible to provide and share their insights as quickly as possible – https://synapse.sagebase.org/ - BCCOverview:0

• Engage citizens as partners in gathering information and insights and funds

Page 48: Friend EORTC 2012-11-08

SYNAPSE

RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYZES/MODELS

BioMedicine Information Commons

Data Generators, Data Analysts, Experimentalists, Clinicians, Patients/Citizens

Networked Approaches

Page 49: Friend EORTC 2012-11-08

Upon this gifted age, in its dark hour,

Rains from the sky a meteoric shower

Of Facts… they lie unquestioned, uncombined.

Wisdom enough to leech us of our ill

Is daily spun; but there exists no loom

To weave it into fabric.

- Edna St. Vincent Millay