
Identification and Verification of Translational Biomarkers for Patient Stratification and Precision Medicine

May 18, 2016

Bin Li, PhD, Associate Director

Translational Medicine

Takeda Pharmaceuticals International Co.

Biomarkers for Patients Stratification and Precision Medicine

-- Cancer and Beyond

AAPS National Biotechnology Conference, Boston on May 16-18, 2016

Disclaimer

Any views or opinions presented here are solely those of the author and do not necessarily represent those of the company.

Outline

• Design and implement a predictive modeling framework

• Method evaluation

– Using SOC drugs to evaluate the predictive modeling approach

• Build Erlotinib and Sorafenib sensitivity models using cell line data

• Test the models using real patient outcomes (PFS from the BATTLE trials)

• Predict potential drug target cancer indications

– Method comparison on the NCI-DREAM challenge

• A fair comparison of predictive modeling methods, used by 47 teams on 22 drugs

What we want to achieve on the predictive models

• Goal:

To discover patient selection biomarkers using preclinical data, e.g. cell line viability screens

We want to test hypotheses in Ph II clinical trials and use the biomarkers in Ph III clinical trials to select the right patients for treatment

• What we want to gain from building the predictive models

– Build a predictive model on cell line data that can be used for patients

– Model-derived signature genes reflect a drug’s MOA

– Each drug’s predictive model should be drug specific

– Use the predictive model for cancer indication selection

Design and development of a PLSR-based predictive modeling framework

• Learned from FDA-organized MAQC-II observations (Nat. Biotech. 2010, 28, 831)

– 36 academic and industry teams participated

– Most state-of-the-art computational approaches were applied

– We decided to build our modeling framework on the partial least squares regression (PLSR) method

• Technical considerations in choosing the PLSR method:

– PLSR is similar to PCA, but can be more effective because it targets the best X-Y relationship (PCA captures key variance in X only)

– PLSR is well known as a dimension reduction tool (ideal for genomic data)

– PLSR was designed to address the multicollinearity issue

• Learned from CCLE and Sanger observations (Nature, 2012, 483, 603; Nature, 2012, 483, 570)

– Both teams adopted the Elastic Net (EN) method for biomarker discovery; the EN approach is very similar to PLSR (EN seeks max coefficient vs. PLSR seeks max covariance)
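The PCA versus PLSR contrast can be illustrated with a minimal sketch. It computes only the first PLS1 weight vector, which is proportional to X^T y on centered data and therefore points along the direction of maximum covariance with the response. The data and the function names (`center`, `sample_variance`, `first_pls_weights`) are hypothetical and are not taken from the framework described in these slides.

```python
import math

def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    centered = center(values)
    return sum(v * v for v in centered) / (len(values) - 1)

def first_pls_weights(features, response):
    """First PLS1 weight vector: X^T y on centered data, scaled to unit length."""
    y = center(response)
    raw = [sum(a * b for a, b in zip(center(col), y)) for col in features]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]

# Hypothetical toy data for five cell lines (not from the Ricerca screen).
gene_a = [10.0, -10.0, 0.0, -10.0, 10.0]  # large variance, unrelated to the response
gene_b = [0.1, 0.2, 0.3, 0.4, 0.5]        # small variance, tracks the response
log2_ic50 = [1.0, 2.0, 3.0, 4.0, 5.0]

variances = [sample_variance(gene_a), sample_variance(gene_b)]
weights = first_pls_weights([gene_a, gene_b], log2_ic50)

print("per-gene variance:", variances)
print("first PLS weights:", weights)
```

In this toy case the two genes are uncorrelated with each other, so a variance-only method such as PCA would take gene_a as its first axis, while the PLS weight vector puts essentially all of its weight on gene_b.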

A specially designed modeling framework to capture consensus information from the dataset

Training a model on cell line data: Input data (GEP & IC50) => Data reduction => Feature selection => Model training using a specially designed splitting strategy => Obtain a core model => Get a pathway based core model

Testing the model on patients’ data: Test model

Panel cell lines: 42% random.training, 28% random.validation, 30% balance.validation (10-fold cross validation)

[Figure, panels (A)-(D): workflow, panel split, and two scatter plots titled "Sub-testing AUC vs. cor" and "Sep-testing AUC vs. cor"; x-axis: AUC; y-axis: Correlation; legend: Random.validation, Balance.validation]

Bin Li, Hyunjin Shin, et al. PLoS ONE 2015, 10(6): e0130700.
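The 42% / 28% / 30% proportions of the panel split can be sketched as follows. This is only an illustration of the proportions: the slide does not say how the balance.validation set is chosen or how the 10-fold cross validation is applied, so here every subset is a plain random draw, and the function name `split_panel` and the cell line labels are hypothetical.

```python
import random

def split_panel(cell_lines, seed=0):
    """Split cell lines into 42% training, 28% validation, 30% balance validation.

    Assumption: all three subsets are drawn at random; the real criterion
    for the balance.validation set is not given in the slides.
    """
    rng = random.Random(seed)
    shuffled = list(cell_lines)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_balance = round(0.30 * n)
    n_train = round(0.42 * n)
    balance = shuffled[:n_balance]
    train = shuffled[n_balance:n_balance + n_train]
    validation = shuffled[n_balance + n_train:]
    return train, validation, balance

# Hypothetical labels for a 240-cell-line panel.
panel = [f"cell_line_{i}" for i in range(240)]
train, validation, balance = split_panel(panel)
print(len(train), len(validation), len(balance))
```

Calling `split_panel` with different seeds gives repeated random training/validation splits, which is one possible way to look for signature genes that are selected consistently across splits.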

Pathway-based signature genes’ network reconstruction










Ricerca Oncopanel cell-based assay

Ricerca screen:

- 240 cell lines

- Varying histological origin

- 10-point dose titration => IC50s

- Baseline microarray data

[Figure: panel cell lines by tissue of origin; legend: Bladder, Breast, CNS, Colon, Colon_GI, Endocrine, Female_GU_Cervix, Female_GU_other, Haematopoietic_and_lymphoid, Head_and_Neck, Kidney, Liver, Lung, Pancreas, Prostate, Skin, Soft_Tissue, not_classified]

Trained the Erlotinib PLSR model on cell line data: built a good model with 77-91% accuracy

[Figure: re-predicting Ricerca log2(IC50)]

Accuracy estimation:

Upper boundary: 91%

Lower boundary: 77%

Overall workflow for model building and testing: Erlotinib case study

Training a model on cell line data: Input data (GEP & IC50) => Data reduction => Feature selection => Model training using a specially designed splitting strategy => Obtain a core model => Get a pathway based core model

Testing the model on patients’ data: Test model

Feature counts along the workflow: 54675 probesets => 3787 probesets => 485 genes (full model) => 187 core genes (core model) => 51 genes (final pathway model)

Signature genes reflect known MOA of Erlotinib and Sorafenib

[Figure: signature gene networks; (A) Erlotinib, (B) Sorafenib]

Test the Erlotinib model using the BATTLE clinical trial: background

• Trained a PLSR model using Erlotinib Ricerca screen data (IC50s)

– On Affymetrix U133plus; 240 cell lines across multiple cancer types

• Used the model to predict patients’ survival (PFS) in the BATTLE clinical trial

– On Affymetrix Hu Gene 1.0 ST; 25 Erlotinib-treated patients with NSCLC

Cell line derived drug sensitivity models were able to predict patients’ responses and are drug specific

[Figure: four Kaplan-Meier panels; x-axis: months from start of therapy; y-axis: proportion of cases]

P = 0.09

HR = 0.43

P = 0.006

HR = 0.32

P = 0.54

HR = 1.32

P = 0.32

HR = 1.87

E_model pred E_PFS S_model pred S_PFS

E_model pred S_PFS S_model pred E_PFS

(A)

(D)

(B)

(C)

E: Erlotinib; S: Sorafenib; red: marker pos; green: marker neg
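The curves in these panels compare progression-free survival between marker-positive and marker-negative patients. As a minimal sketch of how such a curve is computed, the block below implements the Kaplan-Meier product-limit estimator. The follow-up times and event flags are hypothetical toy values, not BATTLE data, and the function name `kaplan_meier` is an illustration only. The hazard ratios and P values on the slide would need a separate test that is not shown here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    times:  follow-up in months
    events: 1 if progression was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical toy cohorts (months, event flag); not taken from the trial.
marker_pos = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
marker_neg = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])

print("marker positive:", marker_pos)
print("marker negative:", marker_neg)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at observed progression times.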

Using the predictive models for drug target cancer indication selection: build a translational medicine data warehouse

• To provide a large, well organized, and integrated dataset

– Compound screen data, public data, and clinical outcomes

• Standardize vocabulary and normalize experimental results

– Use standard terminology for the same concepts, e.g. gender, sex

– Global normalization on gene expression data

• Current data in Vahalla

– Takeda proprietary data: VELCADE clinical trials; cancer cell line screen data on multiple Takeda Oncology compounds

– Public data: > 2000 GEO studies, with ~ 350 GEO studies manually curated; TCGA data; CCLE and Sanger cancer cell line panel data

• An R interface for computational scientists to build/test predictive models
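The two standardization steps in the list above can be sketched in a few lines. Everything here is an assumption made for illustration: the synonym table `CANONICAL_FIELDS` is hypothetical, and per-sample median centering is used only as a simple stand-in because the slides do not name the global normalization method.

```python
from statistics import median

# Hypothetical synonym table; the real controlled vocabulary is not given.
CANONICAL_FIELDS = {
    "gender": "sex",
    "sex": "sex",
    "site": "tissue",
    "tissue": "tissue",
}

def standardize_record(record):
    """Rename metadata keys to canonical names; unknown keys are lowercased."""
    cleaned = {}
    for key, value in record.items():
        name = key.strip().lower()
        cleaned[CANONICAL_FIELDS.get(name, name)] = value
    return cleaned

def median_center(sample):
    """Shift one sample's expression values so that their median is zero."""
    m = median(sample.values())
    return {gene: value - m for gene, value in sample.items()}

record = standardize_record({"Gender": "F", "Site": "Lung"})
centered = median_center({"g1": 5.0, "g2": 7.0, "g3": 9.0})
print(record)
print(centered)
```

Mapping synonyms such as gender and sex onto one key is what lets samples from different studies be queried together.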

Predicting the percentage of Erlotinib- or Sorafenib-sensitive samples to select drug target cancer indications

Cancer type                # of samples   Pred. Erlotinib sen. (%)   Pred. Sorafenib sen. (%)
FDA approved Erlotinib or Sorafenib indications:
Lung Cancer                         329                      15.81                       0.61
Liver Cancer                         85                       0.00                      31.76
Kidney Cancer                       218                       0.46                      24.77
Some interesting indications predicted to be sensitive to Erlotinib or Sorafenib:
Head and Neck Cancer                168                      94.05                      12.50
Bladder Cancer                      102                      41.18                       3.92
Haematopoietic Neoplasms           3590                       2.17                      32.56
Bone Cancer                          88                       0.00                      29.55
Breast Cancer                      1668                       5.64                       6.47
Colorectal Cancer                   948                       0.11                       0.21
Pancreatic Cancer                    75                       0.00                       1.33

Erlotinib was approved in Lung cancer

Sorafenib was approved in Liver and Kidney cancers

Erlotinib failed in one Kidney cancer PIII clinical trial

Predicted additional Erlotinib sensitive cancer indications

Predicted additional Sorafenib sensitive cancer indications

Bin Li, Hyunjin Shin, et al. PLoS ONE 2015, 10(6): e0130700.
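The percentages in the table can be read as the share of samples per cancer type that a model calls sensitive. A minimal sketch of that tally follows. The cutoff for calling a sample sensitive is not given in the slides, so the `threshold` parameter and the predicted values are hypothetical. The counts of 52 lung and 27 liver samples are back-calculated from the table's percentages (15.81% of 329 and 31.76% of 85) and are not reported figures.

```python
from collections import defaultdict

def sensitive_percentage(predictions, threshold):
    """Percent of samples per cancer type predicted below the threshold.

    predictions: iterable of (cancer_type, predicted_log2_ic50)
    threshold:   hypothetical cutoff; lower predicted IC50 = more sensitive
    """
    total = defaultdict(int)
    sensitive = defaultdict(int)
    for cancer_type, predicted in predictions:
        total[cancer_type] += 1
        if predicted < threshold:
            sensitive[cancer_type] += 1
    return {c: round(100.0 * sensitive[c] / total[c], 2) for c in total}

# Toy predictions built to reproduce two table entries.
erlotinib_lung = [("Lung Cancer", -1.0)] * 52 + [("Lung Cancer", 1.0)] * 277
sorafenib_liver = [("Liver Cancer", -1.0)] * 27 + [("Liver Cancer", 1.0)] * 58

lung_pct = sensitive_percentage(erlotinib_lung, threshold=0.0)
liver_pct = sensitive_percentage(sorafenib_liver, threshold=0.0)
print(lung_pct, liver_pct)
```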

What we want to achieve on the predictive models

• Goal:

To discover patient selection biomarkers using preclinical data, e.g. cell line viability screens

We want to test hypotheses in Ph II clinical trials and use the biomarkers in Ph III clinical trials to select the right patients for treatment

• What we want to gain from building the predictive models

– Build a predictive model on cell line data that can be used for patients √

– Model-derived signature genes reflect a drug’s MOA √

– Each drug’s predictive model should be drug specific √

– Use the predictive model for cancer indication selection √










NCI-DREAM consortium testing drug sensitivity predictive models: 47 teams and 31 drugs

The NCI-DREAM training, testing, and scoring (Nat. Biotech. 2014, 32, 1202-1212):

TMed comp bio team excluded 9 drugs with “flat” IC50 distributions:

Official results (Nat. Biotech): TBOS team officially ranked No. 10 among 44 teams (while we only predicted 22 out of 31 drugs)

Follow-up analysis (internal):

[Figure: per-team scores, two panels; y-axis ticks 0.2 to 0.8; label: TBOS team; teams shown include teamfin, Team_475, Team_690, Team_511, Team_680, Team_425, Team_603 and TMM1_GEP, Team_689, Team_691, Team_176, Team_418, Team_474, Team_545]

For the official analysis on 31 drugs: TBOS team ranks No. 10

For the 22 drugs for which we built predictive models: TBOS team ranks No. 3 on summarized scores and No. 1 on median scores
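Ranking teams on a subset of drugs, once by summed score and once by median score, can be sketched as below. The team names, drug names, and scores are hypothetical, and the official NCI-DREAM scoring metric is not reproduced here; the block only shows why the two summaries can order teams differently.

```python
from statistics import median

def rank_teams(scores, drugs, summary):
    """Order teams from best to worst by a summary of their per-drug scores.

    scores:  {team: {drug: score}}, higher is better
    drugs:   the subset of drugs to include
    summary: function applied to the list of scores, e.g. sum or median
    """
    summarized = {
        team: summary([per_drug[d] for d in drugs])
        for team, per_drug in scores.items()
    }
    return sorted(summarized, key=summarized.get, reverse=True)

# Hypothetical scores for three teams on three drugs.
scores = {
    "team_x": {"drug_1": 0.9, "drug_2": 0.5, "drug_3": 0.5},
    "team_y": {"drug_1": 0.6, "drug_2": 0.6, "drug_3": 0.6},
    "team_z": {"drug_1": 0.4, "drug_2": 0.4, "drug_3": 0.4},
}
subset = ["drug_1", "drug_2", "drug_3"]

by_sum = rank_teams(scores, subset, sum)
by_median = rank_teams(scores, subset, median)
print("by sum:", by_sum)
print("by median:", by_median)
```

A team with one very high score leads on the sum, while a team that is consistently good leads on the median, so reporting both summaries gives a fuller picture.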

Conclusions:

• We designed and implemented a predictive modeling framework

• The predictive modeling approach was validated on Erlotinib/Sorafenib

– Cell line data derived models can predict patients’ response

– Model derived signature genes could reflect each drug’s MOA

– Predictive models are drug specific

– The models were also able to predict potential drug target cancer indications

• Method comparison on NCI-DREAM challenge

– Our biomarker method was highly competitive among 47 teams on 22 drugs

Acknowledgements

• TMed project leaders

Yuko Ishii

Bill Trepicchio

Steve Blakemore

George Mulligan

Andy Dorner

……

• TMed assay team

Sunita Badola

Scott Verrow

Elena Izmailova

……

• TMed computational team

Gene Shin

Andrew Krueger

Dave Merberg

Lei Shen

……

• Thomson Reuters

Olga Pustovalova

Georgy Gulbekyan

Marina Bessarabova

……
