Identification and Verification of Translational
Biomarkers for Patient Stratification and
Precision Medicine
May 18, 2016
Bin Li, PhD, Associate Director
Translational Medicine
Takeda Pharmaceuticals International Co.
Biomarkers for Patients Stratification and Precision Medicine
-- Cancer and Beyond
AAPS National Biotechnology Conference, Boston on May 16-18, 2016
Disclaimer
Any views or opinions presented here
are solely those of the author and do
not necessarily represent those of the
company.
Outline
• Design and implement a predictive modeling framework
• Method evaluation
– Using SOC drugs to evaluate the predictive modeling approach
  • Build Erlotinib and Sorafenib sensitivity models using cell line data
  • Test the models using real patients’ outcomes (PFS from BATTLE trials)
  • Predict potential drug target cancer indications
– Method comparison on NCI-DREAM challenge
  • A fair comparison of predictive modeling methods, used by 47 teams on 22 drugs
What we want to achieve with the predictive models
• Goal:
To discover patient selection biomarkers using preclinical data, e.g. cell line viability screens
We want to test the biomarker hypotheses in Ph II and use them in Ph III clinical trials to select the right patients for treatment
• What we want to gain from building the predictive models
– Build a predictive model on cell line data that can be applied to patients
– Model-derived signature genes reflect a drug’s MOA
– Each drug’s predictive model should be drug specific
– Use the predictive model for cancer indication selection
Design and development of a PLSR-based predictive modeling framework
• Learned from the FDA-organized MAQC-II observations (Nat. Biotech. 2010, 28, 831)
– 36 academic and industry teams participated
– Most state-of-the-art computational approaches were applied
– We decided to build our modeling framework on the partial least squares regression (PLSR) method
• Technical considerations in choosing the PLSR method:
– PLSR is similar to PCA, but can be more effective because it targets the best X-Y relationship (PCA captures key variance in X only)
– PLSR is well known as a dimension reduction tool (ideal for genomic data)
– PLSR was designed to address the multicollinearity issue
• Learned from CCLE and Sanger observations (Nature, 2012, 483, 603; Nature, 2012, 483, 570)
– Both teams adopted the Elastic Net (EN) method for biomarker discovery; the EN approach is very similar to PLSR (EN seeks max coefficient vs. PLSR seeks max covariance)
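The PCA vs. PLSR point above can be illustrated with a minimal sketch. It is not the framework from the talk: it only computes the first weight vector of a single-response PLS (proportional to X^T y after centering) and the first principal axis of X, on hypothetical toy data in which `gene_a` has large variance but no relation to the response and `gene_b` has small variance but tracks it. All names and values are assumptions made for illustration.

```python
import math

def center(column):
    """Subtract the mean from one column of values."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def pls1_first_weight(x_columns, y):
    """First PLS1 weight vector: proportional to X^T y after centering,
    i.e. the unit direction whose score has maximal covariance with y."""
    yc = center(y)
    return unit([sum(a * b for a, b in zip(center(col), yc)) for col in x_columns])

def pca_first_weight(x_columns, iterations=200):
    """First principal axis via power iteration on X^T X (ignores y)."""
    xc = [center(col) for col in x_columns]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in xc] for ci in xc]
    w = unit([1.0] * len(xc))
    for _ in range(iterations):
        w = unit([sum(g * v for g, v in zip(row, w)) for row in gram])
    return w

# Hypothetical toy data (not from the talk)
gene_a = [10.0, -10.0, -10.0, 10.0]   # high variance, unrelated to response
gene_b = [0.1, 0.2, 0.3, 0.4]         # low variance, tracks the response
log_ic50 = [1.0, 2.0, 3.0, 4.0]

pls_w = pls1_first_weight([gene_a, gene_b], log_ic50)
pca_w = pca_first_weight([gene_a, gene_b])
print("PLS weights (gene_a, gene_b):", pls_w)
print("PCA weights (gene_a, gene_b):", pca_w)
```

In this toy case the PCA axis loads on `gene_a` (variance in X only), while the PLS weight loads on `gene_b` (covariance with the response). A full PLSR adds deflation and further components, which this sketch omits.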
A specially designed modeling framework to
capture consensus information from the dataset
Workflow:
• Input data: GEP & IC50
• Data reduction
• Feature selection
• Model training using a specially designed splitting strategy
• Obtain a core model
• Get a pathway-based core model
• Test model
Training a model on cell line data; testing the model on patients’ data
[Figure, panels (A)-(D): "Sub-testing AUC vs. cor" and "Sep-testing AUC vs. cor" plots; axes: AUC and Correlation; legend: Random.validation, Balance.validation]
Panel cell lines: 42% random.training, 28% random.validation, 30% balance.validation (10-fold cross validation)
Bin Li, Hyunjin Shin, et al. PLoS ONE 2015, 10(6): e0130700.
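The 42% / 28% / 30% proportions can be reproduced by holding out 30% of the panel and splitting the remaining 70% in a 60/40 ratio. The sketch below is one possible reading, not the published procedure: it assumes that "balance" means the held-out set is spread evenly over the IC50 range, and the function name, seed handling and toy inputs are all hypothetical.

```python
import random

def split_panel(cell_lines, ic50, seed=0):
    """Return (random_training, random_validation, balance_validation)."""
    rng = random.Random(seed)

    # Assumption: the balanced hold-out is taken at regular ranks of the
    # sorted IC50 values so that it covers sensitive and resistant lines.
    ranked = sorted(cell_lines, key=lambda line: ic50[line])
    n_balance = round(0.30 * len(ranked))
    balance = [ranked[(i * len(ranked)) // n_balance] for i in range(n_balance)]

    # The remaining ~70% is shuffled and split 60/40 (42% / 28% of the panel).
    held_out = set(balance)
    rest = [line for line in ranked if line not in held_out]
    rng.shuffle(rest)
    n_train = round(0.60 * len(rest))
    return rest[:n_train], rest[n_train:], balance

# Hypothetical panel of 100 cell lines with made-up log2(IC50) values
lines = [f"CL{i:03d}" for i in range(100)]
toy_ic50 = {line: 0.1 * i for i, line in enumerate(lines)}
train, random_val, balance_val = split_panel(lines, toy_ic50)
print(len(train), len(random_val), len(balance_val))
```

Calling `split_panel` with different seeds gives repeated random training/validation splits against the same balanced hold-out. How this relates to the 10-fold cross validation mentioned on the slide is not stated in the transcript.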
Pathway-based signature genes’ network
reconstruction
[Figure legend, cell line tissue types: Bladder, Breast, CNS, Colon, Colon_GI, Endocrine, Female_GU_Cervix, Female_GU_other, Haematopoietic_and_lymphoid, Head_and_Neck, Kidney, Liver, Lung, Pancreas, Prostate, Skin, Soft_Tissue, not_classified]
Ricerca Oncopanel cell based assay:
Ricerca Screen:
- 240 cell lines
- varying histological origin
- 10-point dose titration => IC50s
- baseline microarray data
Re-predicting Ricerca log2(IC50)
Accuracy estimation:
Upper boundary: 91%
Lower boundary: 77%
Trained the Erlotinib PLSR model on cell line data:
Built a good model with 77-91% accuracy
Overall workflow on model building and testing: Erlotinib case study
54675 probesets => 3787 probesets => 485 genes (full model) => 187 core genes (core model) => 51 genes (final pathway model)
Workflow steps: input data (GEP & IC50), data reduction, feature selection, model training using a specially designed splitting strategy, core model, pathway-based core model, model testing
Training a model on cell line data; testing the model on patients’ data
Signature genes reflect known MOA of Erlotinib
and Sorafenib
[Figure: (A) Erlotinib; (B) Sorafenib]
Test the Erlotinib model using the BATTLE clinical trial: background
• Trained a PLSR model using Erlotinib Ricerca screen data (IC50s)
– Affymetrix U133plus arrays; 240 cell lines across multiple cancer types
• Used the model to predict patients’ survival (PFS) in the BATTLE clinical trial
– Affymetrix Hu Gene 1.0 ST arrays; 25 NSCLC patients treated with Erlotinib
Cell line derived drug sensitivity models were able to predict patients’ responses and are drug specific
[Figure, panels (A)-(D): PFS curves; x-axis: Months from Start of Therapy; y-axis: Proportion of Cases]
Panel titles: E_model pred E_PFS; S_model pred S_PFS; E_model pred S_PFS; S_model pred E_PFS
Statistics shown: P = 0.09, HR = 0.43; P = 0.006, HR = 0.32; P = 0.54, HR = 1.32; P = 0.32, HR = 1.87
E: Erlotinib; S: Sorafenib; red: marker pos; green: marker neg
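Curves of "proportion of cases" against months, split into marker-positive and marker-negative groups, are typically Kaplan-Meier estimates. Assuming that is what the panels show, a minimal sketch of the estimator follows. The PFS values are hypothetical and are not BATTLE data, and the hazard ratios and P values on the slide would need a Cox model or log-rank test, which are not included here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the progression-free proportion.

    times:  months to progression or censoring
    events: 1 if progression was observed, 0 if censored
    Returns [(time, survival_probability)] at each time with an event.
    """
    observations = sorted(zip(times, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        same_time = [e for tt, e in observations if tt == t]
        progressed = sum(same_time)
        if progressed:
            # Multiply by the fraction of at-risk patients who did not progress
            survival *= 1 - progressed / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)
        i += len(same_time)
    return curve

# Hypothetical PFS data (months, event flag) for two marker groups
marker_pos = kaplan_meier([2, 4, 4, 6, 9], [1, 1, 0, 1, 0])
marker_neg = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print("marker pos:", marker_pos)
print("marker neg:", marker_neg)
```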
Using the predictive models for drug target cancer indication selection: build a translational medicine data warehouse
• To provide a large, well-organized, and integrated dataset
– Compound screen data, public data, and clinical outcomes
• Standardize vocabulary and normalize experimental results
– Use standard terminology for the same concepts, e.g. gender vs. sex
– Global normalization of gene expression data
• Current data in Vahalla
– Takeda proprietary data: VELCADE clinical trials; cancer cell line screen data on multiple Takeda Oncology compounds
– Public data: > 2000 GEO studies, with ~ 350 GEO studies manually curated; TCGA data; CCLE and Sanger cancer cell line panel data
• An R interface for computational scientists to build/test predictive models
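The two standardization steps listed above could look roughly like the sketch below. The synonym tables, the record layout and the choice of per-sample median centering as the "global normalization" are all assumptions for illustration; the transcript does not describe how the warehouse implements either step.

```python
from statistics import median

# Hypothetical synonym tables: several source spellings map to one term
FIELD_SYNONYMS = {"gender": "sex", "sex": "sex"}
SEX_VALUES = {"m": "male", "male": "male", "f": "female", "female": "female"}

def standardize_record(record):
    """Map field names and values of one sample annotation to standard terms."""
    clean = {}
    for key, value in record.items():
        field = key.strip().lower()
        field = FIELD_SYNONYMS.get(field, field)
        if field == "sex":
            value = SEX_VALUES.get(str(value).strip().lower(), "unknown")
        clean[field] = value
    return clean

def median_center(sample):
    """Assumed global normalization: subtract the sample median from each
    log2 expression value so that samples share a common center."""
    center = median(sample.values())
    return {gene: value - center for gene, value in sample.items()}

# Hypothetical inputs
print(standardize_record({"Gender": "F", "age": 61}))
print(median_center({"EGFR": 8.0, "KRAS": 6.0, "TP53": 7.0}))
```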
Predicting Erlotinib or Sorafenib sensitive percentage of
samples to select drug target cancer indications
Cancer Type                 # of samples   Pred. Erlotinib sen. (%)   Pred. Sorafenib sen. (%)
FDA approved Erlotinib or Sorafenib indications:
Lung Cancer                 329            15.81                      0.61
Liver Cancer                85             0.00                       31.76
Kidney Cancer               218            0.46                       24.77
Some interesting indications predicted to be sensitive to Erlotinib or Sorafenib:
Head and Neck Cancer        168            94.05                      12.50
Bladder Cancer              102            41.18                      3.92
Haematopoietic Neoplasms    3590           2.17                       32.56
Bone Cancer                 88             0.00                       29.55
Breast Cancer               1668           5.64                       6.47
Colorectal Cancer           948            0.11                       0.21
Pancreatic Cancer           75             0.00                       1.33
Figure callouts:
• Erlotinib was approved in Lung cancer
• Sorafenib was approved in Liver and Kidney cancers
• Erlotinib failed in one Kidney cancer PIII clinical trial
• Predicted additional Erlotinib sensitive cancer indications
• Predicted additional Sorafenib sensitive cancer indications
Bin Li, Hyunjin Shin, et al. PLoS ONE 2015, 10(6): e0130700.
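The percentages in the table are, per cancer type, the share of samples that a drug's model predicts to be sensitive. A minimal sketch of that bookkeeping is below. The sensitivity rule (predicted log2(IC50) below a cutoff), the cutoff value and the sample values are assumptions; the transcript does not state how a sample is called sensitive.

```python
from collections import defaultdict

def sensitive_percentage(predictions, cutoff):
    """predictions: iterable of (cancer_type, predicted_log2_ic50).

    Assumption: a sample counts as sensitive when its predicted
    log2(IC50) is below `cutoff`.
    Returns {cancer_type: (n_samples, percent_sensitive)}.
    """
    total = defaultdict(int)
    sensitive = defaultdict(int)
    for cancer_type, predicted in predictions:
        total[cancer_type] += 1
        if predicted < cutoff:
            sensitive[cancer_type] += 1
    return {
        cancer_type: (n, round(100.0 * sensitive[cancer_type] / n, 2))
        for cancer_type, n in total.items()
    }

# Hypothetical predictions for a handful of patient samples
toy_predictions = [
    ("Lung", -1.0), ("Lung", 2.0), ("Lung", 3.0), ("Lung", 0.5),
    ("Liver", 4.0),
]
summary = sensitive_percentage(toy_predictions, cutoff=1.0)
print(summary)
```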
What we want to achieve with the predictive models
• Goal:
To discover patient selection biomarkers using preclinical data, e.g. cell line viability screens
We want to test the biomarker hypotheses in Ph II and use them in Ph III clinical trials to select the right patients for treatment
• What we want to gain from building the predictive models
– √ Build a predictive model on cell line data that can be applied to patients
– √ Model-derived signature genes reflect a drug’s MOA
– √ Each drug’s predictive model should be drug specific
– √ Use the predictive model for cancer indication selection
NCI-DREAM consortium testing drug sensitivity
predictive models: 47 teams and 31 drugs
The NCI-DREAM training, testing, and scoring (Nat. Biotech. 2014, 32, 1202-1212)
TMed comp bio team excluded 9 drugs with “flat” IC50 distributions
Official results (Nat. Biotech.): TBOS team officially ranked No. 10 among 44 teams (while we only predicted 22 out of 31 drugs)
Follow-up analysis (internal)
[Figures: per-team score plots for the official results and the follow-up analysis; team labels shown include teamfin, Team_475, Team_690, Team_511, Team_680, Team_425, Team_603, TMM1_GEP, Team_689, Team_691, Team_176, Team_418, Team_474, Team_545; TBOS team labeled]
For the official analysis on 31 drugs: TBOS team ranks No. 10
For the 22 drugs for which we built predictive models: TBOS team ranks No. 3 on summarized scores and No. 1 on median scores
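A team's rank can differ depending on whether per-drug scores are summed or summarized by their median, and on which drugs are included. The sketch below shows that bookkeeping on hypothetical scores. Team names, drug names and values are made up, and "higher is better" is an assumption; the actual NCI-DREAM scoring is described in the cited Nat. Biotech. paper.

```python
from statistics import median

def rank_teams(scores, drugs, summary=sum):
    """Order teams from best to worst on the chosen drugs.

    scores:  {team: {drug: score}}, higher score assumed better
    drugs:   the subset of drugs to include in the comparison
    summary: function applied to the per-drug scores, e.g. sum or median
    """
    def team_score(team):
        return summary([scores[team][drug] for drug in drugs])
    return sorted(scores, key=team_score, reverse=True)

# Hypothetical per-drug scores for three made-up teams
toy_scores = {
    "team_a": {"drug_1": 0.9, "drug_2": 0.2, "drug_3": 0.9},
    "team_b": {"drug_1": 0.7, "drug_2": 0.7, "drug_3": 0.7},
    "team_c": {"drug_1": 0.5, "drug_2": 0.6, "drug_3": 0.4},
}
all_drugs = ["drug_1", "drug_2", "drug_3"]

by_sum = rank_teams(toy_scores, all_drugs, summary=sum)
by_median = rank_teams(toy_scores, all_drugs, summary=median)
print("by summed score:", by_sum)
print("by median score:", by_median)
```

Passing a shorter `drugs` list restricts the comparison to the drugs a team actually modeled, in the spirit of the 22-drug follow-up analysis.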
Conclusions:
• We designed and implemented a predictive modeling framework
• The predictive modeling approach was validated on Erlotinib/Sorafenib
– Cell line data derived models can predict patients’ response
– Model derived signature genes could reflect each drug’s MOA
– Predictive models are drug specific
– The models were also able to predict potential drug target cancer indications
• Method comparison on NCI-DREAM challenge
– Our biomarker method was highly competitive among 47 teams on 22 drugs
Acknowledgements
• TMed project leaders
Yuko Ishii
Bill Trepicchio
Steve Blakemore
George Mulligan
Andy Dorner
……
• TMed assay team
Sunita Badola
Scott Verrow
Elena Izmailova
……
• TMed computational team
Gene Shin
Andrew Krueger
Dave Merberg
Lei Shen
……
• Thomson Reuters
Olga Pustovalova
Georgy Gulbekyan
Marina Bessarabova
……