

ONLINE BIOMARKER VALIDATION OF SURVIVAL-ASSOCIATED BIOMARKERS IN BREAST AND OVARIAN CANCER USING MICROARRAY DATA OF 4,323 PATIENTS
Balázs Győrffy

AIMS
The pre-clinical validation of prognostic gene candidates in large independent patient cohorts is a prerequisite for the development of robust biomarkers. In the present study, we expanded our online Kaplan-Meier plotter tool to assess the effect of genes on ovarian cancer prognosis.

CONCLUSIONS
We extended our global biomarker validation platform to assess the prognostic power of 22,277 genes in 2,977 breast and 1,346 ovarian cancer patients.

Online access at: http://www.kmplot.com/.

METHODS
Gene expression data and survival information for breast and ovarian cancer patients were downloaded from GEO and TCGA. To analyze the prognostic value of a selected gene in the various cohorts, patients are divided into two groups according to a quantile of that gene's expression. Patients can be filtered by stage, grade and histological subtype, and a follow-up threshold can be set to exclude long-term effects. A Kaplan-Meier survival plot is generated, and significance is computed in the R statistical environment using Bioconductor packages. Several probe sets can also be combined, using the mean of their expression as a multigene predictor of survival.
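The grouping steps of the METHODS can be sketched in Python. This is a minimal, hypothetical illustration rather than the kmplot.com code (which runs in R): the function names are invented, the quantile uses linear interpolation (the definition behind R's default `quantile(..., type = 7)`), patients exactly at the cutoff are assigned to the low group, and follow-up times are assumed to be in months.

```python
from statistics import mean


def quantile(values, q):
    """Linear-interpolation quantile (same definition as R's default, type 7)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_by_quantile(expression, q=0.5):
    """Label each patient 1 (high) if expression exceeds the q-quantile, else 0 (low)."""
    cutoff = quantile(expression, q)
    return [1 if x > cutoff else 0 for x in expression]


def multigene_score(probe_sets):
    """Mean expression across several probe sets, one score per patient.

    probe_sets: list of lists, one inner list per probe set, aligned by patient.
    """
    return [mean(values) for values in zip(*probe_sets)]


def apply_followup_threshold(times, events, threshold):
    """Truncate follow-up at `threshold`: later events become censored (0) at the cutoff."""
    new_times, new_events = [], []
    for t, e in zip(times, events):
        if t > threshold:
            new_times.append(threshold)
            new_events.append(0)
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events


# Hypothetical example: two probe sets for one gene, six patients, median split.
score = multigene_score([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                         [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]])
groups = split_by_quantile(score, q=0.5)
times, events = apply_followup_threshold([12, 80, 30, 65, 10, 90],
                                         [1, 1, 0, 1, 1, 0], threshold=60)
```

Setting `q` to 0.25 or 0.75 moves the cutoff to the lower or upper quartile; for a single probe set, its expression values are passed to `split_by_quantile` directly instead of the multigene score.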

RESULTS
Altogether, 1,346 ovarian cancer patients and 2,977 breast cancer patients were entered into the database. The high- and low-expression groups can be compared using relapse-free survival or overall survival. We used this integrative data analysis tool to validate the prognostic power of 37 biomarkers identified in the literature. Of these, CA125 (p=3.7e-5, HR=1.4), CDKN1B (p=5.4e-5, HR=1.4), KLK6 (p=0.002, HR=0.79), IFNG (p=0.004, HR=0.81), P16 (p=0.02, HR=0.66) and BIRC5 (p=0.00017, HR=0.75) were associated with survival.
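The RESULTS report a hazard ratio (HR) and p-value for each marker, computed in R; the poster does not name the exact estimators. A minimal Python sketch, assuming the Kaplan-Meier product-limit estimator for the plot and a two-group log-rank test for significance. The HR here is the log-rank observed/expected approximation, not a Cox regression fit, so it need not reproduce the published values. Function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each time with >= 1 event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (both events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank(times, events, groups):
    """Two-group log-rank test (groups: 1 = high, 0 = low expression).

    Returns (chi2, p, hr), where hr = (O1/E1) / (O0/E0).
    Assumes both groups contain events, so no denominator is zero.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o1 = e1 = var = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        o1 += d1
        e1 += d * n1 / n  # expected events in group 1 under the null
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    total = sum(events)
    o0, e0 = total - o1, total - e1
    chi2 = (o1 - e1) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    hr = (o1 / e1) / (o0 / e0)
    return chi2, p, hr


curve = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
chi2, p, hr = logrank([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8,
                      [1, 1, 1, 1, 0, 0, 0, 0])
```

In the example, the high-expression group experiences all of its events first, so the sketch yields HR > 1, the same direction as for CA125 and CDKN1B in the RESULTS.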

Figure 1. The online query pages (example panels: TOP2A in breast cancer; CA125 in ovarian cancer; distribution of CA125).

Figure 2. Overview of the system: raw data from GEO and TCGA (n=5,032) pass through 1. quality control, 2. normalization and 3. combination of platforms; the remaining n=4,323 samples are stored together with clinical data in a PostgreSQL database. Analysis at www.kmplot.com comprises filtering for gene expression, real-time computation in R, and graphical feedback (KM plot, hazard ratio and p-value).
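The poster names three preprocessing steps (quality control, normalization, combination of platforms) but does not describe them. The sketch below is hypothetical: it assumes platforms are combined by keeping only the probe sets measured in every sample, and that normalization is a simple global scaling of each array to a common mean (the target of 1000 is an arbitrary illustrative value). The actual pipeline may use a different method, and quality control is not shown.

```python
from statistics import mean


def combine_platforms(samples):
    """Keep only probe sets measured in every sample (e.g. shared by all array types).

    samples: list of dicts mapping probe-set ID -> expression value.
    Returns (sorted common IDs, per-sample value lists in that order).
    """
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)
    ids = sorted(common)
    return ids, [[sample[i] for i in ids] for sample in samples]


def scale_normalize(arrays, target=1000.0):
    """Global scaling: rescale each array so that its mean expression equals `target`."""
    normalized = []
    for array in arrays:
        factor = target / mean(array)
        normalized.append([value * factor for value in array])
    return normalized


# Hypothetical input: the first array type measures one extra probe set.
ids, arrays = combine_platforms([
    {"201383_s_at": 120.0, "209112_at": 80.0, "hypothetical_extra_at": 5.0},
    {"201383_s_at": 300.0, "209112_at": 100.0},
])
normalized = scale_normalize(arrays, target=1000.0)
```

Combining before scaling matters in this sketch: the array means are then computed over the same set of probe sets for every sample.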

Symbol    Surv.  Analyzed in                    Affymetrix ID  HR    p
CA125     PFS    All patients                   220196_at      n.s.  n.s.
                                                201384_s_at    1.3   0.0003*
                                                201383_s_at    1.4   3.7e-05*
KRT19     PFS    Debulk = subopt.               201650_at      n.s.  n.s.
KLK6      PFS    All patients                   216699_s_at    0.79  0.002*
                                                204733_at      n.s.  n.s.
KLK10     PFS    Stage = 3+4                    209792_s_at    n.s.  n.s.
IL6       OS     All patients                   205207_at      n.s.  n.s.
FAS       PFS    All patients                   204780_s_at    1.2   0.017
                                                204781_s_at    n.s.  n.s.
                                                212218_s_at    0.84  0.024
                                                215719_x_at    n.s.  n.s.
                                                216252_x_at    n.s.  n.s.
VEGFR     OS     All patients                   203934_at      1.2   0.064
CCND1     OS     Stage = 3+4                    208711_s_at    n.s.  n.s.
                                                208712_at      n.s.  n.s.
CCND3     OS     All patients                   201700_at      n.s.  n.s.
CCNE      OS     Debulk = subopt.               213523_at      n.s.  n.s.
                                                205034_at      n.s.  n.s.
P15       PFS    All patients                   204599_s_at    n.s.  n.s.
                                                212857_x_at    1.3   0.0005*
                                                214512_s_at    1.2   0.01
                                                218708_at      n.s.  n.s.
P16       PFS    Debulk = subopt.               207039_at      0.66  0.002*
                                                209644_x_at    n.s.  n.s.
CDKN1A    PFS    Histology = serous             202284_s_at    n.s.  n.s.
CDKN1B    PFS    All patients                   209112_at      1.4   5.4e-05*
RB1       OS     Stage = 1                      203132_at      n.s.  n.s.
E2F1      PFS    All patients                   2028_s_at      0.83  0.017
E2F4      PFS    All patients                   38707_r_at     n.s.  n.s.
TP53      PFS    Stage = 3+4                    211300_s_at    n.s.  n.s.
                                                201746_at      0.84  0.075
BAX       PFS    Therapy = contains Taxol       208478_s_at    n.s.  n.s.
                                                211833_s_at    n.s.  n.s.
BCL2L1    PFS    All patients                   212312_at      0.86  0.04
                                                215037_s_at    n.s.  n.s.
BIRC2     OS     Stage = 3+4                    202076_at      n.s.  n.s.
BIRC5     PFS    All patients                   210334_x_at    0.75  0.00017*
                                                202094_at      0.84  0.018
                                                202095_s_at    0.84  0.018
EGFR      PFS    Stage = 1+2                    201983_s_at    n.s.  n.s.
                                                201984_s_at    n.s.  n.s.
                                                211551_at      n.s.  n.s.
ERBB2     PFS    Histology = serous             216836_s_at    n.s.  n.s.
MET       OS     Stage = 3+4                    217828_at      n.s.  n.s.
                                                203510_at      n.s.  n.s.
                                                211599_x_at    n.s.  n.s.
                                                213807_x_at    n.s.  n.s.
MMP2      PFS    Histology = endom.             201069_at      0.33  0.05
MMP9      OS     Stage = 1                      203936_s_at    n.s.  n.s.
MMP14     OS     Stage = 2+3+4                  160020_at      n.s.  n.s.
                                                202828_s_at    n.s.  n.s.
                                                202827_s_at    n.s.  n.s.
HE4       PFS    All patients                   203892_at      n.s.  n.s.
SERPINB5  PFS    Debulk = subopt.               204855_at      n.s.  n.s.
BRCA1     OS     All patients                   204531_s_at    n.s.  n.s.
ERCC1     PFS    Stage = 3, Therapy = Tax+Plat  203719_at      n.s.  n.s.
                                                203720_s_at    n.s.  n.s.

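Table 1. The association between prognostic markers and survival. Each marker was analyzed in the subset of patients whose clinical characteristics match those of the cohort in which the association was originally described. Surv., survival endpoint; PFS, progression-free survival; OS, overall survival; HR, hazard ratio; n.s., not significant; Debulk = subopt., suboptimal debulking; endom., endometrioid; Tax+Plat, Taxol plus platinum.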
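Table 1 analyzes each marker in a clinically matched subset (the "Analyzed in" column, e.g. "Stage = 3+4" or "Therapy = contains Taxol"). A minimal, hypothetical Python sketch of such filtering follows; the field names (`stage`, `debulk`, `therapy`) and value encodings are assumptions for illustration, not the kmplot.com database schema.

```python
def matches(value, allowed):
    """A criterion is either a collection of allowed values or a predicate."""
    if callable(allowed):
        return allowed(value)
    return value in allowed


def select_subset(patients, **criteria):
    """Return the patients whose clinical fields satisfy every criterion."""
    return [
        patient for patient in patients
        if all(matches(patient.get(field), allowed)
               for field, allowed in criteria.items())
    ]


def contains(*drugs):
    """Predicate for therapy strings that mention every given drug (hypothetical encoding)."""
    return lambda therapy: therapy is not None and all(d in therapy for d in drugs)


patients = [
    {"id": "P1", "stage": 3, "debulk": "suboptimal", "therapy": "Taxol+Platinum"},
    {"id": "P2", "stage": 1, "debulk": "optimal", "therapy": "Platinum"},
    {"id": "P3", "stage": 4, "debulk": "suboptimal", "therapy": "Platinum"},
]

late_stage = select_subset(patients, stage={3, 4})                 # "Stage = 3+4"
suboptimal = select_subset(patients, debulk={"suboptimal"})        # "Debulk = subopt."
taxol = select_subset(patients, therapy=contains("Taxol"))         # "Therapy = contains Taxol"
stage3_taxplat = select_subset(patients, stage={3},
                               therapy=contains("Taxol", "Platinum"))  # "Stage = 3, Therapy = Tax+Plat"
```

The selected patients would then go through the quantile split and survival analysis described under METHODS.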

GRANT SUPPORT: OTKA PD 83154; TAMOP-4.2.1.B-09/1/KMR-2010-0001; The PREDICT consortium (EU grant no. 259303)