Expression and Methylation Patterns Partition Luminal-A Breast Tumors Into Distinct
Prognostic Subgroups
Dvir Netanely, Ayelet Avraham, Adit Ben-Baruch, Ella Evron and Ron Shamir
Breast Cancer Res. 2016 Jul 7;18(1):74.
Presentation by Nimrod Rappoport, “Towards the Precision Medicine Era: Computational challenges”
seminar, headed by Prof. Ron Shamir November 15, 2016
Introduction
• Improved breast cancer clustering using new expression and methylation datasets
• Use of clustering and several classical statistical tests
[Diagram labels: Expression, Methylation, Cancer Subtypes]
Background
• Cancer: group of diseases involving abnormal cell growth
• Breast cancer: cancer that develops from breast tissue
• Most common invasive cancer in women (affects 1 in 8)
• Almost half a million deaths a year
Background
• Heterogeneous disease
• Therapeutic decisions based on pathologic parameters and biomarkers
• Estrogen and progesterone receptors, HER2 overexpression
Gene Expression
• RNA-Seq and microarrays
Background
• Microarray expression profiling found clusters corresponding to biomarkers
• PAM50
• Attempts to cluster using different data
DNA Methylation
• Methyl group added to cytosine (typically at CpG sites)
• Can be measured: frequency of dinucleotide methylation in tissue
• Methylation of promoters typically represses gene transcription
• Both hypermethylation and hypomethylation are observed in cancer
Background
• Dataset by The Cancer Genome Atlas (TCGA)
• Gene expression (RNA-Seq): 1148 samples of ~20500 genes
• Methylation: 679 samples of 107639 probes
• Methylation samples are contained in the gene expression samples
• PAM50 classification
Clustering
• Grouping objects such that objects in the same group are similar
• Applications in medicine, economics, politics
• Overlapping vs. strict partitioning
Clustering
K Means
• Choose k initial centroids
• Assign each vector to the cluster of its closest centroid
• Calculate new centroids as the mean of vectors in that cluster
• Repeat until convergence
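The four steps above can be sketched in a few lines of Python. This is only a minimal illustration of the standard Lloyd iteration, not the implementation used in the paper. The function name `kmeans`, the random initialization, and the toy points are assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # choose k initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign each vector to the cluster of its closest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of the vectors in its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Toy data (hypothetical): two well-separated groups of 2-D points.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(toy, k=2)
```

On real expression data each point would be one sample's expression vector, and the result can depend on the initial centroids, so several restarts are common.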
K Means
Clustering Results
• Clusters match the PAM50 subtypes, except that Luminal-A samples share a mixed cluster with Luminal-B.
Luminal Clustering
• Only the luminal samples are clustered, with K=2.
• How can we tell which clustering better predicts prognosis?
Introduction to Survival Analysis
$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$

$$S(t) = 1 - P(T \le t) = 1 - F(t)$$
Introduction to Survival Analysis
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

$$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \;\&\; T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$$
Kaplan Meier
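As a rough illustration of what a Kaplan-Meier plot shows, the sketch below computes the product-limit estimate of S(t) at each observed event time. The function name `kaplan_meier` and the toy follow-up data are assumptions for the example, not values from the paper.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time of each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk         # product-limit update
        curve.append((t, survival))
    return curve

# Toy data (hypothetical): six subjects, the fourth censored at time 10.
curve = kaplan_meier([1, 3, 4, 10, 12, 18], [1, 1, 1, 0, 1, 1])
```

The censored subject never triggers a drop in the curve but still counts in the at-risk set up to time 10, which is how censoring is accommodated.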
Luminal Clustering
P value
• Hypothesis testing
• Null hypothesis
• Probability, under the null hypothesis, of obtaining a result at least as extreme as the observed data
• p = 0.05 or 0.01 are common cutoffs for statistical significance
Log Rank Test
• Nonparametric test
• Compares survival distributions
• Supports censoring
• Confusing name
Log Rank Test
Log Rank Test
• K tables, one for each distinct event time
• Symmetry
• Best when hazards have a constant ratio
• Extensions: weighted log-rank, stratified log-rank, and multivariable versions
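The table-per-event-time idea can be sketched as follows for two groups. At every event time the observed number of events in group A is compared with its expectation under the null hypothesis, using the hypergeometric variance. This is a minimal sketch that assumes the standard unweighted two-group statistic. The function name `logrank_test` and the toy data are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e == 1}):
        n = sum(1 for u, _, _ in pooled if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)    # events at time t
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a for this table
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    # Note: variance is zero (and this divides by zero) if one group is empty.
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 degree of freedom
    return stat, p_value

# Toy data (hypothetical): group B tends to fail earlier than group A.
a_times, a_events = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
b_times, b_events = [1, 2, 3, 5, 8, 9], [1, 1, 1, 1, 1, 1]
stat, p = logrank_test(a_times, a_events, b_times, b_events)
```

Swapping the two groups leaves the statistic unchanged, which is the symmetry property listed above.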
Luminal Clustering
• Clustering has better prognostic value than PAM50!
Luminal-A Clustering
• Reduced 5-year recurrence for R2!
Chi Square
• Tests the null hypothesis that observations are statistically independent
• (Log rank?)
• Microarray clustering agrees with the RNA-Seq clustering
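To make the independence test concrete, the sketch below computes the Pearson chi-square statistic for a contingency table of two cluster labelings. The function name and the counts are hypothetical and only illustrate the calculation. The resulting statistic would then be compared with a chi-square distribution with the returned degrees of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column labels were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows = RNA-Seq clusters, columns = microarray clusters.
stat, dof = chi_square_independence([[30, 10], [10, 30]])
# Every expected count is 20, so stat = 4 * (10**2 / 20) = 20.0 with dof = 1.
```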
Wilcoxon Rank Sum Test
Wilcoxon Rank Sum Test
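A minimal sketch of the rank-sum test follows, assuming the usual large-sample normal approximation and no tie correction of the variance. The slides do not say which variant was used. The function name and sample values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p): rank sum of sample x, its z-score, two-sided p-value.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, z, p

# Hypothetical expression values of one gene in two clusters.
w, z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With samples this small an exact test would be preferable. The approximation is shown only because it keeps the example short.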
False Discovery Rate
False Discovery Rate
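When thousands of genes or CpGs are tested at once, the p-values are usually adjusted to control the false discovery rate. The sketch below assumes the Benjamini-Hochberg procedure, which the slides do not name explicitly. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four tests.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A feature is then called significant at FDR level alpha when its adjusted value is at most alpha.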
Luminal A Clustering
• LumA-R1 associated with higher proliferation score, older age, decrease in normal cell percent, increase in tumor nuclei percent
• R1 and R2 correlate with ductal and lobular histological types
• Fit between Luminal-A and Luminal clusters
Gene Enrichment for Luminal A
• All 1000 most differentially expressed genes are overexpressed in R2
• LumA-R2 overexpresses genes related to the immune system.
• Overexpression of both chemokines and their receptors may indicate increased infiltration of immune system cells.
• It is unclear whether this is related to reduced tumor recurrence.
Methylation Clustering
Luminal Methylation Clustering
• LumA-M1 associated with poor 5-year prognosis!
• Correlation with expression partitioning (chi square)
Methylation Cluster Analysis
• Comparing the LumA-M1 and LumA-M3 clusters
• Finding differentially methylated CpGs
• Top 1000 hypermethylated CpGs are in LumA-M1
• Finding differentially methylated CpGs with over-expressed genes
• Finding differentially methylated CpGs with under-expressed genes
• Enrichment for differentially methylated CpGs and their genes
Spearman Correlation
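Spearman correlation is the Pearson correlation of the ranks, so it captures any monotone relationship between a CpG's methylation level and a gene's expression. A minimal sketch follows. The helper names and the toy values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical methylation levels of one CpG and expression of its gene:
# expression falls as methylation rises, so the ranks are exactly reversed.
rho = spearman([0.1, 0.4, 0.5, 0.9], [10.0, 8.0, 3.0, 1.0])
```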
Methylation and Expression Correlation
Positive correlation with expression (212 CpGs, 125 genes)
• Gene enrichment: development, homeobox
• CpG enrichment: gene body; under-representation of TSS
Negative correlation with expression (586 CpGs, 340 genes)
• Gene enrichment: tumor suppressing genes
• CpG enrichment: TSS and 1st exon; under-representation of gene body; promoter associated, cell type specific
Hypermethylated CpGs (1000 CpGs, 483 genes)
• Gene enrichment: development, homeobox, tumor suppressing genes
• CpG enrichment: enhancer, tissue-specific promoter, cancer DMR
Cox Survival Analysis
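$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

$$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \;\&\; T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$$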
Cox Survival Analysis
Cox Survival Analysis
• Event times: 1, 3, 4, 10+, 12, 18
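$$L_p(\beta) = \prod_{i=1}^{m} L_i = \left(\frac{h_1(1)}{h_1(1)+h_2(1)+h_3(1)+h_4(1)+h_5(1)+h_6(1)}\right) \left(\frac{h_2(3)}{h_2(3)+h_3(3)+h_4(3)+h_5(3)+h_6(3)}\right) \left(\frac{h_3(4)}{h_3(4)+\dots+h_6(4)}\right) \left(\frac{h_5(12)}{h_5(12)+h_6(12)}\right) \left(\frac{h_6(18)}{h_6(18)}\right)$$

$$\therefore\; L_p(\beta) = \prod_{i=1}^{m} L_i = \frac{e^{\beta x_1}}{e^{\beta x_1}+e^{\beta x_2}+e^{\beta x_3}+e^{\beta x_4}+e^{\beta x_5}+e^{\beta x_6}} \times \dots$$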
Cox Survival Analysis
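$$\therefore\; \log L_p(\beta) = \sum_{j=1}^{m} \delta_j \left[ \beta x_j - \log\left( \sum_{i \in R(t_j)} e^{\beta x_i} \right) \right]$$

$$Z = \frac{\hat{\beta} - 0}{\text{asymptotic standard error}(\hat{\beta})}$$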
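The log partial likelihood can be evaluated directly for the six-subject example (event times 1, 3, 4, 10+, 12, 18). The sketch below assumes a single covariate, the usual proportional hazards form h_i(t) = h_0(t) e^{βx_i}, and no tied event times. The covariate values `x` are hypothetical, since the slides do not list them.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for t_j, delta_j, x_j in zip(times, events, x):
        if delta_j == 1:  # censored subjects contribute only through risk sets
            # Risk set R(t_j): everyone still under observation at t_j.
            risk_set = [x_i for t_i, x_i in zip(times, x) if t_i >= t_j]
            total += beta * x_j - math.log(sum(math.exp(beta * x_i) for x_i in risk_set))
    return total

times = [1, 3, 4, 10, 12, 18]
events = [1, 1, 1, 0, 1, 1]  # the subject with time 10 is censored (10+)
x = [1, 1, 0, 1, 0, 0]       # hypothetical binary covariate
ll_at_zero = cox_log_partial_likelihood(0.0, times, events, x)
```

At β = 0 every subject in a risk set is equally likely to fail, so the value reduces to -(log 6 + log 5 + log 4 + log 2 + log 1) = -log 240. Maximizing this function over β gives the estimate used in the Z statistic.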
Cox Survival Analysis Results
Summary
• K-means: clustering gene expression data
• Kaplan-Meier plots, log-rank test: comparing prognostic value to PAM50
• Chi-square: comparing RNA-Seq clustering with microarray clustering
• Rank-sum test, false discovery rate: finding differentially expressed genes
• Hypergeometric test, false discovery rate: LumA-R1 and LumA-R2 enrichment
• K-means: clustering methylation data
• Kaplan-Meier plots, log-rank test: comparing prognosis of different clusters
• Rank-sum test, false discovery rate, Spearman test: finding differentially methylated CpGs and correlation with gene expression
• Hypergeometric test, false discovery rate: LumA-M1 and LumA-M3 enrichment
• Cox survival analysis: adjusting prognostic value for other variables
Summary
• The new clustering shows better 5-year prognostic value than the PAM50 classification
• Expression data reveals a subgroup with lower 5-year recurrence rates and higher expression of immune-related genes
• Methylation data reveals a subgroup with hypermethylation of developmental genes and poor survival
• Prognostic significance of the partitioning is confirmed by Cox analysis
Discussion
• Use of the new clustering to make therapeutic decisions:
• Luminal-A / Luminal-B partition
• Hypermethylation in Luminal-A
• Overexpression of chemokines and their receptors may indicate infiltration of immune system cells
• Understanding this mechanism may lead to a specific treatment
Discussion
• Use this approach for additional subtype clustering
• Take DNA sequencing into account: expression correlation with mutation
• Use data taken over long periods of time