Post on 21-Dec-2015
Spanish Inquisition
Final Project Week 4 - 5/21/09
Breast Cancer Gene Expression Data
Leon Kay, Yan Tran, Chris Thomas
Cluster Analysis - SAM
• Refined clusters using TMEV’s SAM statistical analysis
• Significance Analysis of Microarrays (SAM)
– Determines whether changes in gene expression are statistically significant.
– Identifies statistically significant genes by measuring the strength of the relationship between gene expression and a response variable.
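As a rough illustration of what "strength of the relationship" means, here is a minimal sketch of the SAM relative-difference score for one gene in the two-class, unpaired case. The slides do not say which response type was used in MeV, so the two-class setup, the function name, and the sample values are assumptions. SAM's permutation step for estimating the false discovery rate is not shown.

```python
import math

def sam_relative_difference(group_a, group_b, s0=0.0):
    """SAM-style score d = (mean_a - mean_b) / (s + s0) for one gene.

    group_a, group_b: expression values of the gene in the two classes.
    s0: small positive "fudge factor" that damps genes with tiny variance
        (SAM tunes it from the data; here it is just a parameter).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations across both classes
    ss = (sum((x - mean_a) ** 2 for x in group_a)
          + sum((x - mean_b) ** 2 for x in group_b))
    # Gene-specific standard error of the difference in means
    s = math.sqrt((1.0 / n_a + 1.0 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)

# Hypothetical expression values for a single gene in two classes
d = sam_relative_difference([2.0, 2.2, 1.8], [1.0, 1.2, 0.8])
print(round(d, 3))
```

Genes with a large |d| relative to what permuted class labels produce are the ones SAM calls significant.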
MeV SAM Analysis - Results
• Creation of SAM file
– Used Excel 2007 to manually create the SAM load file.
• SAM classified 265 genes as significant and 1279 as non-significant (1544 genes total).
• SAM analysis therefore reduces the gene list to about 17% of the original total (265 of 1544).
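The 17% figure can be checked directly from the gene counts reported above. This is only a quick arithmetic check, and the variable names are ours.

```python
significant = 265        # genes SAM called significant
non_significant = 1279   # genes SAM called non-significant

total = significant + non_significant
fraction = significant / total

print(f"{significant} of {total} genes = {fraction:.1%}")
```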
Kaplan-Meier Survival Analysis
• Used to estimate the overall likelihood of survival, given a set of lifetime data
• Generated using the Excel plug-in
– www.xlstat.com
– Thanks Sri!
• A plot of the Kaplan-Meier estimate of the survival function is a series of horizontal steps of declining magnitude which, when a large enough sample is taken, approaches the true survival function for that population.
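For reference, the step function described above can be sketched in a few lines. This is a minimal product-limit estimator, not the XLSTAT implementation. The follow-up times and event flags are made-up values, not the project's clinical data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with a death.

    times:  follow-up time per patient (e.g. months)
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Each event time multiplies survival by (1 - deaths / at risk)
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set
        at_risk -= removed
    return curve

# Hypothetical data: five patients, two of them censored
for t, s in kaplan_meier([5, 10, 10, 20, 30], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients never cause a step down, but they shrink the risk set, so later deaths produce larger steps.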
Survival Analysis – Breast Cancer Type
[Figure: Kaplan-Meier survival distribution function by breast cancer type, one panel per type. y-axis: survival distribution function, 0 to 1. x-axis: overall survival months.]
– Luminal A (0 to 120 months)
– Basal-like (0 to 80 months)
– Claudin-low (0 to 100 months)
– Luminal B (0 to 100 months)
– HER2+/ER- (0 to 120 months)
– Normal Breast-like (0 to 40 months)
Survival Analysis - Overall
[Figure: two panels for all patients combined. x-axis: overall survival months.]
– Survival distribution function (y-axis 0 to 1, x-axis 0 to 120 months)
– -Log(SDF) (y-axis 0 to 1.2, x-axis 0 to 140 months)
Relapse Probability
• 30 out of 270 patients relapsed. Only 270 patients in the clinical data have relapse status recorded one way or the other.
• This gives a relapse rate of 0.1111, or 11.11%.
• Calculating a 99% confidence interval, we get +/- 0.049.
• The estimated probability of relapse, with 99% confidence, is 0.1111 +/- 0.049.
• Or, 11.11% +/- 4.9%, for a range of (6.21%, 16.01%).
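The slides do not say how the interval was computed. The sketch below assumes the usual normal-approximation (Wald) interval for a proportion with z = 2.576 for 99% confidence, which reproduces the +/- 0.049 margin. The function name is ours.

```python
import math

def proportion_ci(successes, n, z=2.576):
    """Normal-approximation (Wald) interval for a proportion.

    Returns (p, margin); the interval is p +/- margin.
    z = 2.576 corresponds to 99% confidence.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

p, margin = proportion_ci(30, 270)  # 30 relapses among 270 patients
print(f"p = {p:.4f}, margin = {margin:.3f}")
print(f"range = ({p - margin:.4f}, {p + margin:.4f})")
```

The slide's range of (6.21%, 16.01%) comes from rounding the margin to 0.049 before adding and subtracting, so the unrounded endpoints differ slightly in the last digit.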
Relevance Networks
• The MeV manual states that a “relevance network is a group of genes whose expression profiles are highly predictive of one another.”
• Clusters are represented as genes connected by lines, showing that each connected pair has a squared correlation coefficient (R²) within preset thresholds.
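The thresholding idea can be sketched as follows. Compute R² for every pair of genes and keep an edge when it falls between a minimum and a maximum threshold. This is an illustration of the concept, not MeV's implementation. The gene names, expression profiles, and threshold defaults are made up.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def relevance_edges(profiles, min_r2=0.8, max_r2=1.0):
    """Return (gene1, gene2, r2) for every pair within the thresholds."""
    edges = []
    for (g1, x), (g2, y) in combinations(profiles.items(), 2):
        # Clamp to 1.0 to guard against floating-point overshoot
        r2 = min(pearson_r(x, y) ** 2, 1.0)
        if min_r2 <= r2 <= max_r2:
            edges.append((g1, g2, r2))
    return edges

# Hypothetical expression profiles across four samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [3.0, 1.0, 4.0, 1.0],
}
for g1, g2, r2 in relevance_edges(profiles):
    print(g1, g2, round(r2, 3))
```

Genes linked by edges form the connected groups that are drawn as networks. Because R² is used, strongly negative correlations also create edges.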
GATA3
• In week two we mentioned the GATA3 gene.
• It is linked to estrogen receptor alpha.
• It offers a method for providing a prognosis because its expression profile is very different between the Basal-like and Luminal types.
• Will GATA3 show up as a significant gene after SAM analysis, and will we find the gene associated with estrogen receptor alpha alongside it?