![Page 1: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/1.jpg)
Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC
Rainer Spang,
Max Planck Institute for Molecular Genetics, Berlin
Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West
Duke Medical Center & Duke University
![Page 2: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/2.jpg)
Estrogen Receptor Status
• 7000 genes
• 49 breast tumors
• 25 ER+
• 24 ER-
![Page 3: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/3.jpg)
Tumor – Chip - 7000 Numbers
![Page 4: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/4.jpg)
Given: 7000 numbers
Wanted: the probability that the tumor is ER+ (e.g. 89%)
![Page 5: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/5.jpg)
7000 Numbers Are More Numbers Than We Need
Predict ER status based on the expression levels of super-genes
![Page 6: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/6.jpg)
Singular Value Decomposition
$$X = A\,D\,F$$
• $X$: data
• $A$: loadings
• $D$: singular values
• $F$: expression levels of super genes (orthogonal matrix)
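A minimal NumPy sketch of this decomposition. The random matrix is only a synthetic stand-in for the 7000 × 49 expression data, and the variable names are illustrative assumptions rather than anything taken from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_tumors = 7000, 49

# Synthetic stand-in for the expression matrix X (genes x tumors).
X = rng.standard_normal((n_genes, n_tumors))

# Thin SVD: X = A @ diag(d) @ F
#   A: loadings (genes x 49), d: singular values (49,),
#   F: super-gene expression levels (49 x 49, orthogonal)
A, d, F = np.linalg.svd(X, full_matrices=False)

print(A.shape, d.shape, F.shape)  # (7000, 49) (49,) (49, 49)
```

Each row of `F` holds one super gene measured across the 49 tumors, so every tumor is described by 49 numbers instead of 7000.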
![Page 7: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/7.jpg)
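Probit Model
$$P[Y_i = 1 \mid \beta] = \Phi\Big(\beta_0 + \sum_{j \,\in\, \text{all super genes}} \beta_j x_{ij}\Big)$$
• $Y_i$: class of tumor $i$
• $\Phi$: distribution function of a standard normal
• $\beta_j$: regression weight for super gene $j$
• $x_{ij}$: expression level of super gene $j$ in tumor $i$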
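The probit model passes a weighted sum of super-gene expression levels through the standard normal distribution function. A small sketch for a single tumor, where the function name, the weights, and the expression values are all made up for illustration:

```python
from statistics import NormalDist


def er_positive_probability(x, beta, beta0=0.0):
    """Probit probability of ER+ for one tumor.

    x     : expression levels of the super genes for this tumor
    beta  : regression weights, one per super gene
    beta0 : intercept
    """
    eta = beta0 + sum(b * v for b, v in zip(beta, x))  # linear predictor
    return NormalDist().cdf(eta)                       # standard normal CDF


# Hypothetical tumor with three super genes.
p = er_positive_probability(x=[0.3, -1.2, 0.8], beta=[1.0, 0.5, 2.0])
print(round(p, 3))
```

With all weights at zero the function returns 0.5, i.e. the model expresses no preference between ER+ and ER-.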
![Page 8: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/8.jpg)
Overfitting
• Using only a small number of super genes is not robust at all
• When using many (all) super genes, the linear model is easily saturated, i.e. several models fit the training data perfectly well
• Consequence: for a new patient, some of these models predict that she is ER+ and others that she is ER-
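The saturation effect can be reproduced on synthetic data. The sketch below assumes a plain linear predictor with more features than tumors (500 random features, chosen only for illustration) and constructs two weight vectors that both reproduce every training label exactly, yet disagree on a new sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tumors, n_features = 49, 500

X = rng.standard_normal((n_tumors, n_features))     # rows = tumors (synthetic)
y = np.where(np.arange(n_tumors) < 25, 1.0, -1.0)   # +1 = ER+, -1 = ER-
x_new = rng.standard_normal(n_features)             # a new patient

pinv = np.linalg.pinv(X)
w1 = pinv @ y                        # minimum-norm weights with X @ w1 = y

# Component of x_new that the training data cannot see (X @ v = 0).
v = x_new - pinv @ (X @ x_new)

pred1 = x_new @ w1
# Moving along v leaves the training fit untouched but flips the new prediction.
w2 = w1 - 2.0 * pred1 / (x_new @ v) * v
pred2 = x_new @ w2

print("both fit training data:", np.allclose(X @ w1, y), np.allclose(X @ w2, y))
print("predictions for the new patient:", pred1, pred2)
```

Both models are indistinguishable on the training profiles, so the data alone cannot say which prediction to trust.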
![Page 9: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/9.jpg)
Given the Few Profiles With Known Diagnosis:
• The uncertainty about the right model is high
• The variance of the model weights is large
• The likelihood landscape is flat
• We need additional model assumptions to solve the problem
![Page 10: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/10.jpg)
Informative Priors
Likelihood × Prior ∝ Posterior
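A one-dimensional toy example of how a prior turns a flat likelihood into an identifiable posterior. The four data points, the single weight, and the N(0, 1) prior are assumptions chosen for illustration; the data are perfectly separated, as happens in a saturated model.

```python
import numpy as np
from statistics import NormalDist

Phi = NormalDist().cdf

# Perfectly separated toy data: negative x -> class 0, positive x -> class 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
s = (2 * y - 1) * x                  # signed inputs, all positive here

betas = np.linspace(0.0, 5.0, 501)   # grid of candidate weights
log_lik = np.array([sum(np.log(Phi(si * b)) for si in s) for b in betas])
log_prior = -0.5 * betas**2          # N(0, 1) prior, up to a constant
log_post = log_lik + log_prior

print("likelihood is maximized at beta =", betas[np.argmax(log_lik)])
print("posterior is maximized at beta =", betas[np.argmax(log_post)])
```

The likelihood keeps increasing up to the edge of the grid and is nearly flat for large weights, whereas the posterior has a clear interior maximum.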
![Page 11: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/11.jpg)
If the Prior Is Chosen Badly:
• We cannot reproduce the diagnosis of the training profiles any more
• We still cannot identify the model
• The diagnosis is driven mostly by the additional assumptions and not by the data
![Page 12: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/12.jpg)
The Prior Needs to Be Designed in 49 Dimensions
• Shape?
• Center?
• Orientation?
• Not too narrow ... not too wide
![Page 13: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/13.jpg)
Shape
Multidimensional normal, for simplicity
![Page 14: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/14.jpg)
Center
Assumptions on the model correspond to assumptions on the diagnosis
$$P[Y_i = 1 \mid \beta]$$
![Page 15: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/15.jpg)
Orientation
Orthogonal super-genes!
![Page 16: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/16.jpg)
Not Too Narrow ... Not Too Wide
Auto adjusting model
Scales are hyper parameters with their own priors
![Page 17: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/17.jpg)
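$$p(\beta \mid T) = \prod_{j=1}^{n} N\big(\beta_j \mid 0,\ \tau_j^2 / d_j^2\big)$$
• $p(\beta \mid T)$: prior given the hyper parameter
• $\tau_j$ (collected in $T$): hyper parameter
• Product over $j$: independent super genes
• Mean 0: unbiased prior
• $d_j$: rescaling by singular values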
![Page 18: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/18.jpg)
A prior for the hyper parameters
$$\tau_j^{-2} \sim \mathrm{Gamma}(k/2,\ k/2)$$
- Conjugate prior
- Flexibility for $\beta_j$
- Symmetric U-shaped prior for $P[Y_i = 1 \mid \beta]$
- $k = 2$ or $k = 3$
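The U-shape of the implied prior on the class probability can be checked by simulation. This sketch makes several assumptions that the slide does not spell out: Gamma(k/2, k/2) is read as shape and rate on the precision 1/τ², there is a single super gene with expression level 1, no intercept, and the singular-value scale is set to d = 1.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
k = 2
n_draws = 200_000

# Assumption: Gamma(k/2, k/2) is shape/rate on the precision 1 / tau^2.
# NumPy parametrizes the gamma by shape and scale, so scale = 1 / rate = 2 / k.
precision = rng.gamma(shape=k / 2.0, scale=2.0 / k, size=n_draws)

# beta | tau ~ N(0, tau^2), with d = 1 for this illustration.
beta = rng.standard_normal(n_draws) / np.sqrt(precision)

# Implied prior on P[Y = 1 | beta] for one super gene with x = 1.
Phi = np.vectorize(NormalDist().cdf)
prob = Phi(beta)

outer = np.mean((prob < 0.1) | (prob > 0.9))   # mass near 0 and 1
inner = np.mean((prob > 0.4) & (prob < 0.6))   # mass around 0.5
print("mass in outer 20%:", outer, " mass in central 20%:", inner)
```

Under a uniform prior both regions would hold 20% of the mass; here the outer region holds more and the center less, which is the symmetric U-shape.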
![Page 19: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/19.jpg)
Latent Variable
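$$h_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)$$
$$Y_i = 1 \iff h_i > 0$$
$$P[Y_i = 1 \mid \beta] = \Phi\Big(\beta_0 + \sum_{j \,\in\, \text{all super genes}} \beta_j x_{ij}\Big)$$
Albert & Chib 1993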
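The latent-variable formulation and the probit probability describe the same model: thresholding a normally perturbed linear predictor at zero gives class 1 with probability Φ of that predictor. A quick simulation with made-up weights and expression levels:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Hypothetical weights and super-gene expression levels for one tumor.
beta0 = -0.2
beta = np.array([1.0, 0.5, 2.0])
x = np.array([0.3, -1.2, 0.8])

eta = beta0 + x @ beta                       # linear predictor
h = eta + rng.standard_normal(200_000)       # latent variable, unit-variance noise
y = (h > 0).astype(int)                      # class 1 exactly when h > 0

print("simulated P[Y = 1]:", y.mean())
print("probit formula    :", NormalDist().cdf(eta))
```

The two printed numbers agree up to Monte Carlo error, which is what makes the latent variable a convenient device for sampling.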
![Page 20: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/20.jpg)
MCMC
- Gibbs Sampler
- Sequential updates of conditional distributions
$p(h \mid X, \beta, T)$ ~ truncated normal
$p(T \mid X, \beta, h)$ ~ gamma
$p(\beta \mid X, h, T)$ ~ normal
All conditional posteriors can be calculated analytically
West 2001, Albert & Chib 1993
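The three conditional updates can be sketched as a compact Gibbs sampler. This is an illustration under stated assumptions, not the authors' implementation: there is no intercept, the gamma prior is placed on the precision λ_j = 1/τ_j² (shape k/2, rate k/2), the super-gene expression of tumor i is taken as Z[i, j] = d_j · F[j, i] (one reading of "rescaling by singular values"), and the function names and synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal cdf / inverse cdf from the standard library


def sample_truncated_latent(mu, y, rng, eps=1e-12):
    """Draw h_i ~ N(mu_i, 1) restricted to h_i > 0 if y_i == 1, else h_i <= 0."""
    h = np.empty(len(mu))
    for i, (m, yi) in enumerate(zip(mu, y)):
        m = float(m)
        p0 = min(max(_STD.cdf(-m), eps), 1.0 - eps)     # P(h_i <= 0)
        u = rng.uniform()
        q = p0 + u * (1.0 - p0) if yi == 1 else u * p0  # uniform on the allowed side
        q = min(max(q, eps), 1.0 - eps)
        h[i] = m + _STD.inv_cdf(q)                      # inverse-CDF sampling
    return h


def gibbs_probit(Z, y, d, k=3, n_iter=600, burn_in=200, seed=0):
    """Gibbs sampler for probit weights with prior beta_j ~ N(0, 1 / (lam_j * d_j^2))."""
    rng = np.random.default_rng(seed)
    q = Z.shape[1]
    beta, lam = np.zeros(q), np.ones(q)   # lam_j = 1 / tau_j^2 (assumed parametrization)
    ZtZ = Z.T @ Z
    draws = []
    for it in range(n_iter):
        # 1) latent variables: truncated normal
        h = sample_truncated_latent(Z @ beta, y, rng)
        # 2) weights: normal with precision Z'Z + diag(lam * d^2)
        prec = ZtZ + np.diag(lam * d**2)
        mean = np.linalg.solve(prec, Z.T @ h)
        L = np.linalg.cholesky(prec)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(q))
        # 3) precisions: gamma with shape (k + 1) / 2 and rate (k + d^2 beta^2) / 2
        lam = rng.gamma(shape=(k + 1) / 2.0, scale=2.0 / (k + (d * beta) ** 2))
        if it >= burn_in:
            draws.append(beta.copy())
    return np.array(draws)


# Synthetic example: 200 genes, 30 tumors, class signal in the first 20 genes.
rng = np.random.default_rng(1)
y = np.array([1] * 15 + [0] * 15)
X = rng.standard_normal((200, 30))
X[:20, :] += 1.5 * (2 * y - 1)

A, d, F = np.linalg.svd(X, full_matrices=False)
Z = F.T * d                                  # Z[i, j] = d_j * F[j, i]

draws = gibbs_probit(Z, y, d)
Phi = np.vectorize(_STD.cdf)
prob = Phi(draws @ Z.T).mean(axis=0)         # posterior mean of P[Y_i = 1]
print("training agreement:", np.mean((prob > 0.5) == (y == 1)))
```

Averaging the probit probability over the retained draws gives a predictive probability per tumor together with its uncertainty, rather than a single fitted model.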
![Page 21: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/21.jpg)
What Additional Assumptions Came In With the Prior?
• The model cannot be dominated by only a few super-genes (genes!)
• The diagnosis is based on global changes in the expression profiles, influenced by many genes
• The assumptions are neutral with respect to the individual diagnosis
![Page 22: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/22.jpg)
![Page 23: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/23.jpg)
Which Genes Have Driven the Prediction?

| Gene | Weight |
|---|---|
| nuclear factor 3 alpha | 0.853 |
| cysteine rich heart protein | 0.842 |
| estrogen receptor | 0.840 |
| intestinal trefoil factor | 0.840 |
| x box binding protein 1 | 0.835 |
| gata 3 | 0.818 |
| ps 2 | 0.818 |
| liv1 | 0.812 |
| ... many many more ... | ... |
![Page 24: Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC Rainer Spang,](https://reader031.vdocuments.us/reader031/viewer/2022020417/568137d3550346895d9f7558/html5/thumbnails/24.jpg)
Thank you!