Harvard Medical School, Massachusetts Institute of Technology
Identifying Differentially Expressed Genes in Time Series Microarrays
Jonathan J. Smith (1)
Hsun-Hsien Chang (2)
Marco F. Ramoni (2)
(1) Department of Mathematics, MIT
(2) Division of Health Sciences and Technology, Harvard-MIT
New England Statistics Symposium, April 17, 2010
Background
• Microarray technologies enable profiling the expression of thousands of genes in parallel on a single chip.
• Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
– Identify differentially expressed genes across tissue states using t-statistics, fold-change, signal-to-noise ratio, principal component analysis, etc.
• Research trend:
– The cost of microarray technologies is falling.
– Collect time series gene expression microarrays to study biological functions.
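The per-gene scores named above can be sketched for a single gene as follows. This is a minimal illustration, not the authors' code: it assumes log2-scale expression values, uses Welch's form of the t-statistic, and uses the (mean difference) / (sum of standard deviations) form of the signal-to-noise ratio. The function name `two_group_scores` is hypothetical.

```python
import math
from statistics import mean, stdev


def two_group_scores(state1, state2):
    """Return (t, log2_fold_change, snr) for one gene.

    state1, state2: log2 expression values of the gene in the
    two tissue states (assumed input format).
    """
    m1, m2 = mean(state1), mean(state2)
    s1, s2 = stdev(state1), stdev(state2)  # sample standard deviations
    n1, n2 = len(state1), len(state2)
    # Welch's t-statistic (does not assume equal variances).
    t = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # On the log2 scale, the fold change is a difference of means.
    log2_fc = m1 - m2
    # Signal-to-noise ratio: mean difference over summed spreads.
    snr = (m1 - m2) / (s1 + s2)
    return t, log2_fc, snr


t, log2_fc, snr = two_group_scores([8.0, 9.0, 10.0], [5.0, 6.0, 7.0])
```

All three scores compare one snapshot per tissue state, so none of them uses the order of measurements in time.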
Approach
• Challenge:
– Existing methods (t-statistics, fold-change, SNR, PCA) cannot be extended to longitudinal expression analysis because they do not represent temporal information well.
• We propose using the framework of Bayesian networks to capture both the functional and temporal dependencies.
Bayesian Networks
• Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables.
– Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
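As a small illustration of this definition, the joint probability of a Bayesian network factorizes into one conditional probability per node given its parents. The sketch below uses a hypothetical two-node network (Pheno -> G) with G discretized to "low"/"high" purely for readability; the probability values are made up for the example.

```python
# Hypothetical two-node network: Pheno -> G.
# parents[node] lists the source nodes of the arcs pointing into node.
parents = {"Pheno": [], "G": ["Pheno"]}

# Conditional probability tables, keyed by the tuple of parent values.
# The numbers are illustrative only.
cpt = {
    "Pheno": {(): {"state1": 0.5, "state2": 0.5}},
    "G": {
        ("state1",): {"low": 0.8, "high": 0.2},
        ("state2",): {"low": 0.3, "high": 0.7},
    },
}


def joint_probability(assignment):
    """Multiply p(node | parents) over all nodes of the network."""
    p = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[q] for q in pa)
        p *= cpt[node][key][assignment[node]]
    return p


p = joint_probability({"Pheno": "state2", "G": "high"})  # 0.5 * 0.7
```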
Representation of Functional Dependence
[Figure: cases 1, 2, ..., M, grouped into tissue state 1 and tissue state 2]
• Phenotypes (Pheno) are modeled by a binomial variable.
• Expression of gene G across human subjects is modeled by a log-normal variable.
• Pheno and G with no arc between them: the gene is independent of the phenotypes.
• Pheno → G: the gene is dependent on the phenotypes.
Representation of Temporal Dependence
[Figure: cases 1, 2, ..., M, grouped into tissue state 1 and tissue state 2, each with expression measurements G(1), G(2), ..., G(T)]
The time series expression of gene G is considered a 1st-order Markov chain: G(1) → G(2) → ... → G(T).
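Under a first-order Markov assumption, the likelihood of a series factorizes as p(G(1)) times the product of p(G(t) | G(t-1)) for t = 2, ..., T. The sketch below illustrates that factorization with one possible transition model: a linear-Gaussian step on the log scale, which is consistent with log-normal expression but is an assumption here, not something the slides specify. The parameter names (`mu1`, `sigma1`, `a`, `b`, `sigma`) are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def markov_chain_loglik(log_series, mu1, sigma1, a, b, sigma):
    """Log-likelihood of one log-expression series under a 1st-order chain.

    Assumed model (illustrative):
      log G(1)              ~ N(mu1, sigma1^2)
      log G(t) | log G(t-1) ~ N(a + b * log G(t-1), sigma^2)
    """
    # Initial time point.
    ll = normal_logpdf(log_series[0], mu1, sigma1)
    # Each later point depends only on its immediate predecessor.
    for prev, cur in zip(log_series, log_series[1:]):
        ll += normal_logpdf(cur, a + b * prev, sigma)
    return ll


ll = markov_chain_loglik([2.0, 2.1, 2.3], mu1=2.0, sigma1=1.0,
                         a=0.0, b=1.0, sigma=0.5)
```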
Differentially Expressed Time Series
• G(1) → G(2) → ... → G(T), with no arcs from Pheno: the expression series is independent of the phenotypes.
• G(1) → G(2) → ... → G(T), with arcs from Pheno to each G(t): the expression series is dependent on the phenotypes. The phenotype variable modulates gene expression at every time point.
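Identify Function-Dependent Genes

$$\text{Bayes Factor} = \frac{p(M_{\text{dep}} \mid \text{Data})}{p(M_{\text{ind}} \mid \text{Data})} = \frac{p(\text{Data} \mid M_{\text{dep}})}{p(\text{Data} \mid M_{\text{ind}})}$$

Here M_dep denotes the network in which the expression series depends on the phenotypes and M_ind the network in which it is independent of them (the slide shows the two networks as diagrams; the symbols and the choice of numerator are assumed here). The second equality holds when the two models have equal prior probabilities.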
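A Bayes factor of this kind can be sketched in closed form for a deliberately simplified case. The assumptions below are illustrative and not taken from the slides: a single time point, log expression values that are normal with a known standard deviation `sigma`, and a conjugate normal prior N(`m0`, `tau`^2) on each mean. The dependent model gives each tissue state its own mean; the independent model uses one shared mean. The function names are hypothetical, and the log Bayes factor is returned to avoid overflow.

```python
import math


def log_marginal(y, m0, tau, sigma):
    """Log marginal likelihood of y under y_i ~ N(mu, sigma^2), mu ~ N(m0, tau^2).

    The mean mu is integrated out analytically.
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)  # within-group sum of squares
    s2, t2 = sigma ** 2, tau ** 2
    return (-0.5 * n * math.log(2 * math.pi * s2)
            + 0.5 * math.log(s2 / (s2 + n * t2))
            - ss / (2 * s2)
            - n * (ybar - m0) ** 2 / (2 * (s2 + n * t2)))


def log_bayes_factor(state1, state2, m0, tau, sigma):
    """log[p(Data | dependent) / p(Data | independent)] for one gene."""
    # Dependent model: a separate mean for each tissue state.
    dependent = (log_marginal(state1, m0, tau, sigma)
                 + log_marginal(state2, m0, tau, sigma))
    # Independent model: one mean shared by all cases.
    independent = log_marginal(list(state1) + list(state2), m0, tau, sigma)
    return dependent - independent


# Well-separated tissue states favor the dependent model (positive value).
lbf = log_bayes_factor([5.0, 5.1, 4.9], [9.0, 9.1, 8.9],
                       m0=7.0, tau=3.0, sigma=0.5)
```

A positive log Bayes factor favors dependence on the phenotype; genes could then be ranked or thresholded on this value. Extending the sketch to time series would replace each marginal likelihood with that of a Markov chain model.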
Clinical Study on Breast Cancer
• Breast cancer is the most prevalent cancer in women. Identification of genes inducing breast cancer will help drug development.
• We used breast cancer microarray data from Gene Expression Omnibus (accession number GSE11352).
• Our method identified 40 genes that may drive breast cancer development.
• Biologists confirmed that these genes are involved in cell death, developmental disorder, and endocrine system disorder (all are prerequisites of breast cancer).
Conclusion
• Developed a Bayesian network method for identification of differentially expressed genes in longitudinal expression microarray data.
– Functional dependence: genes modulated by phenotypes.
– Temporal dependence: gene expression time series modeled by a 1st-order Markov chain.
– Use the Bayes factor to select differentially expressed genes.