Identifying Differentially Expressed Genes in Time Series Microarrays

Jonathan J. Smith (1), Hsun-Hsien Chang (2), Marco F. Ramoni (2)

(1) Department of Mathematics, MIT
(2) Division of Health Sciences and Technology, Harvard-MIT

Harvard Medical School / Massachusetts Institute of Technology

New England Statistics Symposium, April 17, 2010




Background

• Microarray technologies enable profiling the expression of thousands of genes in parallel on a single chip.

• Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
  – Identify differentially expressed genes across tissue states using t-statistics, fold-change, signal-to-noise ratio, principal component analysis, etc.

• Research trend:
  – The cost of microarray technologies is falling.
  – Time series gene expression microarrays are being collected to study biological functions.
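As a point of reference, the static comparison statistics listed above can be sketched for a single gene as follows. This is a minimal illustration, not the authors' code: the function name `static_scores`, the Welch form of the t-statistic, the mean-difference-over-summed-standard-deviations form of the signal-to-noise ratio, and the intensity values are all assumptions made here.

```python
import math
from statistics import mean, stdev


def static_scores(state1, state2):
    """Return (t, log2 fold-change, SNR) for one gene measured in two tissue states.

    state1, state2: lists of positive raw expression intensities (hypothetical input).
    """
    m1, m2 = mean(state1), mean(state2)
    s1, s2 = stdev(state1), stdev(state2)  # sample standard deviations
    n1, n2 = len(state1), len(state2)
    # Welch's t-statistic (does not assume equal variances)
    t = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # log2 ratio of the mean intensities
    log2_fc = math.log2(m1 / m2)
    # signal-to-noise ratio: mean difference over the sum of standard deviations
    snr = (m1 - m2) / (s1 + s2)
    return t, log2_fc, snr


# Hypothetical intensities for one gene in the two tissue states
state1 = [8.0, 10.0, 12.0]
state2 = [4.0, 5.0, 6.0]
t, log2_fc, snr = static_scores(state1, state2)
```

None of these scores uses the ordering of measurements in time, which is the limitation the talk addresses.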


Approach

• Challenge:
  – Existing methods (t-statistics, fold-change, SNR, PCA) cannot be extended to longitudinal expression analysis because they do not represent temporal information well.

• Proposal: use the framework of Bayesian networks to capture both the functional and temporal dependencies.


Bayesian Networks

• Bayesian networks are directed acyclic graphs where:
  – Nodes correspond to random variables.
  – Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
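For reference, this graph structure implies the standard factorization of the joint distribution into one conditional term per node. The symbols X_1, ..., X_n for the node variables and Pa(X_i) for the parents (arc sources) of node X_i are notation introduced here, not taken from the slides.

```latex
p(X_1, \dots, X_n) = \prod_{i=1}^{n} p\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

A node with no incoming arcs contributes its marginal distribution p(X_i).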


Representation of Functional Dependence

[Figure: data layout. Cases 1, 2, ..., M are grouped into tissue state 1 and tissue state 2.]

• Pheno: phenotypes are modeled by a binomial variable.

• Gene G: expression across human subjects is modeled by a log-normal variable.

• G and Pheno with no arc between them: the gene is independent of the phenotypes.

• Pheno → G: the gene is dependent on the phenotypes.


Representation of Temporal Dependence

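[Figure: data layout. Cases 1, 2, ..., M are grouped into tissue state 1 and tissue state 2; gene G is measured at time points 1, 2, ..., T, giving G(1), G(2), ..., G(T).]

• The time series expression of gene G is considered a 1st-order Markov chain: G(1) → G(2) → ... → G(T).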
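Under a first-order Markov chain, the likelihood of a series factors into a term for G(1) and one transition term per later time point. The sketch below illustrates this on log-expression values. The Gaussian form (chosen to match log-normal expression), the linear transition mean `a + b * previous`, and all parameter names and values are assumptions made for illustration; the slides do not specify the transition model.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std dev sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def markov_chain_loglik(series, mu1, sigma1, a, b, sigma):
    """Log-likelihood of one log-expression series under a 1st-order Markov chain.

    log p(G(1), ..., G(T)) = log p(G(1)) + sum over t >= 2 of log p(G(t) | G(t-1))
    """
    # Initial time point: G(1) ~ Normal(mu1, sigma1)
    loglik = normal_logpdf(series[0], mu1, sigma1)
    # Each later point depends only on its immediate predecessor
    for previous, current in zip(series, series[1:]):
        loglik += normal_logpdf(current, a + b * previous, sigma)
    return loglik


# Hypothetical log-expression series and parameters
log_series = [1.0, 1.5, 2.0]
ll = markov_chain_loglik(log_series, mu1=1.0, sigma1=1.0, a=0.5, b=1.0, sigma=1.0)
```

With these illustrative parameters every observation sits exactly at its predicted mean, so each of the three terms equals the peak log density of a unit-variance normal.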


Differentially Expressed Time Series

• G(1) → G(2) → ... → G(T), with Pheno unconnected: the expression series is independent of the phenotypes.

• G(1) → G(2) → ... → G(T), with arcs from Pheno: the expression series is dependent on the phenotypes. The phenotype variable modulates gene expression at every time point.
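Written out as joint distributions, the two candidate networks differ only in whether Pheno appears as a parent of each time point. The factorizations below follow from the Markov chain structure and the description of the phenotype acting at every time point; the exact arc set is an assumption based on that description, since the original diagrams are not recoverable here.

```latex
\text{Independent:}\quad
p\bigl(\mathrm{Pheno}, G(1), \dots, G(T)\bigr)
  = p(\mathrm{Pheno})\, p\bigl(G(1)\bigr) \prod_{t=2}^{T} p\bigl(G(t) \mid G(t-1)\bigr)

\text{Dependent:}\quad
p\bigl(\mathrm{Pheno}, G(1), \dots, G(T)\bigr)
  = p(\mathrm{Pheno})\, p\bigl(G(1) \mid \mathrm{Pheno}\bigr) \prod_{t=2}^{T} p\bigl(G(t) \mid G(t-1), \mathrm{Pheno}\bigr)
```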


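Identify Function-Dependent Genes

• For each gene, compare the network in which the expression series depends on the phenotypes (written M_dep here) with the network in which it is independent (written M_ind here):

$$\frac{p(M_{\mathrm{dep}} \mid \mathrm{Data})}{p(M_{\mathrm{ind}} \mid \mathrm{Data})}, \qquad \text{Bayes Factor} = \frac{p(\mathrm{Data} \mid M_{\mathrm{dep}})}{p(\mathrm{Data} \mid M_{\mathrm{ind}})}$$

• With equal prior probabilities on the two networks, the posterior ratio equals the Bayes factor.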
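A rough sketch of how such a Bayes factor could be computed for one gene follows. The slides do not say how the marginal likelihoods p(Data | model) are evaluated, so this example swaps in the BIC (Schwarz) approximation, log p(Data | model) ≈ maximized log-likelihood − (k/2) log n, which is not necessarily the authors' method. It also assumes linear-Gaussian transitions on log-expression and omits the G(1) term for brevity; all function names and data values are hypothetical.

```python
import math


def max_loglik(pairs):
    """Maximized Gaussian log-likelihood of y = a + b*x + noise over (x, y) pairs.

    Assumes at least two distinct x values and a non-zero residual variance.
    """
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    a = my - b * mx
    var = sum((y - a - b * x) ** 2 for x, y in pairs) / n  # MLE of noise variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def transitions(series_list):
    """Collect (G(t-1), G(t)) pairs from a list of log-expression series."""
    return [(s[t - 1], s[t]) for s in series_list for t in range(1, len(s))]


def log_bayes_factor_bic(state1, state2):
    """Approximate log Bayes factor: phenotype-dependent vs independent model."""
    pooled = transitions(state1 + state2)
    n = len(pooled)
    k = 3  # intercept, slope and noise variance per transition model
    # Independent model: one transition model shared by both tissue states
    log_ml_ind = max_loglik(pooled) - 0.5 * k * math.log(n)
    # Dependent model: a separate transition model for each tissue state
    log_ml_dep = (max_loglik(transitions(state1)) + max_loglik(transitions(state2))
                  - 0.5 * (2 * k) * math.log(n))
    return log_ml_dep - log_ml_ind


# Hypothetical log-expression series: rising in state 1, falling in state 2
state1 = [[1.0, 2.1, 3.0, 4.1], [1.5, 2.4, 3.5, 4.4]]
state2 = [[4.0, 3.1, 2.0, 1.1], [4.5, 3.4, 2.5, 1.4]]
log_bf = log_bayes_factor_bic(state1, state2)
```

A positive log Bayes factor favors the phenotype-dependent network; when both states follow the same dynamics, the extra parameters of the dependent model are penalized and the value turns negative.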


Clinical Study on Breast Cancer

• Breast cancer is the most prevalent cancer in women. Identification of genes inducing breast cancer will help drug development.

• We used breast cancer microarray data from Gene Expression Omnibus (accession number GSE11352).

• Our method identified 40 genes that may drive breast cancer development.

• Biologists confirmed that these genes are involved in cell death, developmental disorder, and endocrine system disorder (all are prerequisites of breast cancer).


Conclusion

• Developed a Bayesian network method for identifying differentially expressed genes in longitudinal expression microarray data.
  – Functional dependence: genes modulated by phenotypes.
  – Temporal dependence: gene expression time series modeled by a 1st-order Markov chain.
  – Bayes factor used to select differentially expressed genes.