Page 1: A Quantitative Overview to Gene Expression Profiling in Animal Genetics

A Quantitative Overview to Gene Expression Profiling in Animal Genetics

Armidale Animal Breeding Summer Course, UNE, Feb. 2006

Analysis of (cDNA) Microarray Data:

Part V. Mixtures of Distributions

Model-Based Clustering via Mixtures of Distributions

Page 2: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Definition

• The mixture model assumes that each cluster (or component) of the data is generated by an underlying normal distribution.

• Each observation in y is assumed to be an independent draw from a mixture density with k (possibly unknown but finite) components and probability density function:

$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$

where the $\pi_i$ are the mixing proportions (they add to 1) and $\phi(y; \mu_i, V_i)$ is the normal density function.
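As a minimal sketch of the definition above, the mixture density can be evaluated in plain Python. The function names (`normal_pdf`, `mixture_pdf`) and the two-component values are hypothetical and chosen only for illustration; the normal density is assumed to be univariate and parameterised by its variance.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate normal density phi(y; mean, var), parameterised by the variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(y, props, means, variances):
    """Mixture density: sum over components of pi_i * phi(y; mu_i, V_i)."""
    if abs(sum(props) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must add to 1")
    return sum(p * normal_pdf(y, m, v) for p, m, v in zip(props, means, variances))


# Hypothetical two-component mixture, not taken from the slides.
density_at_zero = mixture_pdf(0.0, [0.5, 0.5], [0.0, 3.0], [1.0, 4.0])
print(density_at_zero)
```

With a single component and proportion 1, `mixture_pdf` reduces to the ordinary normal density, which is a quick sanity check on the implementation.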

Page 3: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Introduction

$$f(y; \Psi_k) = \sum_{j=1}^{k} \pi_j \, \phi(y; \mu_j, V_j)$$

Page 4: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

The Guru: http://www.maths.uq.edu.au/~gjm

Page 5: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Software and Resources

Page 6: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

EM Algorithm

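$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$

The EM algorithm obtains the maximum likelihood estimate of $\Psi_k$ by iteration. In the (m+1)th iteration, the estimates of the parameters of interest are updated by:

$$\pi_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} \Big/ n$$

$$\mu_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)}$$

$$V_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} \left(y_j - \mu_i^{(m+1)}\right)\left(y_j - \mu_i^{(m+1)}\right)^{T} \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)}$$

where

$$\tau_{ij}^{(m)} = \pi_i^{(m)} \, \phi\left(y_j; \mu_i^{(m)}, V_i^{(m)}\right) \Big/ f\left(y_j; \Psi^{(m)}\right)$$

is the posterior probability that $y_j$ belongs to the i-th component of the mixture (…with a very elegant link to False Discovery Rate).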
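The update equations above can be sketched in plain Python for the univariate case. This is an illustrative sketch, not the EMMIX implementation: the function names, the quantile-based starting values, the fixed number of iterations and the simulated two-cluster data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Univariate normal density parameterised by the variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(y, props, means, variances):
    """Posterior probability tau[j][i] that observation j belongs to component i."""
    tau = []
    for yj in y:
        weighted = [p * normal_pdf(yj, m, v) for p, m, v in zip(props, means, variances)]
        total = max(sum(weighted), 1e-300)  # guard against underflow to zero
        tau.append([w / total for w in weighted])
    return tau


def em_normal_mixture(y, k, n_iter=200):
    """Fit a k-component univariate normal mixture with a fixed number of EM iterations."""
    n = len(y)
    ordered = sorted(y)
    grand_mean = sum(y) / n
    grand_var = sum((v - grand_mean) ** 2 for v in y) / n
    # Assumed starting values: equal proportions, means at evenly spaced quantiles.
    props = [1.0 / k] * k
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    variances = [grand_var] * k
    for _ in range(n_iter):
        tau = e_step(y, props, means, variances)
        for i in range(k):
            t_i = max(sum(row[i] for row in tau), 1e-300)
            props[i] = t_i / n
            means[i] = sum(row[i] * yj for row, yj in zip(tau, y)) / t_i
            spread = sum(row[i] * (yj - means[i]) ** 2 for row, yj in zip(tau, y)) / t_i
            variances[i] = max(spread, 1e-6)  # keep the variance strictly positive
    return props, means, variances, e_step(y, props, means, variances)


# Hypothetical, well-separated data: 300 draws around 0 and 300 draws around 8.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(8.0, 1.0) for _ in range(300)]
props, means, variances, tau = em_normal_mixture(data, 2)
print(props, means, variances)
```

A production fit would normally stop on a convergence criterion for the log-likelihood and try several starting values; the fixed iteration count here only keeps the sketch short.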

Page 7: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

EM Algorithm

• We proceed by fitting mixtures with k = 1, 2, 3, … components.

• Criteria for model selection include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):

$$AIC = -2 \log L(\hat{\Psi}_k) + 2 \nu_k$$

$$BIC = -2 \log L(\hat{\Psi}_k) + \nu_k \log(n)$$

where $\nu_k = 3k - 1$ is the number of independent parameters in the mixture.

• Alternatively, the distribution of the likelihood ratio test (LRT) can be estimated by bootstrapping and P-values obtained to contrast a model with k components against a model with k + 1 components.

$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$
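A short sketch of the model-selection step, assuming the usual sign convention (minus twice the log-likelihood plus a penalty) and 3k - 1 free parameters for a univariate mixture of k normals. The log-likelihood values and the sample size below are hypothetical numbers used only to exercise the functions.

```python
import math


def n_free_parameters(k):
    """k means + k variances + (k - 1) mixing proportions for a univariate mixture."""
    return 3 * k - 1


def aic(log_lik, k):
    """Akaike Information Criterion for a k-component fit."""
    return -2.0 * log_lik + 2.0 * n_free_parameters(k)


def bic(log_lik, k, n):
    """Bayesian Information Criterion for a k-component fit to n observations."""
    return -2.0 * log_lik + n_free_parameters(k) * math.log(n)


# Hypothetical maximised log-likelihoods for k = 1, 2, 3 and n = 1000 observations.
log_liks = {1: -2300.0, 2: -2210.0, 3: -2208.5}
n_obs = 1000
best_k_aic = min(log_liks, key=lambda k: aic(log_liks[k], k))
best_k_bic = min(log_liks, key=lambda k: bic(log_liks[k], k, n_obs))
print(best_k_aic, best_k_bic)
```

Under both criteria the smallest value wins, so the small gain in log-likelihood from k = 2 to k = 3 in these made-up numbers does not pay for the three extra parameters.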

Page 8: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

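Simulation 1

Consider these distributions and simulate:

Distribution    Records
N(1, 5)         10,000
N(5, 10)         5,000

The mixture becomes:

$$f(y; \hat{\Psi}) = \tfrac{2}{3} N(1, 5) + \tfrac{1}{3} N(5, 10)$$

Posterior probability:

$$\tau_{ij} = \frac{\pi_i \, \phi(y_j; \mu_i, V_i)}{f(y_j; \Psi)}$$

where the denominator is the weighted average (by mixing proportions) of the likelihoods.

Likelihood:

 y     N(1, 5)    N(5, 10)
-1     0.120      0.021
 0     0.161      0.036
 1     0.178      0.056
 5     0.036      0.126
 7     0.005      0.103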
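The likelihoods and posterior probabilities for Simulation 1 can be recomputed with a few lines of Python. This sketch assumes that the second argument of N(·, ·) is a variance (which reproduces the tabulated likelihoods to within about 0.001) and that the mixing proportions are 2/3 and 1/3, matching the 10,000 and 5,000 records; the function names are hypothetical.

```python
import math

PROPS = [2.0 / 3.0, 1.0 / 3.0]   # 10,000 and 5,000 records out of 15,000
MEANS = [1.0, 5.0]
VARIANCES = [5.0, 10.0]          # assumed to be variances, not standard deviations


def normal_pdf(y, mean, var):
    """Univariate normal density parameterised by the variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def likelihoods(y):
    """Likelihood of y under each component, before weighting."""
    return [normal_pdf(y, m, v) for m, v in zip(MEANS, VARIANCES)]


def posteriors(y):
    """Weighted likelihoods divided by their sum (the mixture density at y)."""
    weighted = [p * lik for p, lik in zip(PROPS, likelihoods(y))]
    total = sum(weighted)
    return [w / total for w in weighted]


for y in (-1.0, 0.0, 1.0, 5.0, 7.0):
    lik = likelihoods(y)
    post = posteriors(y)
    print(f"y={y:5.1f}  lik={lik[0]:.3f}, {lik[1]:.3f}  post={post[0]:.3f}, {post[1]:.3f}")
```

The posterior probabilities always add to 1 across components, so an observation near 1 is assigned mostly to N(1, 5) and an observation near 7 mostly to N(5, 10).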

Page 9: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Simulation 2

Consider these distributions and simulate:

Distribution    Records    Microarray
N(0, 1)         9,000      Non-DE genes
N(0, 10)        1,000      DE genes

The mixture becomes:

$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$
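A minimal sketch of how the Simulation 2 records could be generated with Python's standard library. It assumes the second argument of N(·, ·) is a variance, so the standard deviation passed to `random.gauss` is its square root; the seed and variable names are arbitrary choices for the example.

```python
import math
import random

rng = random.Random(2006)  # arbitrary seed so the draw is repeatable

# 9,000 "non-DE" records from N(0, 1) and 1,000 "DE" records from N(0, 10).
non_de = [rng.gauss(0.0, math.sqrt(1.0)) for _ in range(9000)]
de = [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(1000)]

y = non_de + de
rng.shuffle(y)  # the fitting step should not see the records grouped by origin


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(len(y), variance(non_de), variance(de), variance(y))
```

The variance of the pooled records should sit near 0.9 × 1 + 0.1 × 10 = 1.9, because both components share a mean of zero.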

Page 10: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Simulation 2

1. Simulate:

$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$

2. Ask EMMIX to fit mixtures with up to 5 components and…

3. EMMIX model of best fit:

$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$

Page 11: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Simulation 2

1. Simulate:

$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$

3. EMMIX best fit:

$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$

[Figure: panels "Frequency" and "Post Prob"]

The posterior probabilities act as a “decision function” that changes at 2.75.
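The change point of the decision function can be checked numerically. The sketch below takes the EMMIX best-fit values quoted on the slide, assumes the second argument of N(·, ·) is a variance, and uses bisection to find the positive y at which the posterior probability of the wide (DE) component reaches 0.5. The function names and the search interval are assumptions made for the example.

```python
import math

PROPS = [0.903, 0.097]
MEANS = [0.006, 0.010]
VARIANCES = [0.993, 10.805]  # assumed to be variances


def normal_pdf(y, mean, var):
    """Univariate normal density parameterised by the variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_de(y):
    """Posterior probability that y belongs to the wide (second) component."""
    weighted = [p * normal_pdf(y, m, v) for p, m, v in zip(PROPS, MEANS, VARIANCES)]
    return weighted[1] / sum(weighted)


def find_boundary(lo=0.0, hi=10.0, tol=1e-8):
    """Bisection for the y in [lo, hi] where posterior_de crosses 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if posterior_de(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


boundary = find_boundary()
print(round(boundary, 2))
```

Observations further from zero than the boundary are more likely to come from the wide component than from the narrow one, which is the sense in which the posterior probability acts as a decision function.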

Page 12: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Linking Posterior Probabilities with False Discovery Rate

Page 13: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Linking Posterior Probabilities with False Discovery Rate

[Figure: Not-DE and DE clusters]

Select the N most extreme genes; the FDR is the average posterior probability of not being in the cluster of extremes.
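The rule above can be sketched as a small Python function. It assumes that "most extreme" means largest absolute value of y and that a posterior probability of belonging to the DE (extreme) cluster is already available for every gene; the five genes and their posterior probabilities are made-up values for illustration only.

```python
def fdr_of_top_n(y, post_de, n_top):
    """Average posterior probability of NOT being DE among the n_top most extreme genes."""
    if not 0 < n_top <= len(y):
        raise ValueError("n_top must be between 1 and the number of genes")
    # Rank gene indices by absolute value of y, most extreme first (assumed ranking rule).
    ranked = sorted(range(len(y)), key=lambda j: abs(y[j]), reverse=True)
    selected = ranked[:n_top]
    return sum(1.0 - post_de[j] for j in selected) / n_top


# Hypothetical expression measures and posterior probabilities of being DE.
y_values = [4.1, -3.6, 2.9, 0.3, -0.8]
post_de_values = [0.99, 0.95, 0.60, 0.02, 0.05]

fdr_curve = [fdr_of_top_n(y_values, post_de_values, n) for n in range(1, len(y_values) + 1)]
print(fdr_curve)
```

Evaluating the function for every N gives an "FDR by N genes" curve: the estimate grows as less extreme genes, with lower posterior probability of being DE, enter the selection.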

Page 14: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Simulation 2

1. Simulate:

$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$

3. EMMIX best fit:

$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$

[Figure: panels "Post Prob" and "FDR by N Genes"]

Select the N most extreme genes; the FDR is the average Post Prob of not being in the cluster of extremes.

Page 15: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Example: “Diets”

(only REFERENCE components of the design)

88ii rg

iiHvLi

rgy

Page 17: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Example: “Diets”

(only REFERENCE components of the design)

88ii rg

iiHvLi

rgy

$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$

$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$
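To illustrate how a fitted three-component mixture assigns a gene to a cluster, the sketch below computes posterior probabilities under the "Diets" fit. The proportions, means and variances are copied as they appear in the transcript (any signs lost in extraction are not restored), the second argument of N(·, ·) is assumed to be a variance, and the test value y = 20 is hypothetical.

```python
import math

# Fitted values as printed in the transcript: the high-variance component comes first.
PROPS = [0.044, 0.590, 0.366]
MEANS = [0.87, 2.30, 2.41]
VARIANCES = [67.46, 10.42, 2.32]  # assumed to be variances


def normal_pdf(y, mean, var):
    """Univariate normal density parameterised by the variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(y):
    """Posterior probability of each of the three components given y."""
    weighted = [p * normal_pdf(y, m, v) for p, m, v in zip(PROPS, MEANS, VARIANCES)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical extreme observation: which cluster does it most likely belong to?
pp_extreme = posteriors(20.0)[0]
print(pp_extreme)
```

An observation far from all three means is attributed almost entirely to the high-variance component, which is the component playing the role of the extreme cluster here.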

Page 18: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Example: “Diets” (only REFERENCE components of the design)

$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$

Page 19: A Quantitative Overview to Gene Expression Profiling in Animal Genetics


Mixtures of Distributions

Example: “Diets” (only REFERENCE components of the design)

$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$

[Figure: FDR by N Genes]

In Reverter et al. ’03 (JAS 81:1900), 27 genes were reported as having a posterior probability (PP) > 0.95 of being in the extreme cluster.

Now we can assess that these 27 genes carry an FDR < 10%.