A Quantitative Overview to Gene Expression Profiling in Animal Genetics
Armidale Animal Breeding Summer Course, UNE, Feb. 2006
Analysis of (cDNA) Microarray Data:
Part V. Mixtures of Distributions
Model-Based Clustering via Mixtures of Distributions
Mixtures of Distributions
Definition
• The mixture model assumes that each cluster (or component) of the data is generated by an underlying normal distribution.
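• The observations in y are assumed to be independent draws from a mixture density with k (possibly unknown but finite) components and with probability density function:
$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$
where the \pi_i are the mixing proportions (they add to 1), \phi(y; \mu_i, V_i) is the normal density function with mean \mu_i and variance V_i, and \Psi_k denotes the full set of parameters of the k-component model.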
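As a minimal sketch of the definition above, the mixture density can be evaluated directly as a weighted sum of normal densities. The function names `normal_pdf` and `mixture_pdf` and the example parameter values are illustrative assumptions, not part of the course material.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate normal density phi(y; mean, var), with var a variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(y, props, means, variances):
    """Mixture density: sum over components of pi_i * phi(y; mu_i, V_i)."""
    if abs(sum(props) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must add to 1")
    return sum(p * normal_pdf(y, m, v) for p, m, v in zip(props, means, variances))


# Illustrative two-component mixture (values chosen only for this example).
density_at_zero = mixture_pdf(0.0, [0.9, 0.1], [0.0, 0.0], [1.0, 10.0])
print(round(density_at_zero, 4))
```

The check on the mixing proportions mirrors the "add to 1" constraint; everything else is a direct transcription of the sum.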
Mixtures of Distributions
Introduction
$$f(y; \Psi_k) = \sum_{j=1}^{k} \pi_j \, \phi(y; \mu_j, V_j)$$
Mixtures of Distributions
The Guru: http://www.maths.uq.edu.au/~gjm
Mixtures of Distributions
Software and Resources
Mixtures of Distributions
EM Algorithm
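$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$
The EM algorithm obtains the maximum likelihood estimate of \Psi_k (the parameters of the mixture) by iteration. In the (m+1)th iteration, the estimates of the parameters of interest are updated by:
$$\pi_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} \Big/ n$$
$$\mu_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)}$$
$$V_i^{(m+1)} = \sum_{j=1}^{n} \tau_{ij}^{(m)} \left(y_j - \mu_i^{(m+1)}\right)\left(y_j - \mu_i^{(m+1)}\right)^{T} \Big/ \sum_{j=1}^{n} \tau_{ij}^{(m)}$$
where
$$\tau_{ij}^{(m)} = \pi_i^{(m)} \, \phi\left(y_j; \mu_i^{(m)}, V_i^{(m)}\right) \Big/ f\left(y_j; \Psi_k^{(m)}\right)$$
is the posterior probability that y_j belongs to the i-th component of the mixture (…with a very elegant link to the False Discovery Rate).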
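The update equations can be sketched in a few lines of plain Python for the univariate case. This is an illustrative implementation under stated assumptions, not the EMMIX code: the function name `em_normal_mixture`, the starting values, the stopping rule and the small simulated data set are all choices made for this example.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_normal_mixture(y, props, means, variances, max_iter=200, tol=1e-8):
    """Run EM from the supplied starting values and return the final estimates."""
    n, k = len(y), len(props)
    props, means, variances = list(props), list(means), list(variances)
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: tau[i][j] is the posterior probability that y_j
        # belongs to component i under the current estimates.
        tau = [[0.0] * n for _ in range(k)]
        loglik = 0.0
        for j, yj in enumerate(y):
            parts = [props[i] * normal_pdf(yj, means[i], variances[i]) for i in range(k)]
            total = sum(parts)
            loglik += math.log(total)
            for i in range(k):
                tau[i][j] = parts[i] / total
        # M-step: updates for the proportions, means and variances.
        for i in range(k):
            weight = sum(tau[i])
            props[i] = weight / n
            means[i] = sum(t * yj for t, yj in zip(tau[i], y)) / weight
            variances[i] = sum(t * (yj - means[i]) ** 2 for t, yj in zip(tau[i], y)) / weight
        # loglik was evaluated at the estimates that entered this iteration.
        if loglik - previous < tol:
            break
        previous = loglik
    return props, means, variances, loglik


# Illustrative data: 1,800 draws from N(0, 1) and 200 from N(0, 10).
# random.gauss takes a standard deviation, hence the square root.
rng = random.Random(2006)
data = [rng.gauss(0.0, 1.0) for _ in range(1800)]
data += [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(200)]

fit_props, fit_means, fit_vars, fit_loglik = em_normal_mixture(
    data, props=[0.5, 0.5], means=[0.0, 0.0], variances=[0.5, 5.0]
)
print(fit_props, fit_means, fit_vars)
```

The sketch has no safeguards against a component collapsing onto a single point or against numerical underflow, both of which a production fit would need.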
Mixtures of Distributions
EM Algorithm
• We fit the model for k = 1, 2, 3, … components.
• Criteria for model selection include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):
$$AIC = -2 \log L(\hat{\Psi}_k) + 2 \nu_k$$
$$BIC = -2 \log L(\hat{\Psi}_k) + \nu_k \log(n)$$
where \nu_k = 3k - 1 is the number of independent parameters in the mixture (k means, k variances and k - 1 free mixing proportions) and L(\hat{\Psi}_k) is the maximised likelihood of the k-component model.
• Alternatively, the distribution of the likelihood ratio test (LRT) can be estimated by bootstrapping and P-values obtained to contrast a model with k components against a model with k + 1 components.
$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$
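A short sketch of how the two criteria could be used to choose k. The maximised log-likelihoods below are made-up numbers for illustration only, and the helper names are assumptions of this example.

```python
import math


def n_free_parameters(k):
    """k means + k variances + (k - 1) free mixing proportions = 3k - 1."""
    return 3 * k - 1


def aic(loglik, k):
    """Akaike Information Criterion for a k-component univariate mixture."""
    return -2.0 * loglik + 2.0 * n_free_parameters(k)


def bic(loglik, k, n):
    """Bayesian Information Criterion for the same model fitted to n records."""
    return -2.0 * loglik + n_free_parameters(k) * math.log(n)


# Hypothetical maximised log-likelihoods for k = 1, 2, 3 (not real results).
logliks = {1: -21500.0, 2: -20100.0, 3: -20098.0}
n_records = 10000

best_k_by_aic = min(logliks, key=lambda k: aic(logliks[k], k))
best_k_by_bic = min(logliks, key=lambda k: bic(logliks[k], k, n_records))
print(best_k_by_aic, best_k_by_bic)
```

With these invented values the third component improves the log-likelihood by only 2 units, which is not enough to offset the penalty for three extra parameters under either criterion.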
Mixtures of Distributions
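Simulation 1
Consider these distributions, written N(mean, variance):
Distribution   Records
N(1, 5)        10,000
N(5, 10)        5,000
…and simulate. The mixture becomes:
$$f(y; \hat{\Psi}) = \tfrac{2}{3} N(1, 5) + \tfrac{1}{3} N(5, 10)$$
Posterior probability:
$$\tau_{ij} = \pi_i \, \phi(y_j; \mu_i, V_i) \Big/ f(y_j; \Psi)$$
Likelihood:
y     N(1, 5)   N(5, 10)
-1    0.120     0.021
 0    0.161     0.036
 1    0.178     0.056
 5    0.036     0.126
 7    0.005     0.103
The mixture density f(y_j; \Psi) is the weighted average (by mixing proportions) of the component likelihoods.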
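The likelihoods and posterior probabilities for Simulation 1 can be recomputed with a few lines of Python. This is a sketch that assumes the second argument of N(·,·) is a variance, which is what reproduces slide values such as 0.178 for N(1, 5) at y = 1; the function names are illustrative.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Mixture from the slide: 2/3 N(1, 5) + 1/3 N(5, 10).
props = [2.0 / 3.0, 1.0 / 3.0]
means = [1.0, 5.0]
variances = [5.0, 10.0]


def posterior(y):
    """Posterior probability of each component given the observation y."""
    parts = [p * normal_pdf(y, m, v) for p, m, v in zip(props, means, variances)]
    total = sum(parts)  # mixture density: weighted average of the likelihoods
    return [part / total for part in parts]


for y in (-1, 0, 1, 5, 7):
    likelihoods = [normal_pdf(y, m, v) for m, v in zip(means, variances)]
    print(y, [round(x, 3) for x in likelihoods], [round(x, 3) for x in posterior(y)])
```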
Mixtures of Distributions
Simulation 2
Consider these distributions:
Distribution   Records   Microarray
N(0, 1)        9,000     Non-DE genes
N(0, 10)       1,000     DE genes
…and simulate. The mixture becomes:
$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$
Mixtures of Distributions
Simulation 2
1. Simulate:
$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$
2. Ask EMMIX to fit mixtures with up to 5 components and…
3. EMMIX model of best fit:
$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$
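Mixtures of Distributions
Simulation 2
1. Simulate:
$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$
3. EMMIX best fit:
$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$
[Figure: frequency histogram with the posterior probabilities (Post Prob)]
The posterior probabilities act as a “decision function” that changes at 2.75.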
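The switching point of this decision function can be checked numerically. The sketch below plugs the EMMIX best-fit values quoted on the slide into the posterior probability and uses bisection to find the positive y at which a gene is equally likely to belong to either component; the function names and the search interval are assumptions of this example. Under these values the crossing comes out at roughly 2.74, in line with the 2.75 read off the plot.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# EMMIX best fit quoted on the slide: (proportion, mean, variance).
NON_DE = (0.903, 0.006, 0.993)
DE = (0.097, 0.010, 10.805)


def posterior_de(y):
    """Posterior probability that y belongs to the wide (DE) component."""
    a = NON_DE[0] * normal_pdf(y, NON_DE[1], NON_DE[2])
    b = DE[0] * normal_pdf(y, DE[1], DE[2])
    return b / (a + b)


def crossing_point(lo=0.0, hi=10.0, steps=60):
    """Bisection for the y in [lo, hi] at which posterior_de(y) equals 0.5."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if posterior_de(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


boundary = crossing_point()
print(round(boundary, 2))
```

Because the two fitted means are slightly different from zero, the crossing on the negative side is not exactly the mirror image of this one.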
Mixtures of Distributions
Linking Posterior Probabilities with False Discovery Rate
[Figure: the Not-DE and DE components of the mixture]
Select the N most extreme genes; the FDR is then the average posterior probability, across those N genes, of not being in the cluster of extremes.
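The rule above amounts to a one-line average. In this sketch the genes are ranked by their posterior probability of belonging to the DE cluster, used here as a stand-in for "most extreme"; the function name `fdr_top_n` and the five posterior probabilities are hypothetical.

```python
def fdr_top_n(post_de, n_selected):
    """Estimated FDR for the n_selected genes with the highest posterior
    probability of being DE: the mean of (1 - posterior) over that set."""
    if not 1 <= n_selected <= len(post_de):
        raise ValueError("n_selected must be between 1 and the number of genes")
    selected = sorted(post_de, reverse=True)[:n_selected]
    return sum(1.0 - p for p in selected) / n_selected


# Hypothetical posterior probabilities of being DE for five genes.
example_posteriors = [0.20, 0.99, 0.60, 0.97, 0.90]
fdr_top_3 = fdr_top_n(example_posteriors, 3)
print(round(fdr_top_3, 4))
```

For the three top-ranked genes the average of 0.01, 0.03 and 0.10 gives an estimated FDR of about 4.7%.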
Mixtures of Distributions
Simulation 2
1. Simulate:
$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$
3. EMMIX best fit:
$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$
Select the N most extreme genes, and FDR is the average posterior probability (Post Prob) of not being in the cluster of extremes.
[Figures: Post Prob; FDR by N Genes]
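A curve of FDR against the number of selected genes can be sketched for Simulation 2 as follows. The data are re-simulated here with an arbitrary seed, the posterior probabilities use the EMMIX best-fit values quoted above, and genes are ranked by their posterior probability of being non-DE; all names are assumptions of this example, and the numbers will not match the original plot exactly.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_non_de(y):
    """Posterior probability of the narrow (non-DE) component under the quoted fit."""
    a = 0.903 * normal_pdf(y, 0.006, 0.993)
    b = 0.097 * normal_pdf(y, 0.010, 10.805)
    return a / (a + b)


# Re-simulate 9,000 non-DE and 1,000 DE records (gauss takes a standard deviation).
rng = random.Random(1)
genes = [rng.gauss(0.0, 1.0) for _ in range(9000)]
genes += [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(1000)]

# Most extreme genes first: smallest posterior probability of being non-DE.
non_de_prob = sorted(posterior_non_de(y) for y in genes)

# fdr_by_n[N - 1] is the estimated FDR when the N most extreme genes are selected.
fdr_by_n = []
running_total = 0.0
for n_selected, p in enumerate(non_de_prob, start=1):
    running_total += p
    fdr_by_n.append(running_total / n_selected)

print(fdr_by_n[99], fdr_by_n[499], fdr_by_n[-1])
```

Because the genes are taken in order of increasing non-DE probability, the running average can only rise as N grows, which is the shape expected of an FDR-by-N curve.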
Mixtures of Distributions
Example: “Diets”
(only REFERENCE components of the design)
[Equation garbled in the transcript: it defines the response y_i for the HvL contrast from the r_i and g_i terms.]
$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$
$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$
Mixtures of Distributions
Example: “Diets” (only REFERENCE components of the design)
$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$
Mixtures of Distributions
Example: “Diets” (only REFERENCE components of the design)
$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$
[Figure: FDR by N Genes]
In Reverter et al. (2003; JAS 81:1900), 27 genes were reported as having a posterior probability (PP) > 0.95 of being in the extreme cluster.
We can now assess that this set of 27 genes carries an FDR < 10%.