1 identification of overlapping biclusters using probabilistic relational models tim van den bulcke...
Post on 20-Jan-2016
220 views
TRANSCRIPT
![Page 1: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/1.jpg)
1
Identification of overlapping biclusters using Probabilistic Relational Models
Tim Van den Bulcke
Hui Zhao
Kristof Engelen
Bart De Moor
Kathleen Marchal
PMCB Workshop
Thursday, 26 July, 2007.
![Page 2: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/2.jpg)
2
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 3: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/3.jpg)
3
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 4: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/4.jpg)
4
Biclustering and biology
• Definition in the context of gene expression data: A bicluster is a subset of genes which show a similar expression profile under a subset of conditions.
genes
conditions
![Page 5: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/5.jpg)
5
Biclustering and biology
Why bi-clustering?*
• Only a small set of the genes participates in a cellular process.
• A cellular process is active only in a subset of the conditions.
• A single gene may participate in multiple pathways that may or may not be coactive under all conditions.
* From: Madeira et al. (2004) Biclustering Algorithms for Biological Data Analysis: A Survey
![Page 6: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/6.jpg)
6
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 7: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/7.jpg)
7
Probabilistic Relational Models (PRMs)
Patient
Treatment
Virus strain Contact
Image: free interpretation from Segal et al. Rich probabilistic models
![Page 8: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/8.jpg)
8
Probabilistic Relational Models (PRMs)
• Traditional approaches “flatten” relational data– Causes bias
– Centered around one view of the data
– Loose relational structure
• PRM models– Extension of Bayesian networks
– Combine advantages of probabilistic reasoning with relational logic
Patient
flatten
Contact
![Page 9: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/9.jpg)
9
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 10: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/10.jpg)
10
ProBic biclustering model: notation
• g: gene
• c: condition
• e: expression
• g.Bk: gene-bicluster assignment for gene g to bicluster k
• c.Bk: condition-bicluster assignment for condition c to bicluster k
• e.Level: expression level value
• G, C, E (capital letters): set of all genes, conditions, expression levels resp.
• μg.B, c.B, c, σg.B, c.B, c: Normal distribution parameters for condition c, with gene-bicluster and condition-bicluster assignments g.B and c.B
![Page 11: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/11.jpg)
11
ProBic biclustering model
• Dataset instance
GeneGene
ExpressionExpression
ConditionCondition
ID B1 B2
g1 ? (0 or 1) ? (0 or 1)
g2 ? (0 or 1) ? (0 or 1)
ID B1 B2
c1 ? (0 or 1) ? (0 or 1)
c2 ? (0 or 1) ? (0 or 1)
g.ID c.ID level
g1 c1 -2.4
g1 c2 (missing value)
g2 c1 1.6
g2 c2 0.5
![Page 12: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/12.jpg)
12
ProBic biclustering model
• Relational schema and PRM model
Notation:
• g: gene
• c: condition
• e: expression
• g.Bk: gene-bicluster assignment for gene g to bicluster k (0 or 1, unknown)
• c.Bk: condition-bicluster assignment for condition c to bicluster k: (0 or 1, unknown)
• e.Level: expression level value (continuous, known)
GeneGene
ExpressionExpression
ConditionCondition
B1 B2
level
B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2,c.ID)=
Normal( μg.B,c.B,c.ID, σg.B,c.B,c.ID )
ID
P(e.level | g.B1,g.B2,c.B1,c.B2,c.ID)=
Normal( μg.B,c.B,c.ID, σg.B,c.B,c.ID )
1
2
3
![Page 13: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/13.jpg)
13
ProBic biclustering model
GeneGene
ExpressionExpression
ConditionCondition
? (0 or 1)? (0 or 1)g2
? (0 or 1)? (0 or 1)g1
B2B1ID
? (0 or 1)? (0 or 1)c2
? (0 or 1)? (0 or 1)c1
B2B1ID
1.6c1g2
(missing value)c2g1
0.5c2g2
-2.4c1g1
levelc.IDg.ID
g1.B1
g1.B2 level1,1
c1.B1 c1.B2
g2.B1
g2.B2 level2,
2
c2.B1 c2.B2
level2,1
c1.ID c2.ID
PRM modelDatabase instance
ground Bayesian network
GeneGene
ExpressionExpression
ConditionCondition
B1 B2B1 B2
P(e.level | g.B1,g.B2,c.B1,c.B2, c.ID)=
Normal( μg.B,c.B, c.ID, σg.B,c.B,c.ID )
ID
level
![Page 14: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/14.jpg)
14
ProBic biclustering model
• ProBic posterior ( ~ likelihood x prior ):
Expression level
conditional probabilities
Expression level prior
(μ, σ)’s
Prior condition to bicluster assignmentsPrior gene to bicluster assignment
![Page 15: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/15.jpg)
15
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results
• Conclusion
![Page 16: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/16.jpg)
16
Algorithm: choices
• Different approaches possible
• Only approximative algorithms are tractable:– MCMC methods (e.g. Gibbs sampling)
– Expectation-Maximization (soft, hard assignment)
– Variational approaches
– simulated annealing, genetic algorithms, …
• We chose a hard assignment Expectation-Maximization algorithm (E.-M.)– Natural decomposition of the model in E.-M. steps
– Efficient
– Good convergence properties for this model
– Extensible
![Page 17: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/17.jpg)
17
Algorithm: Expectation-Maximization
• Maximization step:– Maximize posterior w.r.t. μ, σ values (model parameters),
given the current gene-bicluster and condition-bicluster assignments (=the hidden variables)
• Expectation step:– Maximize posterior w.r.t. gene-bicluster and condition-
bicluster assignments, given the current model parameters
– Two-step approach:
• Step 1: max. posterior w.r.t. C.B, given G.B and μ, σ values
• Step 2: max. posterior w.r.t. G.B, given C.B and μ, σ values
![Page 18: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/18.jpg)
18
Algorithm: Expectation-Maximization
• Expectation step 1: condition-bicluster assignment– Independent per condition
– Evaluate function for every condition and for every bicluster assignment e.g. 200 conditions, 30 biclusters: 200 * 230 = 200 billion ~ a lot
– But can be performed very efficiently:
• Partial solutions can be reused among different bicluster assignments
• Only evaluate potential good solutions: use Apriori-like approach.
• Avoid background evaluations
1
2
3
![Page 19: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/19.jpg)
19
Algorithm: initialization
• Initialization options:– Multiple random initializations
– Initialize biclusters with (nearly) complete dataset
– Initialize all biclusters simultaneously
– Init/converge one bicluster at a time, then add next (still allowing first bicluster to change)
• Best results:– One initialization: initialize biclusters with (nearly)
complete dataset
– Iteratively add one bicluster and run E.-M.
![Page 20: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/20.jpg)
20
Algorithm: example
![Page 21: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/21.jpg)
21
Algorithm: example
![Page 22: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/22.jpg)
22
Algorithm properties
• Speed:– 500 genes, 200 conditions, 2 biclusters: 2 min.
– Scaling:
• ~ #genes . #conditions . 2#biclusters (worse case)
• ~ #genes . #conditions . (#biclusters)p (in practice), p=1..3
![Page 23: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/23.jpg)
23
Overview
• Biclustering and biology
• Probabilistic Relational Models
• ProBic biclustering model
• Algorithm
• Results– Noise sensitivity
– Bicluster shape
– Overlap
• Conclusion
![Page 24: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/24.jpg)
24
Results: noise sensitivity
• Setup: – Simulated dataset: 500 genes x 200 conditions
– Background distribution: Normal(0,1)
– Bicluster distributions: Normal( rnd(N(0,1)), σ ), varying sigma
– Shapes: three 50x50 biclusters
ordered randomized
![Page 25: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/25.jpg)
25
Results: noise sensitivity
A
B
Precision (genes) Recall (genes)
Precision (conditions) Recall (conditions)
A B A B
A
B
A B
σ σ
σ σ
…
…Precision = TP / (TP+FP) Recall = TP / (TP+FN)
![Page 26: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/26.jpg)
26
Results: bicluster shape independence
• Setup:– Dataset: 500 genes x 200 conditions
– Background distribution: N(0,1)
– Bicluster distributions: N( rnd(N(0,1)), 0.2 )
– Shapes: 80x10, 10x80, 20x20
![Page 27: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/27.jpg)
27
Results: bicluster shape independence
![Page 28: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/28.jpg)
28
Results: 10 biclusters
![Page 29: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/29.jpg)
29
Overlap examples
• Two biclusters (50 genes, 50 conditions)
• Overlap:25 genes, 25 conditions
• Two biclusters (10 genes, 80 conditions)
• Overlap: 2 genes, 40 conditions
![Page 30: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/30.jpg)
30
Near future
• Automated definition of algorithm parameter settings
• Application biological datasets– Dataset normalization
• Extend model with different overlap models
• Model extension from biclusters to regulatory modulesinclude motif + ChIP-chip data
PromoterPromoter
ExpressionExpression
ArrayArray
S1 S2 S3 S4
R1 R2 R3
M1 M2
level
M1P1 P2 P3
Gene
M2
TTCAATACAGG
R1
R2
…
![Page 31: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/31.jpg)
31
Conclusion
• Noise robustness
• Naturally deals with missing values
• Independent of bicluster shape
• Simultaneous identification of multiple overlapping biclusters
• Can be used query-driven
• Extensible
![Page 32: 1 Identification of overlapping biclusters using Probabilistic Relational Models Tim Van den Bulcke Hui Zhao Kristof Engelen Bart De Moor Kathleen Marchal](https://reader035.vdocuments.us/reader035/viewer/2022070415/56649d255503460f949fc526/html5/thumbnails/32.jpg)
32
Acknowledgements
KULeuven:• whole BioI group, ESAT-SCD
– Hui Zhao
– Thomas Dhollander
• whole CMPG group(Centre of Microbial and Plant Genetics)
– Kristof Engelen
– Kathleen Marchal
UGent:• whole Bioinformatics &
Evolutionary Genomics group
– Tom Michoel BIO I..