ROBUST ALGORITHMS FOR INFERRING REGULATORY NETWORKS BASED ON GENE EXPRESSION MEASUREMENTS
AND BIOLOGICAL PRIOR INFORMATION
ir. Tim Van den Bulcke
Chairman: Prof. dr. ir. J. Berlamont
Promotors:
Prof. dr. ir. Bart De Moor
Prof. dr. ir. Kathleen Marchal
Jury:
Prof. dr. ir. J.A.K. Suykens
Prof. dr. L. De Raedt
Prof. dr. G. Verbeke
Prof. dr. ir. T. De Bie (University of Bristol, U.K.)
Dr. T. Michoel (Universiteit Gent)
Overview
• Overview
• Introduction
– The language of life
– Systems biology
• SynTReN: large scale application of simulated data to assess network inference algorithms
– SynTReN model
• Network topology
– Results
• Effect of network topology
• Effect of number of microarrays
• ProBic: model-based biclustering of gene expression data
– ProBic model
• Probabilistic relational models
• EM algorithm
– Results
• Simulated datasets
• E. coli compendium: query-driven biclustering
– Extensions
• Conclusion
Overview Introduction SynTReN ProBic Conclusion
Introduction: the language of life
The language of life
Termite mound
Translucent squid
Meller's Chameleon
Humpback whale
E. coli
Hippocampus
Coral snake
Baobab tree
The language of life: building blocks
Translucent squid
E. coli
Baobab tree
DNA / genes: information carrier
Proteins: workhorses
RNA: messenger
Metabolites: signaling, activation
Feedback information
The language of life: regulation
Flap movement
Black box recording
Signal conversion to hydraulic pressure
Signal transduction
Steering
Steering action
The language of life: regulation
gene
TF (protein)
mRNA
metabolite
protein
The language of life: regulation
gene
metabolite
D
mRNA
protein
A B C
The language of life: regulation
E. coli
Gene regulatory network of E. coli
Introduction: systems biology
Systems biology
• Systems biology
– Systematic study of complex interactions in biological systems using a holistic approach
[Diagram: organism → high-throughput experiments → ‘omics data → network inference algorithm → interaction network]
High-throughput ‘omics data
Genomics
e.g. DNA sequencing
Proteomics
e.g. 2D gels
Metabolomics
e.g. mass spectrometry
Transcriptomics
e.g. DNA microarrays
DNA microarrays
[Figure: mRNA extraction, then cDNA or mRNA labeling; the measurements form a table of genes (A, B, C, D, …) by arrays (I, II, III, …)]
Part I: SynTReN
Large scale application of simulated data
to assess network inference algorithms
Introduction
Reconstruction of gene regulatory networks
[Diagram: organism (or simulated organism) → microarray experiments → microarray data → network inference algorithm → gene regulatory network]
• Limited knowledge of true underlying network
• How to characterize algorithm performance?
Introduction
• Simulated data
– (+) Complete insight into the underlying model (white-box model)
– (+) Control over all parameters
– (+) Fast, in silico experiments
– (-) Based on simplified models of biology
• SynTReN
– Fast generation of large simulated gene expression datasets under various settings
– Offers additional, sometimes unexpected, insights into the behavior of inference algorithms compared to biological data only
SynTReN setup
[Workflow diagram: source network → network topology → define interactions → add noise function → set external conditions → sample data → microarray-like dataset (sampling); network inference algorithm → learned network, compared with the true network (evaluation)]
Network topology
• Random graph models: Erdős-Rényi, Watts-Strogatz, Albert-Barabási, directed scale-free
• Subnetwork selection from a source network: neighbor addition, cluster addition
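The two topology options can be illustrated with a minimal sketch. It is not SynTReN's implementation: the function names `erdos_renyi_directed` and `neighbor_addition` are hypothetical, and neighbor addition is read here simply as "grow a connected subnetwork by repeatedly adding a random neighbor of the nodes selected so far".

```python
import random


def erdos_renyi_directed(n, p, seed=0):
    """Random graph model: keep each directed edge (i, j), i != j, with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < p}


def neighbor_addition(edges, size, seed=0):
    """Subnetwork selection: start from a random node of the source network
    and keep adding a random neighbor until `size` nodes are selected."""
    rng = random.Random(seed)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    selected = {rng.choice(sorted(neighbors))}
    while len(selected) < size:
        frontier = sorted({m for s in selected for m in neighbors[s]} - selected)
        if not frontier:
            break  # the connected component is exhausted
        selected.add(rng.choice(frontier))
    # Keep only the source edges whose endpoints were both selected.
    sub_edges = {(u, v) for u, v in edges if u in selected and v in selected}
    return selected, sub_edges


source = erdos_renyi_directed(30, 0.2, seed=1)
nodes, sub_edges = neighbor_addition(source, 8, seed=1)
```

Selecting from a known biological source network instead of `source` would follow the same call, with the source edges loaded from data.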
Define interactions
Michaelis-Menten and Hill enzyme kinetics
• Complex interactions
• Synergism and antagonism
• Cooperative binding
• Competitive binding
• Steady-state
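As an illustration of the kinetics named above, the sketch below evaluates a steady-state transcription rate from Hill terms. The exact equations used in SynTReN are not given on the slide; the multiplicative combination of regulators and all parameter names (`v_max`, `k`, `n`) are assumptions made for the example. With `n = 1` a Hill term reduces to the Michaelis-Menten form, and `n > 1` models cooperative binding.

```python
def hill_activation(x, k, n):
    """Activating Hill term: 0 at x = 0, 0.5 at x = k, approaches 1 for large x."""
    return x ** n / (k ** n + x ** n)


def hill_repression(x, k, n):
    """Repressing Hill term: 1 at x = 0, 0.5 at x = k, approaches 0 for large x."""
    return k ** n / (k ** n + x ** n)


def steady_state_rate(v_max, activators, repressors):
    """Combine regulators multiplicatively (an assumed, simple combination rule).

    activators and repressors are lists of (concentration, k, n) tuples.
    """
    rate = v_max
    for x, k, n in activators:
        rate *= hill_activation(x, k, n)
    for x, k, n in repressors:
        rate *= hill_repression(x, k, n)
    return rate


# One activator at its half-saturation point, one cooperative repressor (n = 4).
rate = steady_state_rate(2.0, activators=[(1.0, 1.0, 1)], repressors=[(1.0, 1.0, 4)])
```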
Add noise function
• Different noise types, caused by:
– Experimental setup
– Biological variation
– External stimuli
Adding different noise types
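A small sketch of how two noise sources could be layered on noise-free expression values. The slide does not specify the noise functions; the choice of multiplicative log-normal noise for biological variation and additive Gaussian noise for the experimental setup, as well as the name `add_noise` and its parameters, are assumptions for illustration only.

```python
import math
import random


def add_noise(values, bio_sigma, exp_sigma, seed=0):
    """Apply assumed biological (multiplicative) and experimental (additive) noise."""
    rng = random.Random(seed)
    noisy = []
    for v in values:
        biological = math.exp(rng.gauss(0.0, bio_sigma))  # log-normal factor
        measurement = rng.gauss(0.0, exp_sigma)           # additive error
        noisy.append(max(0.0, v * biological + measurement))  # keep levels non-negative
    return noisy


clean = [1.0, 2.0, 4.0]
noisy = add_noise(clean, bio_sigma=0.2, exp_sigma=0.1, seed=1)
```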
Sampling data
• Set external conditions
• Sample data
• Result: microarray-like datasets
Evaluation: characterization of network inference algorithms
• The network inference algorithm is applied to the sampled data
• The learned network is compared with the true network
• Algorithms: ARACNE, Genomica, SAMBA
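Because the true network is known for simulated data, a learned network can be scored edge by edge. The sketch below uses precision and recall over directed edges; the slide does not state which score was used in this step, so the metric and the name `edge_precision_recall` are assumptions.

```python
def edge_precision_recall(true_edges, learned_edges):
    """Compare a learned network with the true network, both given as edge sets."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    true_positives = len(true_set & learned_set)
    precision = true_positives / len(learned_set) if learned_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


true_network = {("A", "B"), ("B", "C"), ("C", "D")}
learned_network = {("A", "B"), ("A", "C")}
precision, recall = edge_precision_recall(true_network, learned_network)
```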
SynTReN: results
SynTReN results
• Effect of network topology
• Effect of noise and amount of data
Effect of network topology
• Network topology has a strong impact on algorithm performance
• Genomica and ARACNE perform significantly better on biological (sub)networks
Effect of noise and amount of data
• Algorithms show qualitatively different response to increasing data
• Plateau is reached for some algorithms
• Varying dependency on noise
SynTReN conclusions
• Applying simulated data offers insights into the characteristics of network inference algorithms
• The underlying network topology of the simulated data is an important factor for the quality of the inferred network
• Biological (sub)networks generally lead to better inference
Part II: ProBic
Model-based biclustering of gene expression data
ProBic: introduction
Reconstruction of gene regulatory networks
[Diagram: organism → microarray experiments → microarray data → network inference algorithm (here: model-based biclustering) → gene regulatory network]
ProBic: introduction
Biclustering
[Figure: expression matrix with genes T, U, V, W, X, Y, Z as rows and experiments 1 to 5 as columns. Annotations: bicluster A (genes U, W, Z; experiments 1-4), bicluster B (genes V, T; experiments 3-5), gene Y]
• Global biclustering
• Query-driven biclustering
ProBic introduction
ProBic goals:
• Unified probabilistic model
• Combined query-driven and global biclustering
• Model driven:
• Multiple overlapping biclusters
• Incorporate diametrical biclustering
• Computational efficiency
• No data discretization required
• Extensible towards heterogeneous data
ProBic model
• Designed for:
– Computational efficiency
– Extensibility
• Three classes:
– Gene
– Array
– Expression
• Based on the probabilistic relational model (PRM) framework
– Relational extension to Bayesian networks
– Bayesian networks ~ a single flat table
– Probabilistic relational models ~ relational data structure

Dataset instance:

Gene
ID    B1    B2
g1    ?     ?
g2    ?     ?

Array
ID    B1    B2
a1    ?     ?
a2    ?     ?

Expression
g.ID    a.ID    level
g1      a1      -2.4
g2      a2      0.5
g2      a1      1.6
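The relational structure of the dataset instance can be written down as three linked record types. This is only a sketch of the data layout, not ProBic code; the field names and the use of `None` for the unknown ("?") bicluster memberships are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def unknown_memberships() -> Dict[str, Optional[bool]]:
    # None stands for the "?" entries: membership in B1 and B2 is hidden.
    return {"B1": None, "B2": None}


@dataclass
class Gene:
    id: str
    biclusters: Dict[str, Optional[bool]] = field(default_factory=unknown_memberships)


@dataclass
class Array:
    id: str
    biclusters: Dict[str, Optional[bool]] = field(default_factory=unknown_memberships)


@dataclass
class Expression:
    gene: Gene    # reference to a Gene record (g.ID)
    array: Array  # reference to an Array record (a.ID)
    level: float


genes = {gid: Gene(gid) for gid in ("g1", "g2")}
arrays = {aid: Array(aid) for aid in ("a1", "a2")}
expressions = [
    Expression(genes["g1"], arrays["a1"], -2.4),
    Expression(genes["g2"], arrays["a2"], 0.5),
    Expression(genes["g2"], arrays["a1"], 1.6),
]
```

Each expression value points to one gene and one array, which is what distinguishes this layout from a single flat table.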
ProBic model
• Parameters:
– Set of Normal distributions
• Hidden variables:
– Gene-bicluster assignments
– Array-bicluster assignments
• Priors:
– Query-driven biclustering
– Expert knowledge
PRM model
ProBic model: EM algorithm
• Only approximate algorithms are tractable
• Hard-assignment Expectation-Maximization algorithm
– Natural decomposition of the model in the EM steps
– Efficient
– Extensible
– Good convergence properties
ProBic model: EM algorithm
[Figure: PRM with the classes Gene (bicluster attributes B1, B2, …), Array (ID and bicluster attributes B1, B2, …) and Expression (level)]

$$P(e.\mathrm{level} \mid e.\mathrm{gene}.B,\ e.\mathrm{array}.B,\ e.\mathrm{array}.\mathrm{ID}) = \mathrm{Normal}(\mu_{a,b}, \sigma_{a,b})$$

Initialization
While not converged:
• Maximization step
– Normal distribution parameters
• Expectation step
– Gene-bicluster assignments
– Array-bicluster assignments
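The loop above can be sketched for a strongly simplified case: a single bicluster, one Normal distribution per array inside the bicluster, and one background Normal per array. This is not the ProBic implementation. The seed genes (playing the role of a query), the standard-deviation floor `min_sd` and the per-gene `array_penalty` (standing in for a prior on array membership) are hypothetical choices made so that the example is self-contained.

```python
import math


def log_normal_pdf(x, mu, sd):
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / (2 * sd ** 2)


def fit_normal(values, min_sd):
    """Maximum-likelihood mean and standard deviation, with a floor on the sd."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(var), min_sd)


def hard_em_bicluster(data, seed_genes, min_sd=0.2, array_penalty=0.5, max_iter=50):
    n_genes, n_arrays = len(data), len(data[0])
    # Background model: one Normal per array, fitted once on all genes.
    background = [fit_normal([row[a] for row in data], min_sd) for a in range(n_arrays)]
    genes, arrays = set(seed_genes), set(range(n_arrays))

    def gain(g, a, params):
        """Log-likelihood ratio of one value: bicluster model versus background."""
        x = data[g][a]
        return log_normal_pdf(x, *params[a]) - log_normal_pdf(x, *background[a])

    for _ in range(max_iter):
        if not genes:
            break
        # Maximization step: Normal parameters per array from the current genes.
        params = [fit_normal([data[g][a] for g in genes], min_sd) for a in range(n_arrays)]
        # Expectation step (hard assignment): gene-bicluster assignments.
        new_genes = {g for g in range(n_genes)
                     if sum(gain(g, a, params) for a in arrays) > 0}
        # Expectation step (hard assignment): array-bicluster assignments.
        new_arrays = {a for a in range(n_arrays)
                      if sum(gain(g, a, params) for g in new_genes)
                      > array_penalty * len(new_genes)}
        if new_genes == genes and new_arrays == arrays:
            break  # converged: assignments no longer change
        genes, arrays = new_genes, new_arrays
    return genes, arrays


# Toy data: genes 0-3 are co-expressed (around 3.0) on arrays 0 and 1 only.
data = [
    [3.0, 3.1, 0.5, -0.8],
    [3.1, 2.9, -0.6, 0.9],
    [2.9, 3.0, 0.7, 0.4],
    [3.0, 3.0, -0.4, -0.7],
    [0.2, -0.3, 0.6, -0.5],
    [-0.4, 0.1, -0.7, 0.8],
    [0.1, 0.4, 0.3, -0.2],
    [-0.2, -0.1, -0.5, 0.6],
]
bicluster_genes, bicluster_arrays = hard_em_bicluster(data, seed_genes=[0, 1])
```

Without the penalty, arrays with no coherent signal would rarely be dropped, because parameters fitted on the bicluster genes always explain those genes at least slightly better than the background does.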
ProBic: results
ProBic results
• Simulated datasets:
– Noise and missing value robustness
– Comparison to state of the art
• Query-driven biclustering (E. coli compendium):
– Single gene queries
– Outlier removal in multi-gene queries
Noise and missing value robustness
• Simulated data: 500 x 200
• 3 biclusters (50 x 50)
• Varying noise and missing values
[Figure: precision and recall for genes and for arrays at increasing noise levels]
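A dataset of this shape can be generated in a few lines. The sketch assumes 500 genes by 200 arrays (the slide gives only "500x200"), a standard Normal background, one mean per array inside each bicluster, and `None` for missing values; none of these details are stated in the source, and the name `simulate_dataset` is hypothetical.

```python
import random


def simulate_dataset(n_genes=500, n_arrays=200, n_biclusters=3, size=50,
                     noise_sd=0.5, missing_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Background: independent standard Normal values.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_arrays)] for _ in range(n_genes)]
    biclusters = []
    for _ in range(n_biclusters):
        genes = rng.sample(range(n_genes), size)
        arrays = rng.sample(range(n_arrays), size)
        profile = {a: rng.gauss(0.0, 3.0) for a in arrays}  # one mean per array
        for g in genes:
            for a in arrays:
                data[g][a] = profile[a] + rng.gauss(0.0, noise_sd)
        biclusters.append((set(genes), set(arrays)))
    # Knock out a fraction of the values to mimic missing measurements.
    for row in data:
        for a in range(n_arrays):
            if rng.random() < missing_rate:
                row[a] = None
    return data, biclusters


data, planted = simulate_dataset(noise_sd=0.5, missing_rate=0.05, seed=42)
```

Sweeping `noise_sd` and `missing_rate` over a grid gives the kind of robustness experiment summarized on this slide. Biclusters may overlap in this sketch because genes and arrays are sampled independently for each one.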
Comparison with state of the art
• Simulated datasets defined by Prelić et al. (2006)
– Independent datasets
– Not biased towards particular algorithm
• Query-driven biclustering algorithms:
– Gene Recommender (GR) [Owen et al., 2003]
– Iterative Signature Algorithm (ISA) [Ihmels et al., 2002]
– Query-driven biclustering (QDB) [Dhollander et al., 2008]
– ProBic (PB) [Van den Bulcke et al., 2009]
Scenario 1: constant values
[Figure: recovery and relevance as a function of noise and of overlap]
Scenario 2: constant columns
[Figure: recovery and relevance as a function of noise and of overlap]
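The axis labels "recovery" and "relevance" match the gene match score of Prelić et al. (2006): for each bicluster in one set, take the best Jaccard overlap of its gene set with any bicluster in the other set, then average. The sketch below follows that definition; whether the slides used exactly this variant is an assumption, and the function names are hypothetical.

```python
def gene_match_score(set_a, set_b):
    """Average over gene sets in set_a of the best Jaccard index against set_b.

    Both arguments are lists of non-empty gene sets, one per bicluster.
    """
    if not set_a or not set_b:
        return 0.0
    total = 0.0
    for genes_a in set_a:
        total += max(len(genes_a & genes_b) / len(genes_a | genes_b)
                     for genes_b in set_b)
    return total / len(set_a)


def relevance(found, planted):
    """How well each found bicluster matches some planted bicluster."""
    return gene_match_score(found, planted)


def recovery(found, planted):
    """How well each planted bicluster is recovered by some found bicluster."""
    return gene_match_score(planted, found)


planted = [{1, 2, 3, 4}]
found = [{1, 2}, {7, 8}]
rel, rec = relevance(found, planted), recovery(found, planted)
```

The score is asymmetric: a spurious found bicluster lowers relevance but leaves recovery unchanged.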
Single gene query-driven biclustering
• Select single target from a known regulator
• Use this gene as ‘query’ gene
• E. coli compendium [Lemmens et al., 2008]
Query: uvrB
Outlier removal in QD biclustering
Experimental setup:
• Examine effect of outlier genes in set of query genes
Query genes:
• Two operon genes
• One or more random genes
Example:
• E. coli compendium, cyoABCDE operon
• Select two genes (cyoA and cyoB) and one or more random genes
Conclusion
Summary
• SynTReN:
– Fast, large scale simulator of regulatory networks
– Simulated data reveals operational characteristics of network inference algorithms unlikely to be discovered with biological data only
– Network topology has an important effect on inference quality
• ProBic:
– Combined global and query-driven biclustering model
– Simultaneous biclustering of multiple overlapping biclusters
– Extensibility towards module network inference
– Robust w.r.t. noise and missing values
– Query-driven biclustering with:
• single genes
• multi-gene queries containing outlier genes
Outlook & future work
• SynTReN:
– Extension towards heterogeneous networks (metabolites, proteins, DNA, RNA)
• ProBic:
– Development of a GUI and large scale application to biological data
– Extend ProBic model towards regulatory module identification, including:
• Condition annotation
• Motif and regulator annotation
Acknowledgements
• Promotors
• Prof. dr. ir. Bart De Moor
• Prof. dr. ir. Kathleen Marchal
• ESAT-SCD Bioinformatics group
• CMPG BioI
• ir. Hui Zhao
• ISLab
• dr. Koen Van Leemput
Publications
Published
• T. Van den Bulcke, K. Van Leemput, B. Naudts, P. van Remortel, H. Ma, A. Verschoren, B. De Moor and K. Marchal. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics 2006, 7:43.
• T. Van den Bulcke, K. Lemmens, Y. Van de Peer and K. Marchal. Inferring transcriptional networks by mining ’omics’ data. Current Bioinformatics 2006, 1:3.
• T. Michoel, S. Maere, E. Bonnet, A. Joshi, T. Van den Bulcke, K. Van Leemput, P. van Remortel, M. Kuiper, K. Marchal and Y. Van de Peer. Validating module network learning algorithms using simulated data. BMC Bioinformatics 2007, 8(Suppl 2):S5.
• K. Van Leemput, T. Van den Bulcke, T. Dhollander, B. De Moor, K. Marchal and P. van Remortel. Exploring the operational characteristics of inference algorithms for transcriptional networks by means of synthetic data. Artificial Life 2008, 14:1, 49-63.
Submitted
• H. Sung, K. Lemmens, T. Van den Bulcke, K. Engelen, B. De Moor and K. Marchal. ViTraM: Visualization of Transcriptional Modules. Bioinformatics 2009.
In preparation
• T. Van den Bulcke, H. Zhao, K. Engelen, B. De Moor and K. Marchal. ProBic: Global and query-driven biclustering of gene expression data using Probabilistic Relational Models.