![Page 1: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/1.jpg)
1
Gene Expression Analysis Using Bayesian Networks
Éric Paquet
LBIT Université de Montréal
![Page 2: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/2.jpg)
2
Biological basis
DNA (storage of genetic information)
RNA polymerase (copies DNA into RNA)
mRNA (storage and transport of genetic information)
Ribosome (translates genetic information into proteins)
Proteins (expression of genetic information)
*-PDB file 1L3A, Transcriptional Regulator Pbf-2
![Page 3: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/3.jpg)
3
Biological basis
How are proteins regulated? Example: the E. coli lactose operon.
Normally, E. coli uses glucose to get energy, but how does it react when there is no more glucose and only lactose is available?
![Page 4: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/4.jpg)
4
Biological basis
[Figure: lactose operon in the E. coli environment. Labels: glucose, lactose, RNA polymerase, lactose decomposer (β-galactosidase), lactose getter (permease).]
Polymerase action is blocked by a DNA lock: the protein associated with the lacI gene.
![Page 5: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/5.jpg)
5
Biological basis
[Figure: lactose operon in the E. coli environment. Labels: glucose, lactose, RNA polymerase, lactose decomposer (β-galactosidase), lactose getter (permease).]
Lactose recruits the protein associated with the lacI gene, unlocking the DNA, which is then accessible to the polymerase.
![Page 6: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/6.jpg)
6
Biological basis
[Figure: regulatory diagram of the lactose decomposer (β-galactosidase) and the lactose getter (permease). Legend entry: inhibit.]
![Page 7: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/7.jpg)
7
Biological basis
[Figure: lactose operon in the E. coli environment. Labels: glucose, lactose, RNA polymerase, CAP, c-AMP, lactose decomposer (β-galactosidase), lactose getter (permease).]
In the absence of glucose, a polymerase magnet binds to the DNA and speeds up the production of the gene products that help decompose lactose.
![Page 8: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/8.jpg)
8
Biological basis
[Figure: regulatory diagram of the lactose decomposer (β-galactosidase) and the lactose getter (permease). Legend entries: inhibit, activate.]
Research goal: infer these links
![Page 9: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/9.jpg)
9
Why?
Get insights about cellular processes
Help understand diseases
Find drug targets
![Page 10: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/10.jpg)
10
How?
Using gene expression data and tools for learning Bayesian networks
*-Spellman et al.(1998) Mol Biol Cell 9:3273-97
[Figure: gene expression data* (axes: Experiments, [mRNA]) + tools for learning Bayesian networks, shown next to the regulatory diagram of the lactose decomposer (β-galactosidase) and the lactose getter (permease).]
![Page 11: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/11.jpg)
11
What is gene expression data?
Data showing the concentration of a specific mRNA at a given time in the life of the cell.
Each spot yields a real value that tells whether the concentration of a specific mRNA is higher (+) or lower (-) than the normal value.
[Figure: expression data*; axes: Experiments, [mRNA]. Each column is the result of one image.]
*-Spellman et al. (1998) Mol Biol Cell 9:3273-97
![Page 12: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/12.jpg)
12
What are Bayesian networks?
A graphical representation of a joint distribution over a set of random variables.
[Figure: example network over the nodes A, B, C, D, E]
P(A,B,C,D,E) = P(A) * P(B) * P(C|A) * P(D|A,B) * P(E|D)
Nodes represent gene expression levels, while edges encode the interactions (e.g. inhibition, activation).
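As a minimal sketch of the factorization above, assuming binary variables and invented probability tables (every number below is hypothetical), the joint probability of one assignment is the product of one local term per node:

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (values 0/1).
# Each entry gives P(variable = 1 | parent values); the numbers are invented.
P_A1 = 0.3
P_B1 = 0.6
P_C1_GIVEN_A = {0: 0.2, 1: 0.9}
P_D1_GIVEN_AB = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}
P_E1_GIVEN_D = {0: 0.3, 1: 0.7}


def bernoulli(p_one, value):
    """Return P(X = value) for a binary X with P(X = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) * P(B) * P(C|A) * P(D|A,B) * P(E|D)."""
    return (
        bernoulli(P_A1, a)
        * bernoulli(P_B1, b)
        * bernoulli(P_C1_GIVEN_A[a], c)
        * bernoulli(P_D1_GIVEN_AB[(a, b)], d)
        * bernoulli(P_E1_GIVEN_D[d], e)
    )


# The local terms define a proper distribution: the 32 assignments sum to 1.
total = sum(joint(*values) for values in product((0, 1), repeat=5))
```

The network needs only a handful of small tables instead of one table with 32 entries, which is what makes the factorized form attractive when the number of genes grows.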
![Page 13: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/13.jpg)
13
A little problem with Bayesian networks
A Bayesian network must be a DAG (Directed Acyclic Graph), but there are many examples of regulatory networks that have directed cycles.
[Figure: example regulatory networks*: hysteretic oscillator, switch, transcription factor dimer]
*-Husmeier D., Bioinformatics, Vol. 19 no. 17 2003, pages 2271–2282
![Page 14: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/14.jpg)
14
How can we deal with that?
Using DBNs (Dynamic Bayesian Networks*) and sequential gene expression data
We unfold the network in time
[Figure: network over A and B unfolded in time into A1, B1 and A2, B2; time axis: t, t+1]
DBN = BN with constraints on parent and child nodes
*-Friedman, Murphy, Russell, Learning the Structure of Dynamic Probabilistic Networks
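The unfolding idea can be illustrated with a small sketch. It assumes that every edge X → Y of the regulatory network becomes an edge from X at one time slice to Y at the next one; the helper names `is_acyclic` and `unfold` are hypothetical, and the cycle check relies on the standard library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # the child depends on its parent
    try:
        sorter.prepare()
    except CycleError:
        return False
    return True


def unfold(edges, n_slices=2):
    """Turn each edge X -> Y into X(t) -> Y(t+1) for consecutive time slices."""
    return [
        ((parent, t), (child, t + 1))
        for t in range(1, n_slices)
        for parent, child in edges
    ]


cyclic_edges = [("A", "B"), ("B", "A")]  # A regulates B and B regulates A
unfolded_edges = unfold(cyclic_edges)    # A1 -> B2 and B1 -> A2
```

The two-node feedback loop is not a valid Bayesian network, but its unfolded version is, because every edge points forward in time.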
![Page 15: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/15.jpg)
15
What are we searching for?
The Bayesian network that is most probable given the data D (gene expression)
We find this BN as follows: BN* = argmax_BN {P(BN|D)}
$$P(BN \mid D) = \frac{P(D \mid BN)\,P(BN)}{P(D)}$$
Where:
P(D|BN) is the marginal likelihood
P(BN) is the prior on network structure
P(D) is the data probability
Naïve approach to the problem: try all possible DAGs and keep the best one!
![Page 16: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/16.jpg)
16
It is impossible to try all possible DAGs because the number of DAGs increases super-exponentially with the number of nodes:
n = 3 → 25 DAGs
n = 4 → 543 DAGs
n = 5 → 29281 DAGs
n = 6 → 3781503 DAGs
n = 7 → 1138779265 DAGs
n = 8 → 783702329343 DAGs
…
We are interested in problems with around 60 nodes…
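The counts above can be reproduced with a short sketch based on a known recurrence for labeled DAGs (usually attributed to Robinson); the function name `count_dags` is only illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, computed by inclusion-exclusion."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(3, 9):
    print(n, count_dags(n))
```

Calling `count_dags(60)` returns instantly thanks to Python's arbitrary-precision integers, and the size of the result makes it obvious why exhaustive enumeration is out of reach.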
![Page 17: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/17.jpg)
17
Learning Bayesian Networks from data?
Choosing a method to search the space of networks and a representation for the conditional distributions
• Network space search methods (basically add, remove and reverse edges)
  • Greedy hill-climbing
  • Beam search
  • Stochastic hill-climbing
  • Simulated annealing
  • MCMC simulation
• Conditional distribution representation
  • Linear Gaussian
  • Multinomial, binomial
[Figure: network over A, B and C with P(a) = ?, P(b) = ?, P(c|a,b) = ?]
![Page 19: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/19.jpg)
19
We use three gene expression levels
Sort:
-1.06 -0.12 0.18 0.21 1.16 1.19
Split the data into 3 equal buckets:
-1.06 -0.12 | 0.18 0.21 | 1.16 1.19
bucket 0 | bucket 1 | bucket 2
Discretized data: 0 0 2 2 1 1
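A minimal sketch of this equal-frequency discretization follows. The slide does not show the measurements in their original order, so the input order below is an assumption, chosen so that the output matches the discretized data given above; `discretize` is a hypothetical helper name.

```python
def discretize(values, n_levels=3):
    """Equal-frequency discretization: rank the values, then cut the ranks into buckets."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, index in enumerate(order):
        levels[index] = rank * n_levels // len(values)
    return levels


# Assumed measurement order (not given in the slides).
expression = [-1.06, -0.12, 1.16, 1.19, 0.18, 0.21]
levels = discretize(expression)
```

Each gene is discretized on its own values, so level 2 means "high for this gene" rather than high in absolute terms.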
![Page 20: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/20.jpg)
20
Back to:
$$P(BN \mid D) = \frac{P(D \mid BN)\,P(BN)}{P(D)}$$
P(D|BN) is the marginal likelihood, P(BN) is the prior on network structure and P(D) is the data probability.
![Page 21: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/21.jpg)
21
Insight on each term
P(BN) → prior on the network
In our research, we always use a prior equal to 1.
We could use it to incorporate knowledge.
E.g.: we know that an edge is present. If the edge is in the BN, P(BN) = 1, else P(BN) = 0.
Efforts are made to reduce the search space by using knowledge, e.g. limiting the number of parents or children.
![Page 22: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/22.jpg)
22
Insight on each term
P(D|BN) → marginal likelihood
Easy to calculate using a multinomial distribution with a Dirichlet prior*
$$P(d \mid bn) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N_{ij})}{\Gamma(N_{ij} + M_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(a_{ijk} + s_{ijk})}{\Gamma(a_{ijk})}$$
*-Heckerman, A Tutorial on Learning With Bayesian Networks, and Neapolitan, Learning Bayesian Networks
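As a hedged sketch of how this score can be evaluated in practice, the code below works in log space with `math.lgamma`. It assumes the usual reading of the symbols in the cited references: a_ijk are the Dirichlet prior counts and s_ijk the observed counts for node i, parent configuration j and value k, with N_ij and M_ij their sums over k. The function names are hypothetical.

```python
from math import exp, lgamma


def log_family_score(prior_counts, data_counts):
    """Log of one node's factor in P(d|bn).

    prior_counts[j][k] plays the role of a_ijk and data_counts[j][k] of s_ijk,
    for parent configuration j and node value k.
    """
    score = 0.0
    for a_j, s_j in zip(prior_counts, data_counts):
        n_ij = sum(a_j)  # assumed definition: N_ij = sum over k of a_ijk
        m_ij = sum(s_j)  # assumed definition: M_ij = sum over k of s_ijk
        score += lgamma(n_ij) - lgamma(n_ij + m_ij)
        for a_ijk, s_ijk in zip(a_j, s_j):
            score += lgamma(a_ijk + s_ijk) - lgamma(a_ijk)
    return score


def log_marginal_likelihood(families):
    """Sum the per-node log factors; families is a list of (prior_counts, data_counts)."""
    return sum(log_family_score(a, s) for a, s in families)


# One binary node without parents, prior counts of 1, one 0 and two 1s observed.
example = log_family_score([[1, 1]], [[1, 2]])
probability = exp(example)  # equals 1/12 for this tiny case
```

Working with logarithms avoids overflow of the Gamma function, and the per-node decomposition means that an edge change only requires rescoring the nodes whose parent sets changed.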
![Page 23: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/23.jpg)
23
MCMC (Markov Chain Monte Carlo) simulation
Markov Chain part: zoom on a node of the chain
[Figure: a network over the nodes A, B and C and its neighbouring networks; five of the transitions have P(BNnew) = 1/5 and one has P(BNnew) = 0]
![Page 24: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/24.jpg)
24
MCMC (Markov Chain Monte Carlo) simulation
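Monte Carlo part:
Choose the next BN with probability P(BNnew)
Accept the new BN with the following Metropolis–Hastings acceptance criterion:
$$
\begin{aligned}
P_{MH} &= \min\left\{1,\ \frac{P(BN_{new} \mid D)}{P(BN_{old} \mid D)} * P(BN_{new})\right\} \\
&= \min\left\{1,\ \frac{P(D \mid BN_{new})\,P(BN_{new})\,/\,P(D)}{P(D \mid BN_{old})\,P(BN_{old})\,/\,P(D)} * P(BN_{new})\right\} \\
&= \min\left\{1,\ \frac{P(D \mid BN_{new})\,P(BN_{new})}{P(D \mid BN_{old})\,P(BN_{old})} * P(BN_{new})\right\}
\end{aligned}
$$
P(D) is gone!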
![Page 25: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/25.jpg)
25
Monte Carlo part example:
1. Choose a random path, each path having a P(BNnew) of 1/5.
2. Choose another random number. If it is smaller than the Metropolis–Hastings criterion, accept BNnew; otherwise return to BNold.
[Figure: a network over the nodes A, B and C and its neighbouring networks; five of the transitions have P(BNnew) = 1/5 and one has P(BNnew) = 0]
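The two steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: scores are handled as log(P(D|BN) * P(BN)), the proposal term is taken to be the standard Hastings correction q(old|new) / q(new|old) with uniform choice among neighbours, and `neighbours` and `log_score` are hypothetical callables supplied by the caller.

```python
import math
import random


def mh_accept(log_score_new, log_score_old, q_forward=1.0, q_backward=1.0, rng=random):
    """Return True if the proposed network is accepted.

    log_score_* is log(P(D|BN) * P(BN)); q_forward is the probability of
    proposing new from old and q_backward the probability of proposing old
    from new (assumed Hastings correction).
    """
    log_ratio = (log_score_new - log_score_old) + math.log(q_backward) - math.log(q_forward)
    acceptance = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
    return rng.random() < acceptance


def mh_step(current, neighbours, log_score, rng=random):
    """One move: pick a neighbour uniformly, then accept it or stay on the current network."""
    candidates = neighbours(current)
    candidate = rng.choice(candidates)
    q_forward = 1.0 / len(candidates)
    q_backward = 1.0 / len(neighbours(candidate))
    if mh_accept(log_score(candidate), log_score(current), q_forward, q_backward, rng):
        return candidate
    return current
```

Because only the ratio of the two scores matters, P(D) never has to be computed, which is the point made by the derivation of the acceptance criterion.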
![Page 26: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/26.jpg)
26
MCMC (Markov Chain Monte Carlo) simulation recap:
• Choose a starting BN at random
• Burn-in phase (generate 5*N BNs from the MCMC without storing them)
• Storing phase (get 100*N BN structures from the MCMC)
[Figure: log(P(D|BN)P(BN)) versus iteration, with the burn-in phase and the storing phase marked]
![Page 27: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/27.jpg)
27
Why 100*N BNs and not only 1?
Because we don't have enough data and there are many high-scoring networks.
Instead, we associate a confidence with each edge, e.g. how many times does the edge going from A to B appear in the sample?
We can then fix a threshold on the confidence and build a global network from the edges whose confidence is over the threshold.
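A minimal sketch of this edge-confidence idea, assuming each sampled network is represented as a set of (parent, child) pairs; the sample below is invented and the function names are hypothetical.

```python
from collections import Counter


def edge_confidence(sampled_networks):
    """Fraction of sampled networks that contain each directed edge."""
    counts = Counter(edge for network in sampled_networks for edge in set(network))
    return {edge: count / len(sampled_networks) for edge, count in counts.items()}


def consensus_network(sampled_networks, threshold):
    """Keep the edges whose confidence is over the threshold."""
    confidence = edge_confidence(sampled_networks)
    return {edge for edge, value in confidence.items() if value > threshold}


# Hypothetical sample of four networks, each a set of (parent, child) edges.
sample = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "B")},
    {("B", "A"), ("B", "C")},
]
confidence = edge_confidence(sample)
network = consensus_network(sample, threshold=0.6)
```

Here the edge from A to B appears in three of the four samples, so it is the only one that survives a threshold of 0.6.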
![Page 28: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/28.jpg)
28
What we are working on:
Mixing sequential and non-sequential data to retrieve interesting relations between genes
How?
Using DBN and MCMC for sequential data + BN and MCMC for non-sequential data
[Figure: 100*N networks from the DBN and 100*N networks from the BN feed an information tuner, which produces the learned network]
![Page 29: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/29.jpg)
29
How to test the approach:
Problem: there is no way to test it on real data because there is no completely known network.
Solution: work on realistic simulations where we know the network structure.
Example:
[Figure: known network* over the nodes 0 to 13, used to simulate data]
*-Hartemink A., “Using Bayesian Network Inference Algorithms to Recover Molecular Genetic Regulatory Networks”
![Page 30: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/30.jpg)
30
How to test the approach:
[Figure: the known network* (nodes 0 to 13) is used to simulate sequential and non-sequential data; DBN MCMC runs on the sequential data and BN MCMC on the non-sequential data; the info tuner combines both into a learned network, which is compared with the known network using ROC curves]
*-Hartemink A., “Using Bayesian Network Inference Algorithms to Recover Molecular Genetic Regulatory Networks”
![Page 31: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/31.jpg)
31
Test description:
• Generate 60 sequential data points
• Generate 120 non-sequential data points (roughly the proportion found in reality)
• Run DBN MCMC on the sequential data and keep the 100*N sampled networks
• Run BN MCMC on the non-sequential data and keep the 100*N sampled networks
• Test performance using weights on the samples:
  • 0 BN, 1 DBN
  • 0.05 BN, 0.95 DBN
  • …
  • 0.95 BN, 0.05 DBN
  • 1 BN, 0 DBN
The metric used is the area under the ROC curve. A perfect learner gets 1.0, a random one gets 0.5 and the worst one gets 0.
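The evaluation can be sketched as below. The area under the ROC curve is computed in its rank-based form (the probability that a true edge receives a higher score than a false one, ties counting one half). The linear mix of BN and DBN edge confidences is an assumption about how the weights are applied, and all the numbers are invented for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: chance that a true edge outranks a false one."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def mix(bn_confidence, dbn_confidence, bn_weight):
    """Assumed linear mix of the two edge confidences (hypothetical combination rule)."""
    return [bn_weight * b + (1.0 - bn_weight) * d for b, d in zip(bn_confidence, dbn_confidence)]


# Invented confidences for four candidate edges; 1 marks an edge of the true network.
truth = [1, 1, 0, 0]
bn = [0.9, 0.4, 0.5, 0.1]
dbn = [0.6, 0.8, 0.2, 0.3]
for bn_weight in (0.0, 0.5, 1.0):
    print(bn_weight, roc_auc(mix(bn, dbn, bn_weight), truth))
```

Sweeping `bn_weight` from 0 to 1 in steps of 0.05 would reproduce the weighting grid listed above.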
![Page 32: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/32.jpg)
32
Results:
[Figure: area under the ROC curve (y-axis, from 0 to 1) plotted against the BN/DBN weighting, starting at 0 BN, 1 DBN]
![Page 33: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/33.jpg)
33
Perspective:
• Working on more sophisticated ways to mix sequential and non-sequential data
• Working on real cases:
  • Yeast cell cycle
  • Arabidopsis thaliana circadian rhythm
• Real data also means missing values
  • Evaluate missing-value solutions (EM, KNNImpute)
![Page 35: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/35.jpg)
35
Why are there missing data?
Low correlation
Experimental problems
![Page 36: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/36.jpg)
36
ROC Curve
Receiver Operating Characteristic curve
[Figure: example ROC curve*]
*-http://gim.unmc.edu/dxtests/roc2.htm
![Page 37: Gene Expression Analysis Using Bayesian Networks](https://reader035.vdocuments.us/reader035/viewer/2022062500/568159fd550346895dc7489b/html5/thumbnails/37.jpg)
37
MCMC simulation and number of sampled networks
[Figure: ROC curve area (y-axis, 0.86 to 0.895) as a function of the number of networks sampled from the MCMC simulation (x-axis, 500 to 5000) for N=12]