Gene Expression Analysis Using Bayesian Networks
Éric Paquet
LBIT Université de Montréal
Biological basis
DNA (storage of genetic information)
mRNA (storage and transport of genetic information)
Proteins (expression of genetic information)
RNA polymerase (copies DNA into RNA)
Ribosome (translates genetic information into proteins)
*-PDB file 1L3A, Transcriptional Regulator Pbf-2
How are proteins regulated? The E. coli lactose operon example:
Normally, E. coli uses glucose as its energy source. How does it react when no glucose is left and only lactose is available?
[Figure: the lactose operon in the E. coli environment (glucose, lactose). Labels: RNA polymerase; lactose decomposer (β-galactosidase); lactose getter (permease).]
Polymerase action is blocked by a DNA lock: the protein associated with the lacI gene.
[Figure: the lactose operon when lactose is present in the E. coli environment. Labels: RNA polymerase; lactose decomposer (β-galactosidase); lactose getter (permease).]
Lactose recruits the protein associated with the lacI gene, unlocking the DNA, which is then accessible to the polymerase.
[Figure: regulation diagram for the lactose decomposer (β-galactosidase) and the lactose getter (permease). Legend: one link type, inhibit.]
In the absence of glucose, a polymerase magnet (CAP with c-AMP) binds to the DNA and accelerates production of the proteins that help lactose decomposition.
[Figure: the lactose operon in the E. coli environment (glucose, lactose). Labels: RNA polymerase; CAP; c-AMP; lactose decomposer (β-galactosidase); lactose getter (permease).]
[Figure: regulation diagram for the lactose decomposer (β-galactosidase) and the lactose getter (permease). Legend: two link types, inhibit and activate.]
Research goal: infer these links
Why?
• Get insights about cellular processes
• Help understand diseases
• Find drug targets
How?
Using gene expression data and tools for learning Bayesian networks
[Figure: gene expression data* (axes: experiments, [mRNA]) + tools for learning Bayesian networks. Other labels: lactose decomposer (β-galactosidase); lactose getter (permease).]
*-Spellman et al. (1998) Mol Biol Cell 9:3273-97
What is gene expression data?
Data showing the concentration of a specific mRNA at a given time in the life of the cell.
Each real value comes from one spot and tells whether the concentration of a specific mRNA is higher (+) or lower (-) than the normal value.
Each column is the result of one image.
[Figure: gene expression data* (axes: experiments, [mRNA]).]
*-Spellman et al. (1998) Mol Biol Cell 9:3273-97
What are Bayesian networks?
A graphical representation of a joint distribution over a set of random variables.
[Figure: DAG over the nodes A, B, C, D, E with edges A → C, A → D, B → D, D → E.]
P(A,B,C,D,E) = P(A) * P(B) * P(C|A) * P(D|A,B) * P(E|D)
Nodes represent gene expression levels, while edges encode the interactions (e.g., inhibition, activation).
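As a minimal sketch of how the factorization is used, the snippet below multiplies one conditional probability per node to get the joint probability of a full assignment. The variables are assumed binary and every number in `P_ONE` is made up for illustration.

```python
from itertools import product

# Parents of each node, read off the factorization
# P(A) * P(B) * P(C|A) * P(D|A,B) * P(E|D).
PARENTS = {"A": (), "B": (), "C": ("A",), "D": ("A", "B"), "E": ("D",)}

# Hypothetical P(node = 1 | parent values); illustrative numbers only.
P_ONE = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0,): 0.2, (1,): 0.9},
    "D": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8},
    "E": {(0,): 0.25, (1,): 0.7},
}

def joint(assignment):
    """Joint probability as the product of one conditional per node."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p_one = P_ONE[node][key]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

# Summing over all 2^5 assignments should give 1 for a valid distribution.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product((0, 1), repeat=5))
all_on = joint({"A": 1, "B": 1, "C": 1, "D": 1, "E": 1})
```

Here `all_on` is simply 0.3 * 0.6 * 0.9 * 0.8 * 0.7, one factor per node.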
A small problem with Bayesian networks
A Bayesian network must be a DAG (Directed Acyclic Graph), but there are many examples of regulatory networks with directed cycles.*
[Figure: examples of regulatory networks with cycles: hysteretic oscillator, switch, transcription factor dimer.]
*-Husmeier D., Bioinformatics, Vol. 19 no. 17 2003, pages 2271–2282
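The DAG requirement can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm, one standard choice among several; the two toy graphs are illustrative assumptions and not networks from the cited paper.

```python
def is_dag(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be
    removed in topological order."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(nodes)

# A chain is a valid Bayesian network structure; a feedback loop is not.
feed_forward = is_dag("ABC", [("A", "B"), ("B", "C")])
feedback_loop = is_dag("AB", [("A", "B"), ("B", "A")])
```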
How can we deal with that?
Using DBNs (Dynamic Bayesian Networks*) and sequential gene expression data
We unfold the network in time
[Figure: the network over A and B unfolded over the time slices t and t+1 into the nodes A1, B1 and A2, B2.]
DBN = BN with constraints on parent and child nodes
*-Friedman, Murphy, Russell, Learning the Structure of Dynamic Probabilistic Networks
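Unfolding in time can be sketched as follows, assuming a feedback loop between A and B and assuming that every edge X → Y becomes an edge from X in slice t to Y in slice t+1. The helper name `unfold` is hypothetical.

```python
def unfold(edges, n_slices=2):
    """Turn each edge X -> Y into X(t) -> Y(t+1) for consecutive slices."""
    unfolded = []
    for t in range(1, n_slices):
        for parent, child in edges:
            unfolded.append((f"{parent}{t}", f"{child}{t + 1}"))
    return unfolded

cyclic = [("A", "B"), ("B", "A")]   # assumed feedback loop between A and B
dbn_edges = unfold(cyclic)

# Every edge points from an earlier slice to a later one,
# so the unfolded graph cannot contain a directed cycle.
forward_only = all(int(p[1:]) < int(c[1:]) for p, c in dbn_edges)
```

With two slices this yields the edges A1 → B2 and B1 → A2, which form a DAG even though the original network does not.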
What are we searching for?
A Bayesian network that is most probable given the data D (gene expression)
We find this BN as follows: BN* = argmaxBN{P(BN|D)}
Where:
$$P(BN \mid D) = \frac{P(D \mid BN)\, P(BN)}{P(D)}$$
P(D|BN): marginal likelihood
P(BN): prior on network structure
P(D): data probability
Naïve approach to the problem: try all possible DAGs and keep the best one!
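Because P(D) does not depend on the network, it can be dropped when comparing candidates. A short derivation follows; the log form in the last line is an added convenience for numerical work and is not stated on the slide.

```latex
BN^{*} = \arg\max_{BN} P(BN \mid D)
       = \arg\max_{BN} \frac{P(D \mid BN)\, P(BN)}{P(D)}
       = \arg\max_{BN} \left[ \log P(D \mid BN) + \log P(BN) \right]
```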
It is impossible to try all possible DAGs because the number of DAGs increases super-exponentially with the number of nodes:
n = 3 → 25 DAGs
n = 4 → 543 DAGs
n = 5 → 29281 DAGs
n = 6 → 3781503 DAGs
n = 7 → 1138779265 DAGs
n = 8 → 783702329343 DAGs
…
We are interested in problems with around 60 nodes…
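These counts can be reproduced with Robinson's recurrence for the number of labeled DAGs. The slide does not say how its figures were obtained, so this is one possible way to compute them.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

counts = {n: count_dags(n) for n in range(3, 9)}

# Python integers are unbounded, so the 60-node case is computable too;
# only its number of decimal digits is kept here.
digits_for_60 = len(str(count_dags(60)))
```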
Learning Bayesian networks from data
Choosing a search method over the space of networks and a representation of the conditional distributions
• Network space search methods (basically add, remove and reverse edges)
  • Greedy hill-climbing
  • Beam search
  • Stochastic hill-climbing
  • Simulated annealing
  • MCMC simulation
• Conditional distribution representation
  • Linear Gaussian
  • Multinomial, binomial
[Figure: network A → C ← B with the distributions to learn: P(a) = ?, P(b) = ?, P(c|a,b) = ?]
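All the search methods listed rely on the same elementary moves. The sketch below enumerates the networks reachable from a given DAG by adding, removing or reversing one edge, and discards results that contain a cycle. The function names are hypothetical, and a real implementation would typically score each neighbour as well.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Depth-first check that no node can reach itself."""
    children = {n: [c for p, c in edges if p == n] for n in nodes}
    state = {n: 0 for n in nodes}   # 0 = unseen, 1 = on stack, 2 = done

    def visit(n):
        if state[n] == 1:           # back edge: cycle found
            return False
        if state[n] == 2:
            return True
        state[n] = 1
        ok = all(visit(c) for c in children[n])
        state[n] = 2
        return ok

    return all(visit(n) for n in nodes)

def neighbours(nodes, edges):
    """All DAGs reachable by adding, removing or reversing one edge."""
    edges = frozenset(edges)
    result = set()
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            result.add(edges - {(a, b)})                 # remove
            result.add((edges - {(a, b)}) | {(b, a)})    # reverse
        elif (b, a) not in edges:
            result.add(edges | {(a, b)})                 # add
    return [g for g in result if is_acyclic(nodes, g)]

# Neighbours of the three-node network A -> C <- B.
nbrs = neighbours("ABC", [("A", "C"), ("B", "C")])
```

For this small network there are six neighbours: two removals, two reversals and two additions.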
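We use three gene expression levels
1. Sort the values:
-1.06 -0.12 0.18 0.21 1.16 1.19
2. Split the data into 3 equal buckets, numbered 0, 1 and 2:
-1.06 -0.12 | 0.18 0.21 | 1.16 1.19
3. Replace each value by its bucket number, in the original order of the data:
0 0 2 2 1 1 (discretized data)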
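The sort-and-split procedure is an equal-frequency binning, which can be sketched as below. The slide only shows the sorted values, so the measurement order used for `expression` is an assumption, chosen so that it reproduces the discretized sequence shown.

```python
def discretize(values, n_levels=3):
    """Equal-frequency binning: rank the values, then cut the ranking
    into n_levels buckets of (nearly) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(values)
    return levels

# The six values from the slide, in an assumed measurement order.
expression = [-1.06, -0.12, 1.16, 1.19, 0.18, 0.21]
levels = discretize(expression)   # [0, 0, 2, 2, 1, 1]
```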
Back to:
$$P(BN \mid D) = \frac{P(D \mid BN)\, P(BN)}{P(D)}$$
P(D|BN): marginal likelihood
P(BN): prior on network structure
P(D): data probability
Insight on each term
P(BN) → prior on network structure
• In our research, we always use a prior equal to 1
• We could use it to incorporate knowledge
  E.g.: we know that an edge is present. If the edge is in the BN, P(BN) = 1; otherwise P(BN) = 0
• Efforts are made to reduce the search space by using knowledge, e.g. limiting the number of parents or children
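P(D|BN) → marginal likelihood
Easy to calculate using a multinomial distribution with a Dirichlet prior*
$$P(d \mid bn) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N_{ij})}{\Gamma(N_{ij} + M_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(a_{ijk} + s_{ijk})}{\Gamma(a_{ijk})}$$
where n is the number of nodes, q_i the number of parent configurations of node i, r_i the number of states of node i, a_ijk the Dirichlet prior parameters, s_ijk the counts observed in the data, N_ij the sum of the a_ijk over k and M_ij the sum of the s_ijk over k.
*-Heckerman, A Tutorial on Learning With Bayesian Networks, and Neapolitan, Learning Bayesian Networks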
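A minimal sketch of this score for a single node, assuming the usual closed form for multinomial data with Dirichlet priors (a ratio of Gamma functions per parent configuration and per state). It works in log space with `math.lgamma` to avoid overflow; the argument layout is an assumption made for illustration.

```python
from math import lgamma, exp

def node_log_ml(prior, counts):
    """Log marginal likelihood contribution of one node.

    prior[j][k]  : Dirichlet parameter for parent configuration j, state k
    counts[j][k] : number of cases observed for that configuration and state
    """
    total = 0.0
    for a_j, s_j in zip(prior, counts):
        n_ij, m_ij = sum(a_j), sum(s_j)
        total += lgamma(n_ij) - lgamma(n_ij + m_ij)
        total += sum(lgamma(a + s) - lgamma(a) for a, s in zip(a_j, s_j))
    return total

# Binary node without parents (a single configuration), uniform prior,
# observed once in state 0 and once in state 1.
log_ml = node_log_ml(prior=[[1, 1]], counts=[[1, 1]])
ml = exp(log_ml)   # Gamma(2)/Gamma(4) * Gamma(2) * Gamma(2) = 1/6
```

The log score of a whole network is then the sum of `node_log_ml` over its nodes.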
MCMC (Markov Chain Monte Carlo) simulation
Markov chain part: zoom on a node of the chain
[Figure: the current network over the nodes A, B, C and six neighbouring networks. Five of them can each be chosen with P(BNnew) = 1/5; one has probability 0.]
Monte Carlo part:
• Choose the next BN with probability P(BNnew)
• Accept the new BN with the following Metropolis–Hastings acceptance criterion:
$$P_{MH} = \min\left\{1, \frac{P(BN_{new} \mid D)}{P(BN_{old} \mid D)} \cdot P(BN_{new})\right\}$$
$$= \min\left\{1, \frac{P(D \mid BN_{new})\, P(BN_{new}) / P(D)}{P(D \mid BN_{old})\, P(BN_{old}) / P(D)} \cdot P(BN_{new})\right\}$$
$$= \min\left\{1, \frac{P(D \mid BN_{new})\, P(BN_{new})}{P(D \mid BN_{old})\, P(BN_{old})} \cdot P(BN_{new})\right\}$$
P(D) is gone!
Monte Carlo part example:
1. Choose a random path. Each path has a P(BNnew) of 1/5.
2. Choose another random number. If it is smaller than the Metropolis–Hastings criterion, accept BNnew; otherwise return to BNold.
[Figure: the current network over the nodes A, B, C and its neighbouring networks, with P(BNnew) = 1/5 for five of them and 0 for one.]
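The accept-or-return step can be sketched as below. It uses the textbook Metropolis–Hastings rule with explicit forward and reverse proposal probabilities (`q_forward`, `q_reverse`), which is an assumption and may differ from how the proposal term is written on the slide. Scores are log(P(D|BN)P(BN)), so P(D) never needs to be computed.

```python
import math
import random

def mh_accept_prob(log_score_new, log_score_old, q_forward=1.0, q_reverse=1.0):
    """min(1, score_new / score_old * q_reverse / q_forward),
    where score = P(D|BN) * P(BN), evaluated in log space."""
    log_ratio = (log_score_new - log_score_old
                 + math.log(q_reverse) - math.log(q_forward))
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def mh_step(old, new, log_score, q_forward, q_reverse, rng=random):
    """Return `new` if a uniform random draw falls below the criterion,
    otherwise return to `old`."""
    p = mh_accept_prob(log_score(new), log_score(old), q_forward, q_reverse)
    return new if rng.random() < p else old

# Hypothetical case: the new network is half as probable as the old one,
# and both the move and its reverse are proposed with probability 1/5.
p_accept = mh_accept_prob(math.log(0.5), math.log(1.0),
                          q_forward=0.2, q_reverse=0.2)
```

In this example the proposal terms cancel and the move is accepted about half of the time.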
MCMC (Markov Chain Monte Carlo) simulation recap:
• Choose a starting BN at random
• Burn-in phase (generate 5*N BNs from the MCMC without storing them)
• Storing phase (get 100*N BN structures from the MCMC)
[Figure: log(P(D|BN)P(BN)) (y-axis) versus iteration (x-axis), with the burn-in phase and the storing phase marked.]
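The two phases can be sketched with a generic sampling loop. To keep the example short and runnable, the network space is replaced by a toy state space (the integers 0 to 9 on a ring); `propose`, `accept` and `weight` are illustrative stand-ins, not the structure moves or scores used in the work.

```python
import random

def run_chain(start, propose, accept_prob, n_nodes, rng):
    """Burn-in of 5*N steps (discarded) followed by 100*N stored samples."""
    burn_in, stored = 5 * n_nodes, 100 * n_nodes
    current, samples = start, []
    for step in range(burn_in + stored):
        candidate = propose(current, rng)
        if rng.random() < accept_prob(current, candidate):
            current = candidate
        if step >= burn_in:          # storing phase
            samples.append(current)
    return samples

# Toy stand-in: target weight w(x) = x + 1, symmetric random walk on a ring.
weight = lambda x: x + 1
propose = lambda x, rng: (x + rng.choice((-1, 1))) % 10
accept = lambda old, new: min(1.0, weight(new) / weight(old))

samples = run_chain(start=0, propose=propose, accept_prob=accept,
                    n_nodes=12, rng=random.Random(0))
```

With N = 12 the chain discards 60 states and stores 1200.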
Why 100*N BNs and not only one?
• Because we don’t have enough data and there are many high-scoring networks
• Instead, we associate a confidence with each edge. E.g.: in how many of the sampled networks do we find an edge going from A to B?
• We can set a threshold on the confidence and retrieve a global network built from the edges whose confidence is above the threshold
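Edge confidence and thresholding can be sketched as follows. The four sampled structures and the threshold value are hypothetical, and each network is represented simply as a list of (parent, child) edges.

```python
from collections import Counter

def edge_confidence(sampled_networks):
    """Fraction of sampled networks that contain each directed edge."""
    counts = Counter(edge for network in sampled_networks
                     for edge in set(network))
    return {edge: c / len(sampled_networks) for edge, c in counts.items()}

def consensus_network(confidence, threshold):
    """Keep the edges whose confidence is above the threshold."""
    return sorted(edge for edge, c in confidence.items() if c > threshold)

# Four hypothetical sampled structures.
samples = [
    [("A", "B"), ("B", "C")],
    [("A", "B")],
    [("A", "B"), ("C", "B")],
    [("B", "A"), ("B", "C")],
]
confidence = edge_confidence(samples)
network = consensus_network(confidence, threshold=0.4)
```

Here A → B appears in three of the four samples and B → C in two, so both pass a threshold of 0.4, while C → B and B → A do not.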
What we are working on:
Mixing both sequential and non-sequential data to retrieve interesting relations between genes
How?
Using DBN and MCMC for sequential data + BN and MCMC for non-sequential data
[Figure: 100*N networks from the DBN and 100*N networks from the BN feed an information tuner, which leads to the learned network.]
How to test the approach:
Problem: there is no way to test it on real data because there is no completely known network
Solution: work on realistic simulations where we know the network structure
Example:
[Figure: known network* with 14 nodes, numbered 0 to 13. Sequential and non-sequential data are simulated from it. The sequential data go through the DBN MCMC and the non-sequential data through the BN MCMC; the info tuner combines both into a learned network, which is compared with the known network using ROC curves.]
*-Hartemink A., "Using Bayesian Network Inference Algorithms to Recover Molecular Genetic Regulatory Networks"
Test description:
• Generate 60 sequential data points
• Generate 120 non-sequential data points (roughly the proportion found in reality)
• Run the DBN MCMC on the sequential data and keep 100*N sampled networks
• Run the BN MCMC on the non-sequential data and keep 100*N sampled networks
• Test the performance using weights on the samples:
  0 BN, 1 DBN
  0.05 BN, 0.95 DBN
  …
  0.95 BN, 0.05 DBN
  1 BN, 0 DBN
The metric used is the area under the ROC curve. A perfect learner gets 1.0, a random one gets 0.5 and the worst one gets 0.
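A sketch of the evaluation, under two assumptions: the weighting is taken to be a linear combination of the BN and DBN edge confidences (the slides do not spell out the combination rule), and the area under the ROC curve is computed through its rank interpretation, i.e. the probability that a true edge receives a higher score than a false one. All confidences below are made up.

```python
def mix(conf_bn, conf_dbn, w_bn):
    """Assumed rule: w_bn * BN confidence + (1 - w_bn) * DBN confidence."""
    edges = set(conf_bn) | set(conf_dbn)
    return {e: w_bn * conf_bn.get(e, 0.0) + (1 - w_bn) * conf_dbn.get(e, 0.0)
            for e in edges}

def roc_auc(scores, true_edges):
    """Probability that a true edge outscores a false one (ties count 1/2)."""
    pos = [s for e, s in scores.items() if e in true_edges]
    neg = [s for e, s in scores.items() if e not in true_edges]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confidences for four candidate edges of a toy network.
true_edges = {("A", "B"), ("B", "C")}
conf_bn = {("A", "B"): 0.9, ("B", "C"): 0.3, ("C", "A"): 0.4, ("A", "C"): 0.1}
conf_dbn = {("A", "B"): 0.6, ("B", "C"): 0.8, ("C", "A"): 0.2, ("A", "C"): 0.3}

auc_by_weight = {w: roc_auc(mix(conf_bn, conf_dbn, w), true_edges)
                 for w in (0.0, 0.5, 1.0)}
```

In this toy case the DBN alone (weight 0) and the even mix rank both true edges first, while the BN alone (weight 1) ranks one false edge above a true one and scores 0.75.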
Results:
[Figure: area under the ROC curve (y-axis) as a function of the BN/DBN weighting (x-axis, starting at 0 BN, 1 DBN).]
Perspective:
• Working on more sophisticated ways to mix sequential and non-sequential data
• Working on real cases:
  • Yeast cell cycle
  • Arabidopsis thaliana circadian rhythm
• Real data also means missing values
  • Evaluate solutions for missing values (EM, KNNImpute)
Why are there missing data?
• Low correlation
• Experimental problems
ROC curve
Receiver Operating Characteristic curve
[Figure: example ROC curve*]
*-http://gim.unmc.edu/dxtests/roc2.htm
MCMC simulation and number of sampled networks
[Figure: ROC curve area (y-axis, 0.86 to 0.895) as a function of the number of sampled networks from the MCMC simulation (x-axis, 500 to 5000) for N = 12.]