Insights from Boolean Modeling of Genetic Regulatory Networks
Ilya Shmulevich
Part I
1. Discover and understand the underlying gene regulatory mechanisms by inferring them from data.
2. Using the inferred model, endeavor to make useful predictions by mathematical analysis and computer simulations.
Genetic networks
Complex regulatory networks among genes and their products control cell behaviors such as:
– cell cycle
– apoptosis
– cell differentiation
– communication between cells in tissues
A paramount problem is to understand the dynamical interactions among these genes, transcription factors, and signaling cascades, which govern the integrated behavior of the cell.
Analogy: circuit diagram
Clinical Impact
Model-based and computational analysis can
– open up a window on the physiology of an organism and disease progression;
– translate into accurate diagnosis, target identification, drug development, and treatment.
What class of models should be chosen?
The selection should be made in view of
– data requirements
– goals of modeling and analysis.
[Diagram: Data, Model, Goals]
Classical tradeoff
A “fine” model with many parameters
– may be able to capture detailed “low-level” phenomena (protein concentrations, reaction kinetics);
– requires large amounts of data for inference.
A “coarse” model with low complexity
– may succeed in capturing only “high-level” phenomena (e.g. which genes are ON/OFF);
– requires smaller amounts of data.
Ockham’s Razor
Underlies all scientific theory building.
Model complexity should never be made higher than what is necessary to faithfully “explain the data.”
What kind of data do we have and how much?
William of Ockham (1280-1349)
Boolean Networks
1. To what extent do such models represent reality?
2. Do we have the “right” type of data to infer these models?
3. What do we hope to learn from them?
Basic Structure of Boolean Networks
[Diagram: genes A and B are inputs to gene X]
Boolean function:
A B X
0 0 1
0 1 1
1 0 0
1 1 1
1 means active/expressed; 0 means inactive/unexpressed.
In this example, two genes (A and B) regulate gene X. In principle, any number of “input” genes is possible. Positive/negative feedback is also common (and necessary for homeostasis).
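The truth table on this slide can be written down directly as a lookup. The sketch below is only an illustration: the names `TRUTH_TABLE`, `next_x`, and `next_x_formula` are made up here, and the closed-form rule X = (NOT A) OR B is simply one way to read the four rows of the table.

```python
# Truth table from the slide: keys are (A, B), values are the resulting X.
TRUTH_TABLE = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 0,
    (1, 1): 1,
}


def next_x(a, b):
    """Look up the value of X given the current values of A and B."""
    return TRUTH_TABLE[(a, b)]


def next_x_formula(a, b):
    """The same rule as a logical expression: X = (NOT A) OR B."""
    return int((not a) or b)


if __name__ == "__main__":
    for a, b in TRUTH_TABLE:
        print(a, b, next_x(a, b))
```

Storing the rule as a table rather than a formula keeps the representation general: any Boolean function of k inputs is a table with 2^k rows.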
Dynamics of Boolean Networks
[Figure: states of six genes A, B, C, D, E, F over time; the values shown include 0 1 1 0 0 1 and 1 1 0 1 1 0.]
State Space of Boolean Networks
Picture generated using the program DDLab.
• Equate cellular states (or fates) with attractors.
• Attractor states are stable under small perturbations
– most perturbations cause the network to flow back to the attractor;
– some genes are more important, and changing their activation can cause the system to transition to a different attractor.
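To make the notion of attractors and basins concrete, here is a minimal sketch that enumerates the state space of a tiny network and groups states by the attractor they flow into. The three-gene update rule is invented for illustration (it is not a network from the talk), and `update` and `find_attractors` are hypothetical names.

```python
from itertools import product


def update(state):
    """Hypothetical 3-gene network: A' = B, B' = A, C' = A AND B."""
    a, b, c = state
    return (b, a, a & b)


def find_attractors(update_fn, n):
    """Map each attractor (a frozenset of states) to the size of its basin.

    Every one of the 2**n states is followed under synchronous updating
    until a state repeats; the repeating part of the path is the attractor.
    """
    basins = {}
    for start in product((0, 1), repeat=n):
        path = []
        state = start
        while state not in path:
            path.append(state)
            state = update_fn(state)
        cycle = frozenset(path[path.index(state):])
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


if __name__ == "__main__":
    for cycle, size in find_attractors(update, 3).items():
        print(sorted(cycle), "basin size:", size)
```

In this toy network there are two fixed points and one cycle of length two. Flipping gene C in the fixed point (1, 1, 1) gives (1, 1, 0), which flows back to the same attractor, whereas flipping gene A gives (0, 1, 1), which ends up in the two-state cycle.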
Boolean model of the yeast filamentation network
Taylor, Galitski
[Diagram nodes: Environmental Input, Ras2, Cdc42, Ste20, Ste11, Ste7, Kss1, Dig1/2, Tec1-Ste12, Mpt5; outcomes: Filamentous, Non-Filamentous]
But can we extract meaningful biological information from gene expression data entirely in the binary domain?
We reasoned that if genes, when quantized to only two levels (1 or 0), were not informative in separating known subclasses of tumors, then there would be little hope for Boolean inference of real genetic networks.
Gene expression analysis in the binary domain
Using binary gene expression data, Hamming distance as a similarity metric, and multidimensional scaling, a separation between different subtypes of gliomas is evident.
Shmulevich, I. and Zhang, W. (2002) Bioinformatics 18(4), 555-565.
Boolean Framework
• Limited amounts of data and the noisy nature of the measurements can make useful quantitative inferences problematic, so a coarse-scale qualitative modeling approach seems justified.
• Boolean idealization enormously simplifies the modeling task.
• We wish to study the collective regulatory behavior without specific quantitative details.
• Boolean networks qualitatively capture typical genetic behavior.
– Albert, R. & Othmer, H.G. (2003) J. Theor. Biol. 223, 1-18.
– Mendoza, L., Thieffry, D. & Alvarez-Buylla, R.E. (1999) Bioinformatics 15, 593-606.
– Huang, S. & Ingber, D.E. (2000) Exp. Cell Res. 261, 91-103.
– Li, F., Long, T., Lu, Y., Ouyang, Q. & Tang, C. (2004) PNAS 101(14), 4781-4786.
Probabilistic Boolean Networks (PBN)
• Share the appealing rule-based properties of Boolean networks.
• Robust in the face of uncertainty.
• Dynamic behavior can be studied in the context of Markov chains.
– Boolean networks are just special cases.
• Close relationship to (dynamic) Bayesian networks
– Explicitly represent probabilistic relationships between genes. (Lähdesmäki et al. (2006) Sig. Proc., 86(4):814-834)
– Can represent the same joint probability distribution.
• Allow quantification of influence of genes on other genes (stay tuned for examples).
Shmulevich et al. (2002) Proceedings of the IEEE, 90(11), 1778-1792.
Model Inference from Gene Expression Data
Two approaches:
– Coefficient of Determination (Dougherty et al. 2000)
– Best-Fit Extensions
Lähdesmäki et al. (2003) Machine Learning, 52, 147-167.
Coefficient of Determination (COD)
• COD is used to discover associations between variables.
• It measures the degree to which the expression levels of an observed gene set can be used to improve the prediction of the expression of a target gene relative to the best possible prediction in the absence of observations.
• Using the COD, one can find sets of genes related multivariately to a given target gene.
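COD Definition
Observed genes: $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$
Target gene: $x_i$
Optimal predictor: $f(x_{i_1}, x_{i_2}, \ldots, x_{i_k})$
$$\theta = \frac{\varepsilon_i - \varepsilon_{\mathrm{opt}}}{\varepsilon_i}$$
$\varepsilon_i$ is the error of the best (constant) estimate of $x_i$ in the absence of any conditional variables.
$\varepsilon_{\mathrm{opt}}$ is the optimal error achieved by $f$.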
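As a rough illustration of this definition for binary data, the sketch below estimates both errors from a sample and forms their relative difference. It rests on several assumptions not stated on the slide: errors are plain misclassification rates measured on the same data used to build the predictor (resubstitution), the optimal predictor is taken to be the majority vote for each observed input pattern, and the COD is set to 0 when the constant predictor is already perfect. The function name `cod` is hypothetical.

```python
from collections import Counter, defaultdict


def cod(observed, target):
    """Estimate the coefficient of determination for binary data.

    observed: list of tuples, the values of the observed genes per sample
    target:   list of 0/1 values of the target gene, one per sample
    """
    n = len(target)
    ones = sum(target)
    # Error of the best constant predictor (always guess the majority value).
    eps_const = min(ones, n - ones) / n
    if eps_const == 0:
        return 0.0  # convention assumed here: nothing left to improve

    # Count target values for every observed input pattern.
    counts = defaultdict(Counter)
    for pattern, value in zip(observed, target):
        counts[tuple(pattern)][value] += 1

    # Majority-vote predictor: per pattern, the minority count is the error.
    eps_opt = sum(min(c[0], c[1]) for c in counts.values()) / n
    return (eps_const - eps_opt) / eps_const


if __name__ == "__main__":
    xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(cod(xor_inputs, [0, 1, 1, 0]))  # target fully determined by inputs
```

With so few samples a resubstitution estimate is optimistic; in practice the errors would be estimated with held-out data or cross-validation.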
Constraints During Inference
Constraining the class of predictors can have advantages:
– lessening the data requirements for reliable estimation;
– incorporating prior knowledge of the class of functions representing genetic interactions;
– certain classes of functions are more plausible from the point of view of evolution, noise resilience, network dynamics, etc.
Example of Constraint: Post Classes
• The class is sufficiently large (this is important for inference).
• An abundance of functions from this class will tend to prevent chaotic behavior in networks.
• Eukaryotic cells are not chaotic! (Shmulevich et al. (2005) PNAS 102(38), 13439-13444.)
• Functions from this class have a natural way to ensure robustness against noise and uncertainty.
Emil Post (1897-1954)
Shmulevich et al. (2003) PNAS 100(19), 10734-10739.
Post Class Constraints During Inference
• We compared the Post classes to the class of all Boolean functions (i.e. no constraint) by estimating the corresponding prediction error for a set of target genes, using available gene expression data.
• We found that the optimal error of Post functions compares favorably with optimal error without constraint.
• A hypothesis testing-based study gives no statistically significant evidence against the use of constrained function classes (i.e. cost of constraint).
• Thus, Post classes are also plausible in light of experimental data.
Subnetworks: Theory and Examples
• Aim: discover relatively small subnetworks
– whose genes interact significantly and
– whose genes are not strongly influenced by genes outside the subnetwork.
• Principle of Autonomy
• Start with a “seed” gene set and iteratively adjoin new genes so as to enhance subnetwork autonomy.
Growing Algorithm
• To achieve network autonomy, both of these strengths of connections (shown in the slide's diagram) should be high.
• The sensitivity of Y from the outside should be small.
• Various stopping criteria can be used.
Hashimoto et al. (2004) Bioinformatics 20(8): 1241-1247.
Cancer tissues need nutrients. Gliomas are highly angiogenic.
Expression of VEGF is often elevated.
VEGF is elevated in advanced stage of gliomas
Confirmation and localization by tissue microarray
VEGF protein is secreted outside the cells and binds to its receptor on the endothelial cells to promote their growth.
[Diagram nodes: VEGF, GRB2, FGF7 (member of the fibroblast growth factor family), FSHR (follicle-stimulating hormone receptor), PTK7 (tyrosine kinase receptor)]
• The protein products of all four genes are part of signal transduction pathways that involve surface tyrosine kinase receptors.
• These receptors, when activated, recruit a number of adaptor proteins to relay the signal to downstream molecules.
• GRB2 is one of the most crucial adaptors that have been identified.
• GRB2 is also a target for cancer intervention because of its link to multiple growth factor signal transduction pathways.
[Diagram nodes: GRB2, GNB2, MAP kinase 1, c-rel]
• Molecular studies have demonstrated that activation of the protein tyrosine kinase receptor-GRB2 complex activates the ras-MAP kinase-NFκB pathway to complete the signal relay from outside the cells to the nucleus.
• GNB2 is a ras family member.
• GNB2 influences MAP kinase 1, which in turn influences c-rel, an NFκB component.
Such relationships should also be validated experimentally.
The networks built from our models provide valuable theoretical guidance for further experiments.
• IGFBP2 is overexpressed in high-grade gliomas.
• IGFBP2 contributes to increased cell invasion.
IGFBP2 is elevated in advanced stage of gliomas
Confirmation and localization by tissue microarray
IGFBP2 promotes glioma cell invasion in vitro
[Figure panels: Vector; Low IGFBP2 clone; High IGFBP2 clone 1; High IGFBP2 clone 2]
A. Niemistö, L. Hu, O. Yli-Harja, W. Zhang, I. Shmulevich, "Quantification of in vitro cell invasion through image analysis," International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS'04), San Francisco, California, USA, Sep. 1-5, 2004.
[Diagram: IGFBP2 promoter with positions +1 and -561 marked and binding sites for c-Myc, AP2, and NFκB; nodes NFκB and IGFBP2]
• A review of the literature showed that Cazals et al. (1999) indeed demonstrated that NFκB activated the IGFBP2 promoter in lung alveolar epithelial cells.
[Diagram nodes: NFκB, IGFBP2, TNFR2, ILK]
• Higher NFκB activity in IGFBP2-overexpressing cells was also found.
• Transient transfection of an IGFBP2-expressing vector together with an NFκB promoter reporter gene construct did not lead to increased NFκB activity, suggesting an indirect effect of IGFBP2 on NFκB.
• Our real-time PCR data showed that in stable IGFBP2-overexpressing cell lines, IGFBP2 indeed enhances ILK expression.
• In addition, IGFBP2 contains an RGD domain, implying its interaction with integrin molecules.
• ILK is in the integrin signal transduction pathway.
• Studies also showed that IGFBP2 affects cell apoptosis, and TNFR2 is a known regulator of apoptosis.
PBN web page
http://personal.systemsbiology.net/ilya/PBN/PBN.htm
• Reprints
• Software (BN/PBN MATLAB Toolbox)
• Posters/Presentations
• Workshops
• Links
• PBN People
PBN Collaborators
Wei Zhang, Harri Lähdesmäki, Olli Yli-Harja, Jaakko Astola, Edward Dougherty, Ronaldo Hashimoto, Marcel Brun, Seungchan Kim, Edward Suh, Huai Li, Michael Bittner
Support
NIH/NIGMS R21 GM070600-01
NIH/NIGMS R01 GM072855-01
Part II
Joint work with
Order/Chaos
A broad body of work over the past 35 years has shown that a variety of model genetic regulatory networks behave in two broad regimes, ordered and chaotic, with an analytically and numerically demonstrated phase transition between the two.
“Edge of chaos”
• The boundary between order and chaos is called the complex regime or the critical phase.
– The system can undergo a kind of phase transition.
– Networks are most evolvable at the “edge of chaos.”
• Living system in a variable environment:
– Strike a balance: malleability vs. stability.
– Must be stable, but not so stable that it remains forever static.
– Must be malleable, but not so malleable that it is fragile in the face of perturbations.
Plausible and long-standing hypothesis:
Real cells lie in the ordered regime or are critical.
“Life at the edge of chaos”
There has been no experimental data supporting this hypothesis.
Ordered networks
• Homeostasis
• A modest number of small recurrent patterns of gene activity (attractors)
– plausible models of the diverse cell types (or cell fates) of an organism
– the phenotypic traits of the organism are encoded in the dynamical attractors of its underlying genetic regulatory network
• Confined avalanches of gene activity changes following transient perturbations in the activity of single genes
– i.e. confined damage spreading
Chaotic networks
• Nearby states lie on trajectories that diverge
– hence, fail to exhibit a natural basis for homeostasis
• Have enormous attractors whose sizes scale exponentially with the number of genes
• Exhibit vast avalanches of gene activity alterations following transient perturbations to single gene activities
The model class
• Random Boolean Networks (RBNs) - Kauffman (1969) “ensemble approach”
– One of the most intensively studied models of discrete dynamical systems.
– Sustained interest from biology and physics communities.
– Considered for many years as prototypes of nonlinear dynamical systems.
• RBNs are:
– Structurally simple yet capable of remarkably rich complex behavior!
Connectivity
Mean number of input variables:
$$K = \frac{1}{n} \sum_{i=1}^{n} k_i$$
where $k_i$ is the number of input variables of gene $i$ and $n$ is the number of genes.
It is also possible to let $k_i$ be random, chosen under various distributions (e.g. scale-free).
Bias
• The bias p of a random function is the probability that it takes on the value 1.
• If p = 0.5, then the function is unbiased.
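Connectivity and bias are enough to sample a random Boolean network. The sketch below is one simple way to do it, under assumptions the slides do not spell out: every gene gets exactly K distinct inputs chosen uniformly at random, each truth-table entry is 1 with probability p independently, and all genes update synchronously. The names `make_rbn`, `step`, and `trajectory` are hypothetical.

```python
import random


def make_rbn(n, k, p, rng):
    """Sample a random Boolean network with n genes, k inputs each, bias p."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [
        [1 if rng.random() < p else 0 for _ in range(2 ** k)]
        for _ in range(n)
    ]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every gene from the current state."""
    new_state = []
    for gene_inputs, table in zip(inputs, tables):
        index = 0
        for j in gene_inputs:
            index = (index << 1) | state[j]  # input values form the row index
        new_state.append(table[index])
    return tuple(new_state)


def trajectory(state, inputs, tables, steps):
    """Return the list of states visited, starting with the initial state."""
    states = [tuple(state)]
    for _ in range(steps):
        states.append(step(states[-1], inputs, tables))
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    inputs, tables = make_rbn(n=8, k=2, p=0.5, rng=rng)
    start = tuple(rng.randint(0, 1) for _ in range(8))
    for s in trajectory(start, inputs, tables, steps=10):
        print("".join(map(str, s)))
```

Reading one column of the printed trajectory gives the binary time series of a single gene.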
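Connectivity, bias, and the phase transition
[Figure: phase diagram in the (p, K) plane; horizontal axis p with ticks at 0.25, 0.5, 0.75; vertical axis K from 0 to 20; the critical phase is the curve separating the two regimes.]
Critical phase: $2Kp(1-p) = 1$, where the left-hand side is the average network sensitivity.
Shmulevich & Kauffman (2004) Physical Review Letters, 93(4): 048701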
Phase transition
RBNs can be tuned to undergo a phase transition by
– tuning the connectivity K
– tuning the bias p
– tuning the scale-free exponent γ (Aldana & Cluzel (2003) PNAS 100(15):8710-4.)
– tuning abundance of functional classes (Shmulevich et al. (2003) PNAS 100(19):10734-9.)
Our approach
• Measure and compare the complexity of time series data of HeLa cells with that of mock data generated by RBNs operating in the ordered, critical, and chaotic regimes.
• We use the Lempel-Ziv (LZ) measure of complexity.
• Dataset: Whitfield et al. (2002) Mol. Biol. Cell 13, 1977-2000.
– synchronized HeLa cells; 48 time points at 1-hour time intervals; 29,621 distinct genes
Lempel-Ziv Complexity
The algorithm parses the sequence into shortest words that have not occurred previously and the complexity is defined as the number of such words. Words are unique, except possibly the last one.
01100101101100100110: LZ Complexity = 7
01010101010101010101: LZ Complexity = 3
Lempel-Ziv Complexity Example
0*1*10*010*1101*100100*110
LZ Complexity = 7
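The parsing shown in this example can be reproduced with a few lines of code. The sketch below assumes the variant in which a candidate word may overlap the text that precedes its final symbol, since that reading yields the parsing printed on the slide; `lz_parse` and `lz_complexity` are names chosen here, and the quadratic-time substring search is for clarity, not speed.

```python
def lz_parse(sequence):
    """Split a string into the shortest words not seen previously."""
    words = []
    i = 0
    n = len(sequence)
    while i < n:
        length = 1
        # Grow the candidate word while it already occurs in the text that
        # precedes its final symbol (overlap with the word itself allowed).
        while (i + length <= n
               and sequence[i:i + length] in sequence[:i + length - 1]):
            length += 1
        # If the end of the sequence was reached, the slice is the final
        # word, which may not be unique.
        words.append(sequence[i:i + length])
        i += length
    return words


def lz_complexity(sequence):
    """Number of words in the Lempel-Ziv parsing."""
    return len(lz_parse(sequence))


if __name__ == "__main__":
    print("*".join(lz_parse("01100101101100100110")))
    print(lz_complexity("01010101010101010101"))
```

For the periodic sequence the third word absorbs the whole remainder, because every extension of it can be copied from earlier text, which is why a sequence of length 20 still has complexity 3.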
Lempel-Ziv Complexity: some remarks
• “Universal” complexity measure
• Basis of powerful lossless compression schemes (ZIP, GIF, etc.)
– by replacing words with a pointer to a previous occurrence of the same word
• Optimal: compression rate approaches the entropy of the random sequence
• Asymptotically Gaussian: can be used for statistical test of randomness.
Intuition
• Genes in ordered networks have low LZ complexities.
• Genes in chaotic networks have high LZ complexities.
Binarization
We used the well-known k-means algorithm with two groups, corresponding to the two binary values (0, 1).
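For a single gene profile, k-means with two groups reduces to a one-dimensional clustering. The sketch below is a plain Lloyd-style iteration written for illustration; initializing the two centers at the minimum and maximum of the profile, and returning all zeros for a constant profile, are assumptions made here, not details from the talk. `binarize_kmeans` is a hypothetical name.

```python
def binarize_kmeans(values, max_iterations=100):
    """Binarize one expression profile with 1-D k-means, k = 2.

    Values assigned to the cluster with the higher center become 1,
    the others become 0.
    """
    low, high = min(values), max(values)
    if low == high:
        return [0] * len(values)  # constant profile: nothing to separate

    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins the nearer center.
        labels = [1 if abs(v - high) < abs(v - low) else 0 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: centers move to the group means.
        new_low = sum(group0) / len(group0)
        new_high = sum(group1) / len(group1)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return labels


if __name__ == "__main__":
    profile = [0.1, 0.2, 0.15, 2.0, 2.2, 1.9]
    print(binarize_kmeans(profile))
```

Because the centers start at the extremes, the smallest value always stays in group 0 and the largest in group 1, so neither group can become empty.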
[Figure: Lempel-Ziv complexity distributions of binarized HeLa data vs. random binary data. Horizontal axis: LZ complexity (2 to 14); vertical axis: relative frequency (0 to 0.5); series: Random, Data.]
[Figure: analysis pipeline. HeLa time-series data (29,621 genes by 48 time points) are binarized; RBNs in the ordered, critical, and chaotic regimes generate binary time series (e.g. 01101001101001101011, 10011001100100110110). LZ complexities are computed for both, giving histograms over LZ complexity values 2 to 14 (panels: thresholding data with threshold parameter = 1, 2, 3; k-means data; thresholding permuted with threshold parameter = 1; k-means permuted). The distance between the distributions is then computed and the minimum is found.]
Distance between LZ distributions
Kullback-Leibler (KL) distance
$$D(p, q) = \sum_{i=1}^{m} p_i \log(p_i / q_i)$$
Euclidean distance
$$E(p, q) = \left( \sum_{i=1}^{m} (p_i - q_i)^2 \right)^{1/2}$$
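Both distances are short to compute for two discrete distributions given as equal-length lists of probabilities. In the sketch below, the natural logarithm and the convention that terms with p_i = 0 contribute nothing are assumptions; the code also assumes q_i > 0 wherever p_i > 0, otherwise the KL distance is undefined. The function names are hypothetical.

```python
import math


def kl_distance(p, q):
    """Kullback-Leibler distance D(p, q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def euclidean_distance(p, q):
    """Euclidean distance E(p, q) = sqrt(sum_i (p_i - q_i)**2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


if __name__ == "__main__":
    data_hist = [0.5, 0.5]
    model_hist = [0.25, 0.75]
    print(kl_distance(data_hist, model_hist))
    print(euclidean_distance(data_hist, model_hist))
```

Note that the KL distance is not symmetric, so the order of the two LZ distributions matters, while the Euclidean distance is symmetric.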
Three techniques to tune ordered, critical, and chaotic regimes.
1. Fix p = 0.5, let K = 1, 2, 3, 4.
2. Fix K = 4, let p = 0.93301, 0.85355, 0.75, 0.5.
3. Scale-free topology with connectivity K(γ). Vary scale-free exponent γ such that average network sensitivity is equal to the cases above. (Aldana & Cluzel (2003) PNAS 100(15):8710-4)
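A quick calculation shows why these particular values of K and p line up. The sketch assumes the average network sensitivity is 2Kp(1-p), the quantity that equals 1 on the critical curve shown earlier in the talk, and classifies a network as ordered, critical, or chaotic according to whether that value is below, at, or above 1. The function names and the tolerance are choices made for this illustration.

```python
def average_sensitivity(k, p):
    """Average sensitivity of an RBN with connectivity k and bias p."""
    return 2 * k * p * (1 - p)


def regime(k, p, tolerance=1e-3):
    """Label the regime from the sensitivity (the tolerance is arbitrary)."""
    s = average_sensitivity(k, p)
    if abs(s - 1) <= tolerance:
        return "critical"
    return "ordered" if s < 1 else "chaotic"


if __name__ == "__main__":
    print("Fix p = 0.5, vary K:")
    for k in (1, 2, 3, 4):
        print(k, round(average_sensitivity(k, 0.5), 3), regime(k, 0.5))
    print("Fix K = 4, vary p:")
    for p in (0.93301, 0.85355, 0.75, 0.5):
        print(p, round(average_sensitivity(4, p), 3), regime(4, p))
```

Under this formula both tuning schemes step through sensitivities of about 0.5, 1, 1.5, and 2, so the first setting in each list is ordered, the second critical, and the last two chaotic.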
But what about noise?
• Wouldn’t noise make things look more chaotic?
• There are two issues:
– In the binary domain, the compound effect of noise amounts to a certain percentage of values in the time series data being flipped from zero to one or vice versa.
– Many genes are expressed at levels that are below those corresponding to pure noise.
• Fortunately, using the HeLa data, it is possible to estimate both the binary noise probability and the global “noise floor” level as follows.
Estimate the “noise floor”
• There are 963 empty spots on the HeLa microarrays.
• As a conservative estimate, for each of the 48 microarrays, we used the 95th percentile of the values of the empty spots as the noise floor level for that array.
• Only those genes whose values exceed this global threshold at all time points are included for further analysis.
– Hence our criteria are very stringent.
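The filtering step described above might look like the following sketch. Several details are assumptions: the percentile uses the nearest-rank definition (the talk does not say which definition was used), each array gets its own floor, and a gene is kept only if it is strictly above the floor of every array. The data layout and the names `percentile`, `noise_floors`, and `genes_above_floor` are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def noise_floors(empty_spots_per_array, pct=95):
    """One noise floor per microarray, from that array's empty spots."""
    return [percentile(spots, pct) for spots in empty_spots_per_array]


def genes_above_floor(profiles, floors):
    """Keep genes whose value exceeds the floor at every time point.

    profiles: dict mapping gene name -> list of values, one per array
    floors:   list of noise floors, one per array
    """
    return {
        gene: values
        for gene, values in profiles.items()
        if all(v > f for v, f in zip(values, floors))
    }


if __name__ == "__main__":
    empty = [list(range(1, 21)), list(range(2, 42, 2))]  # two toy arrays
    floors = noise_floors(empty)
    profiles = {"g1": [20, 39], "g2": [20, 38], "g3": [5, 50]}
    print(floors, sorted(genes_above_floor(profiles, floors)))
```

Requiring every time point to clear the floor is what makes the criterion stringent: a single low measurement removes the gene from the analysis.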
Estimate the noise probability q
• We made use of the replicated probes available on the arrays.
– 2001 duplicate gene profiles of 48 time points.
• Keeping only those that exceeded the global threshold, we binarized each of the duplicate profiles and computed the normalized Hamming distance.
$\hat{q} = 0.35$ with a 95% bootstrap confidence interval of [0.32, 0.38].
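The two ingredients of this step, bit-flip noise in the binary domain and the normalized Hamming distance between duplicate profiles, can be sketched as follows. This is only an illustration: how the talk converts the measured distances into the estimate of q, and how the bootstrap interval is formed, is not shown on the slide and is not reproduced here. `flip_bits` and `normalized_hamming` are hypothetical names.

```python
import random


def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def flip_bits(bits, q, rng):
    """Flip each bit independently with probability q (binary noise model)."""
    return [b ^ 1 if rng.random() < q else b for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [rng.randint(0, 1) for _ in range(48)]  # 48 time points
    duplicate_a = flip_bits(clean, 0.1, rng)
    duplicate_b = flip_bits(clean, 0.1, rng)
    print(normalized_hamming(duplicate_a, duplicate_b))
```

Applying `flip_bits` to mock RBN time series is one way to put model data and measured data on a comparable footing before their LZ distributions are compared.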
Euclidean (fix p = 0.5, tune K)
[Figure (a): Euclidean distance vs. noise probability q (0 to 0.5); curves for K = 1, 2, 3, 4.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Kullback-Leibler (fix p = 0.5, tune K)
[Figure (b): Kullback-Leibler distance vs. noise probability q (0 to 0.5); curves for K = 1, 2, 3, 4.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Euclidean (fix K = 4, tune p)
[Figure (c): Euclidean distance vs. noise probability q (0 to 0.5); curves for p = 0.5, 0.75, 0.85355, 0.93301.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Kullback-Leibler (fix K = 4, tune p)
[Figure (d): Kullback-Leibler distance vs. noise probability q (0 to 0.5); curves for p = 0.5, 0.75, 0.85355, 0.93301.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Euclidean, Scale-free (tune γ)
[Figure: Euclidean distance vs. noise probability q (0 to 0.5), with an inset for q between 0.34 and 0.36; curves for average sensitivity equivalent to K = 1, 2, 3, 4, 5.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Kullback-Leibler, Scale-free (tune γ)
[Figure: Kullback-Leibler distance vs. noise probability q (0 to 0.5), with an inset for q between 0.34 and 0.36; curves for average sensitivity equivalent to K = 1, 2, 3, 4, 5.]
Shmulevich et al. (2005) PNAS 102(38):13439.
Concluding remarks
• The results strongly suggest that HeLa cells are in the ordered regime or are critical, but not chaotic.
• We cannot statistically distinguish between ordered and critical with these data.
• Critical networks appear to predict the distribution of genes whose activities are altered in several hundred knock-out mutants of yeast. (Serra et al. (2004) J. Theor. Biol. 227, 149-157)
• It will be important to use more realistic ensembles of model genetic networks to test whether our conclusions hold.