Genetic network inference: from co-expression clustering to reverse engineering
Patrik D’haeseleer, Shoudan Liang and Roland Somogyi
The goal of this review
Principles of genetic network organization
Computational methods for extracting network architectures from experimental data
Outline
Introduction
A conceptual approach to complex network dynamics
Inference of regulation through clustering of gene expression data
Modeling methodologies
Gene network inference: reverse engineering
Conclusions and Outlook
Genes encode proteins, some of which in turn regulate other genes.
The aim is to determine the structure of this intricate network of genetic regulatory interactions.
Traditional approach: local
Examining and collecting data on a single gene, a single protein or a single reaction at a time
Functional genomics
Functional Genomics
Functional genomics refers to the development and application of global experimental approaches to assess gene function, making use of the information and reagents provided by structural genomics.
High-throughput, large-scale experimental methodologies are combined with statistical and computational analysis of the results.
Intermediate representation
Focus at the level of single cells
A biological system can be considered to be a state machine, where the change in the internal state of the system depends on both its current internal state and any external inputs.
The goal
Observe the state of a cell and how it changes under different circumstances, and from this derive a model of how these state changes are generated
The state of a cell
All those variables determining its behavior
A conceptual approach to complex network dynamics
The global gene expression pattern is the result of the collective behavior of individual regulatory pathways
Gene function depends on its cellular context; thus understanding the network as a whole is essential.
Boolean Networks
Each gene is considered as a binary variable (either ON or OFF) regulated by other genes through logical or Boolean functions.
Even with this simplification, the network behavior is already extremely rich.
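
As a rough illustration of the Boolean picture, the sketch below simulates a three-gene network with synchronous updates and records the state or cycle of states each starting point settles into (usually called an attractor). The wiring (two mutually repressing genes plus a reporter) and the names `step` and `find_attractor` are invented for this example and are not taken from the review.

```python
from itertools import product

def step(state):
    """Synchronously update all three genes from the current state."""
    a, b, c = state
    return (
        not b,  # gene A is ON only when B is OFF (hypothetical rule)
        not a,  # gene B is ON only when A is OFF (hypothetical rule)
        a,      # gene C simply follows A (hypothetical rule)
    )

def find_attractor(state):
    """Iterate until a state repeats; return the cycle of states that was reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

# Map each of the 2**3 possible initial states to the attractor it falls into.
attractors = {s: find_attractor(s) for s in product((False, True), repeat=3)}
distinct = {frozenset(cycle) for cycle in attractors.values()}
print(len(distinct), "distinct attractors")
```

Even this toy network has more than one attractor, which is one way to read the remark that the behavior is already rich: different initial expression patterns end up in different stable patterns.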
Boolean Networks (Cont.)
Cell differentiation corresponds to transitions from one global gene expression pattern to another.
Inference of regulation through clustering of gene expression data
Scoring methods
Whether there has been a significant change at any one condition
Whether there has been a significant aggregate change over all conditions
Whether the fluctuation pattern shows high diversity according to Shannon entropy
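
The entropy criterion can be sketched as follows. The slide does not say how expression values are discretized, so the equal-width binning into `n_bins` levels is an assumption made for this example, as are the function name and the made-up profiles.

```python
import math
from collections import Counter

def shannon_entropy(profile, n_bins=3):
    """Entropy in bits of a profile after binning its values into n_bins equal-width levels."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return 0.0  # a flat profile shows no diversity at all
    width = (hi - lo) / n_bins
    # Clamp the index so that the maximum value falls into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in profile]
    n = len(profile)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())

# Made-up profiles over nine conditions.
flat = [1.0] * 9
spike = [0.0] * 8 + [5.0]
varied = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
for name, profile in [("flat", flat), ("spike", spike), ("varied", varied)]:
    print(name, round(shannon_entropy(profile), 3))
```

A profile that visits all levels equally often scores highest, while a single spike scores low even though its change at one condition may be large. This is why the three scoring criteria can rank genes differently.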
Guilt By Association
Select a gene
Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
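
A minimal sketch of this neighbor search, assuming Euclidean distance as the metric in expression space. The gene names, profiles and cut-off values are placeholders chosen for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def neighbors(profiles, query, cutoff):
    """Return the genes whose profile lies within `cutoff` of the query gene, nearest first."""
    hits = [
        (euclidean(profiles[query], profile), gene)
        for gene, profile in profiles.items()
        if gene != query
    ]
    return [gene for dist, gene in sorted(hits) if dist <= cutoff]

# Made-up profiles over four conditions.
profiles = {
    "geneA": [0.1, 0.9, 0.8, 0.2],
    "geneB": [0.2, 1.0, 0.7, 0.1],
    "geneC": [0.9, 0.1, 0.2, 0.8],
    "geneD": [0.0, 0.7, 1.0, 0.4],
}
print(neighbors(profiles, "geneA", cutoff=0.5))
```

Because every query gene gets its own neighborhood, the resulting groups may overlap, unlike the partitions produced by most clustering methods.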
Clustering
Extract groups of genes that are tightly co-expressed over a range of different experiments.
Caution
Different clustering methods can give very different results
It’s not yet clear which clustering methods are most useful for gene expression analysis.
Definition: Gene Expression Profile
An expression profile $e_j$ over an ordered list of N samples (k = 1 to N) for a particular gene j is a vector of scaled expression values $v_{jk}$.
The expression profile is: $e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$
Definition: Gene Expression Profile (Cont.)
The difference between two genes p and q may be estimated as an N-dimensional metric “distance” between $e_p$ and $e_q$.
Euclidean distance: $d_{pq} = \sqrt{\sum_{k=1}^{N} (v_{pk} - v_{qk})^2}$
Clustering algorithms
Non-hierarchical methods
Cluster N objects into K groups in an iterative process until certain goodness criteria are optimized
E.g. K-means
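
For reference, a bare-bones K-means loop is sketched below. Using the first k points as initial centers is an arbitrary choice that keeps the example deterministic; practical implementations usually randomize the start and run several restarts. The data points are made up.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain K-means; the initial centers are simply the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers will not move either
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up two-condition profiles forming two obvious groups.
points = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9], [0.0, 0.0], [1.0, 1.0]]
labels, centers = kmeans(points, k=2)
print(labels)
```

The goodness criterion being optimized here is the sum of squared distances from each point to its cluster center. Each of the two steps can only lower it, which is why the loop terminates.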
Clustering algorithms
Hierarchical methods
Return a hierarchy of nested clusters, where each cluster typically consists of the union of two or more smaller clusters.
Agglomerative methods
Start with single-object clusters and recursively merge them into larger clusters
Divisive methods
Start with the cluster containing all objects and recursively divide it into smaller clusters
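
The agglomerative variant can be sketched in a few lines. The slide does not specify how the distance between two clusters is measured, so single linkage (distance between the closest pair of members) is assumed here. The function name and data are illustrative only.

```python
import math

def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one object per cluster
    merges = []  # record of (cluster, cluster, distance): this is the hierarchy
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Made-up one-dimensional expression values.
points = [[0.0], [0.1], [0.25], [5.0], [5.2]]
clusters, merges = agglomerate(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Running the loop down to a single cluster and keeping `merges` yields the full tree of nested clusters; stopping early, as here, cuts that tree at a chosen level.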
Other applications of co-expression clusters
Extraction of regulatory motifs
Genes in the same expression cluster share biological functions
Inference of functional annotation
Functions of unknown genes may be hypothesized from genes with known function within the same cluster
As a molecular signature for distinguishing cell or tissue types
mRNA expression
Which clustering method to use?
There is no single best criterion for obtaining a partition because no precise and workable definition of ‘cluster’ exists.
Clusters can be of arbitrary shape and size in a multidimensional pattern space.
Challenge in cluster analysis
A gene could be a member of several clusters, each reflecting a particular aspect of its function and control
Solutions
Clustering methods that partition genes into non-exclusive clusters
Several clustering methods could be used simultaneously
Modeling methodologies
Level of biochemical detail
Abstract: Boolean networks
Concrete: full biochemical interaction models with stochastic kinetics, as in Arkin et al. (1998)
Forward and inverse modeling
Forward modeling approach
Inverse modeling, or reverse engineering
Given an amount of data, what can we deduce about the unknown underlying regulatory network?
Requires the use of a parametric model, the parameters of which are then fit to the real-world data.
Gene network inference: reverse engineering
Goal of network inference
Construct a coarse-scale model of the network of regulatory interactions between the genes
It’s possible to reverse engineer a network from its activity profiles
Data requirements
To infer how a gene is regulated, we need to observe its expression under many different combinations of expression levels of its regulatory inputs
Use data from different sources
Deal with different data types
Estimates for network models
A sparse network model of N genes, where each gene is only affected by K other genes on average.
A sparsely connected, directed graph with N nodes and NK edges.
Estimates for network models (Cont.)
To specify the correct model, we need
$\log_2 \binom{N^2}{NK} = \log_2 \frac{(N^2)!}{(NK)!\,(N^2 - NK)!}$
bits of information, which is on the order of $NK \log(N/K)$.
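
To get a feel for these numbers, the sketch below evaluates the exact count of bits and the scaling term NK log2(N/K) for a few network sizes. The sizes are arbitrary, the function names are made up for this example, and base 2 is assumed for the scaling term so that both quantities are in bits.

```python
import math

def bits_exact(n, k):
    """log2 of the number of ways to choose n*k directed edges out of n*n possible ones."""
    return math.log2(math.comb(n * n, n * k))

def bits_scaling(n, k):
    """The scaling term N*K*log2(N/K)."""
    return n * k * math.log2(n / k)

# Arbitrary example sizes: (number of genes N, average inputs per gene K).
for n, k in [(50, 2), (100, 3), (200, 3)]:
    print(n, k, round(bits_exact(n, k)), round(bits_scaling(n, k)))
```

The exact value stays within a small constant factor of the scaling term, which is the sense in which the amount of information needed grows like NK log(N/K) rather than like the N^2 entries of a fully connected model.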
Correlation Metric Construction
Adam Arkin and John Ross
A method to reconstruct reaction networks from measured time series of the component chemical species.
The system is driven using inputs for some of the chemical species, and the concentration of all the species is monitored over time.
Correlation Metric Construction (Cont.)
The time-lagged correlation matrix is calculated
From this a distance matrix is constructed based on the maximum correlation between any two chemical species
This distance matrix is then fed into a simple clustering algorithm to generate a tree of connections between the species
The results are mapped into a two-dimensional graph for visualization
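
The first two steps above can be sketched as follows. The conversion from maximum correlation c to distance, sqrt(2 * (1 - c)), is an assumption made for this example, since the slide only says that the distance is based on the maximum correlation. The series are made up, and the clustering and two-dimensional mapping steps are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def max_lagged_correlation(x, y, max_lag):
    """Largest |correlation| between x(t) and y(t + lag) for lags in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        best = max(best, abs(pearson(a, b)))
    return best

def distance_matrix(series, max_lag):
    """Distances sqrt(2 * (1 - c)), where c is the maximum lagged correlation (assumed form)."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = max_lagged_correlation(series[i], series[j], max_lag)
            dist[i][j] = math.sqrt(max(0.0, 2.0 * (1.0 - c)))
    return dist

# Made-up concentrations: y repeats x two time steps later, z is unrelated.
x = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 8.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [0.0, 0.0] + x[:-2]
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
d = distance_matrix([x, y, z], max_lag=3)
```

Scanning over lags lets a species that responds with a delay still end up close to the species driving it, which a zero-lag correlation would miss.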
Additive regulation models
Property
The regulatory inputs are combined using a weighted sum
Can be used as a first-order approximation to the gene network
Additive regulation models
The change in each variable over time is given by a weighted sum of all other variables:
$\frac{dy_i}{dt} = \sum_j w_{ji} y_j + b_i$
$y_i$ is the level of the i-th variable
$b_i$ is a bias term indicating whether i is expressed or not in the absence of regulatory inputs
$w_{ji}$ represents the influence of j on the regulation of i
Use of such models
We can infer regulatory interactions directly from the data by fitting these simple network models to large-scale gene expression data.
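
As a toy version of such a fit, the sketch below generates a time series from a known two-gene additive model and then recovers the weights and biases by ordinary least squares on finite differences. The network, step size and helper names are all hypothetical. Real expression data are noisy and offer far fewer time points than parameters, so this only illustrates the principle.

```python
def simulate(w, b, y0, dt, n_steps):
    """Euler integration of dy_i/dt = sum_j w[i][j] * y_j + b[i]."""
    ys = [list(y0)]
    for _ in range(n_steps):
        y = ys[-1]
        dy = [sum(w[i][j] * y[j] for j in range(len(y))) + b[i] for i in range(len(y))]
        ys.append([y[i] + dt * dy[i] for i in range(len(y))])
    return ys

def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(ys, dt):
    """Least-squares estimate of weights and biases from finite differences."""
    n = len(ys[0])
    rows = [y + [1.0] for y in ys[:-1]]  # regressors: all levels plus a constant for the bias
    # Normal equations (X^T X) theta = X^T target, solved once per gene.
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(n + 1)] for p in range(n + 1)]
    w, b = [], []
    for i in range(n):
        target = [(ys[t + 1][i] - ys[t][i]) / dt for t in range(len(ys) - 1)]
        xty = [sum(r[p] * tgt for r, tgt in zip(rows, target)) for p in range(n + 1)]
        theta = solve(xtx, xty)
        w.append(theta[:n])
        b.append(theta[n])
    return w, b

# Hypothetical two-gene network; w[i][j] is the influence of gene j on gene i
# (written w_ji in the slide notation).
true_w = [[-1.0, 0.5], [0.8, -1.0]]
true_b = [1.0, 0.2]
data = simulate(true_w, true_b, y0=[0.1, 2.0], dt=0.1, n_steps=40)
est_w, est_b = fit(data, dt=0.1)
```

Because the model is linear in its parameters, each gene's inputs can be fitted independently with a standard regression, which is what makes additive models attractive as a first approximation.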
Conclusions