Networks in Biology
Gene Regulatory Networks (GRNs)
Dr. Katja Nowick [email protected]
www.nowick-lab.info
Networks in Biology
Networks in cells (molecular networks):
• Metabolic networks
• Gene regulatory networks
• Protein-protein interaction networks
Networks between cells:
• Neural networks
• Immune system
Networks in ecosystems:
• Food networks
• Cooperation/Symbiosis
Social networks:
• Friendships
• Epidemiology
The identity of the nodes (vertices) and the meaning of the links (edges) depend on the network being studied
Characteristics of biological networks
• Node degree distribution follows a power law
• Small world characteristics
• Hierarchical and modular organization
• Overrepresentation of certain network motifs
• Preferential attachment
• Are dynamic
Typical parameters analyzed in a network
• Node degree (hubiness)
• Neighborhood
• Centralization
• Clustering coefficient
• Centrality (Betweenness Centrality, Closeness Centrality)
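A minimal sketch of how some of these parameters can be computed, assuming a small undirected toy network stored as adjacency sets. The node names and the helper functions are hypothetical and only serve as illustration; betweenness centrality is left out for brevity.

```python
from collections import deque

# Hypothetical undirected toy network: node -> set of neighbors.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree(g, node):
    """Node degree: number of direct neighbors."""
    return len(g[node])

def clustering_coefficient(g, node):
    """Fraction of neighbor pairs that are themselves connected."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in g[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def shortest_path_lengths(g, source):
    """Breadth-first search distances from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(g, node):
    """(Number of reachable nodes) / (sum of shortest-path distances)."""
    dist = shortest_path_lengths(g, node)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(dist) - 1) / total if total > 0 else 0.0

for n in sorted(graph):
    print(n, degree(graph, n), clustering_coefficient(graph, n),
          closeness_centrality(graph, n))
```

In this toy network, node A has the highest degree and the highest closeness, which is what one would informally call a hub.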
Why are cells different from each other?
MacArthur et al., PLoS ONE 3: e3086 (2008)
• Stem cell differentiation regulation
Nodes: Genes, including transcription factors (TFs)
Links: Interactions: who regulates expression of whom
• Directional or bidirectional
• Activating or repressing
• Feed-back and other loops
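One possible way to store such a network, sketched under the assumption that every link is a (regulator, target, sign) triple. All gene names and function names here are hypothetical placeholders.

```python
# Each link: (regulator, target, sign); +1 = activating, -1 = repressing.
# Gene names are hypothetical placeholders.
edges = [
    ("TF1", "TF2", +1),
    ("TF2", "TF1", -1),    # together with the line above: a two-node feedback loop
    ("TF1", "geneA", +1),
    ("TF3", "TF3", +1),    # autoregulation (self-loop)
    ("TF3", "geneB", -1),
]

def build_network(edge_list):
    """Directed, signed network as nested dicts: regulator -> {target: sign}."""
    net = {}
    for regulator, target, sign in edge_list:
        net.setdefault(regulator, {})[target] = sign
        net.setdefault(target, {})   # make sure pure targets are nodes too
    return net

def autoregulated(net):
    """Nodes that regulate their own expression."""
    return sorted(n for n, targets in net.items() if n in targets)

def mutual_pairs(net):
    """Pairs of nodes that regulate each other (bidirectional links)."""
    pairs = set()
    for a, targets in net.items():
        for b in targets:
            if a != b and a in net[b]:
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)

network = build_network(edges)
print(autoregulated(network))
print(mutual_pairs(network))
```

Keeping direction and sign on every link is what later allows feedback and feed-forward loops to be read off the network.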
Examples of Gene Regulation Networks
• TF network of E. coli
Examples of Gene Regulation Networks
Ca. 20% of all interactions in E. coli
Here nodes are operons (genes on the same mRNA)
Links: TF X regulates operon Y
• TF network of Drosophila embryonic development
Examples of Gene Regulation Networks
Note: TFs are also proteins. Some of the proteins generated regulate further genes, which creates a network
Transcription + translation (gene expression)
MacArthur et al., PLoS ONE 3: e3086 (2008)
• Stem cell differentiation regulation
1. Nodes: Genes, including transcription factors (TFs)
Links: Interactions: who regulates expression of whom
• Directional or bidirectional
• Activating or repressing
• Feed-back and other loops
Examples of Gene Regulation Networks
TFs regulate expression of other genes
Gene Promoter
TF
TFs regulate expression of other genes
Gene Promoter
Many TFs have to come together to start/stop transcription of a target
Transcription factors (TFs)
Modified after Messina et al., 2004
~ 1500 TFs in human genome
[Figure: TF families in the human genome. Labels: RFX, ZNF, HOX, bHLH, β-Scaffold, bZip, NHR, Trp cluster, FOX, Bromodomain, T-Box, Jumonji, E2F, Dwarfin, Paired Box, Heat shock, Tubby, AF-4, Methyl-CpG-binding, AP-2, TEA, Pocket domain, GCM, Other, Structural]
Largest families: ZNF 762, HOX 199, bHLH 117
Some TFs bind DNA as dimers
bHLH: basic helix-loop-helix TFs
bZip: basic leucine zipper TFs
NR: nuclear receptors
Homo-dimers or hetero-dimers: added complexity
Many TFs have to come together to start/stop transcription of a target
Environmental signals trigger the GRN
Environmental signals trigger the GRN - Activators -
Environmental signals trigger the GRN - Repressors -
TFs are often hubs in the GRNs
• TFs and their target genes
TF Binding sites (TFBS): short sequence motifs, degenerate
Enhancers are sites on the DNA that are bound by activators in order to loop the DNA, bringing a specific promoter to the initiation complex. Enhancers are much more common in eukaryotes than in prokaryotes, where only a few examples exist (to date).
Silencers are regions of DNA sequences that, when bound by particular transcription factors, can silence expression of the gene.
TF binding to DNA
Gene Promoter
TFs recognize specific sites/motifs in DNA
• TFs bind short sequence motifs
• Motifs are degenerate
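A degenerate motif can be written with IUPAC ambiguity codes and searched for in a DNA sequence. The sketch below assumes exact matching against such a consensus string; the function name and the example sequence are made up. CANNTG is the E-box consensus commonly associated with bHLH factors.

```python
import re

# IUPAC nucleotide ambiguity codes translated to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead is used so that overlapping matches are reported as well.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]

# Hypothetical promoter fragment containing two different E-box variants.
promoter = "TTCACGTGAACAGCTGTT"
print(find_sites("CANNTG", promoter))   # [2, 10]
```

Both CACGTG and CAGCTG are reported, which illustrates the degeneracy: different sequences satisfy the same motif.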
Gene Promoter
TFs interact to regulate their targets
• TFs cooperate to regulate their targets
TFs interact to regulate their targets
• Co-occurrence of TF binding sites in the genome
ENCODE 2012
Complex TF interactions
• TFs bind as monomers, homo-dimers, or hetero-dimers
• Multiple TFs (~7-10) cooperate to regulate gene expression
• TFs regulate the expression of other TFs
• Feedback loops, autoregulation …
• It makes sense to represent this complexity in a network
• Summary
TFs: what is known and what not
Not only TFs regulate gene expression
GRFs*:
• General TFs: RNA polymerase II transcription-initiation complex
• Specific TFs: Activate or repress expression of particular genes
• miRNAs: Bind to mRNA to degrade them
• Cofactors: Bridge between specific and general TFs; activate or repress
• Chromatin remodelers: Make DNA accessible or inaccessible
*GRF = Gene Regulatory Factor
Epigenetic control of gene expression
• Chromatin remodeler
Examples of epigenetic/histone modifications
Temporal changes of the epigenome
Interactions between TFs and histone modifications
• Histone modifications influence chromatin states
• Chromatin states influence binding of TFs
• TFs interact with enzymes that modify histones
Not only TFs regulate gene expression
A primary miRNA (pri-miRNA) transcript is encoded in the cell's DNA and transcribed in the nucleus, processed by the enzyme Drosha, and exported into the cytoplasm, where it is further processed by Dicer. After strand separation, the mature miRNA represses protein production either by blocking translation or by causing transcript degradation.
miRNAs
• = small non-coding RNA molecules (ca. 22 nucleotides)
• > 1000 miRNAs in the human genome
Interactions between TFs, miRNAs, other ncRNAs, and histone modifications
• Neurogenesis
Interactions between TFs, miRNAs, other ncRNAs, and histone modifications
• TFs bind as monomers, homo-dimers, or hetero-dimers
• Multiple TFs (~7-10) cooperate to regulate gene expression
• TFs regulate the expression of other TFs
• Feedback loops, autoregulation …
• Network
~5000 ncRNAs
~375 million interactions
• Add epigenetic modifications
• Add ncRNAs
• Even more complex networks
Why are tissues different from each other?
Cell states are defined by gene expression
How is a gene activated or repressed (at a certain time and location)?
So let’s talk about the links now
MacArthur et al., PLoS ONE 3: e3086 (2008)
• Stem cell differentiation regulation
Nodes: Genes, including transcription factors (TFs)
2. Links: Interactions: who regulates expression of whom
• Directional or bidirectional
• Activating or repressing
• Feed-back and other loops
Examples of Gene Regulation Networks
Cell states are defined by gene expression
How is a gene activated or repressed (at a certain time and location)?
Goal: discover which gene is regulated by which TF
How do we get the information for the links?
• Manual
• Semi-automated (e.g. preBIND)
• Natural Language Processing (NLP) (e.g. PathwayStudio)
Donaldson I, et al. BMC Bioinformatics. 4:11 (2003)
preBIND
Network construction based on literature
Is the network encoded in the DNA?
TFs bind to specific motifs
It should be possible to predict TF target genes by reading the DNA
http://fasta.bioch.virginia.edu/cshl/
Experimental approaches
• Expensive
• Time consuming
• For one research group only feasible for a few TFs
Collections of TFBS can be found in databases: JASPAR, TRANSFAC
Motif databases
• JASPAR: http://jaspar.genereg.net/
• TRANSFAC: http://www.gene-regulation.com/pub/databases.html
To score a single site s for match to a motif W, we use Pr(s |W )
How good is a motif?
• Pr(s | W) is the key idea. However, some statistical adjustment is applied to it.
Consider a genome that is very A/T rich: Pr(A) = 0.45, Pr(T) = 0.45, Pr(C) = 0.05, Pr(G) = 0.05
We saw that Pr(ACACGTT | W) = 0.048. In fact, Pr(ACATGTT | W) = 0.048 too.
• Scoring motif matches
• Compute the probability of each site under the above “background model”:
Pr(ACACGTT) = 0.45 x 0.05 x 0.45 x 0.05 x 0.05 x 0.45 x 0.45 = 0.0000051
So Pr(ACACGTT | W) = 0.048 is 9364 times Pr(ACACGTT)
Similarly, Pr(ACATGTT) = 0.0000461
So Pr(ACATGTT | W) = 0.048 is 1040 times Pr(ACATGTT)
• In other words, if we compare how well “W explains the site” to how well “random background explains it”, then ACACGTT stands out.
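The arithmetic of this comparison can be retraced in a few lines. This is only a sketch: the background frequencies and the value Pr(s | W) = 0.048 are taken from the slide, while the function and variable names are made up.

```python
from math import prod

# A/T-rich background frequencies from the slide.
background = {"A": 0.45, "T": 0.45, "C": 0.05, "G": 0.05}

def background_probability(site, freqs):
    """Probability of a site if every base is drawn independently from freqs."""
    return prod(freqs[base] for base in site)

# Pr(s | W) is the same for both sites according to the slide.
p_given_motif = 0.048

for site in ("ACACGTT", "ACATGTT"):
    p_background = background_probability(site, background)
    ratio = p_given_motif / p_background
    print(site, f"{p_background:.7f}", int(ratio))
```

ACACGTT contains three rare bases (C, C, G) and ACATGTT only two, so the first site is far less likely under the background and gets the larger ratio.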
How good is a motif?
• The Log-Likelihood Ratio (LLR) score
Given a motif W, background nucleotide frequencies Wb, and a site s,
LLR score of s = log( Pr(s | W) / Pr(s | Wb) )
Good scores > 0.
Bad scores < 0.
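A minimal sketch of LLR scoring, assuming that the motif W is given as a position-specific probability matrix and that positions are independent, so the log of the ratio becomes a sum of per-position log ratios. The 4-column matrix, the uniform background, and the function names are hypothetical; base-2 logarithms are an arbitrary choice that does not change the sign of the score.

```python
from math import log2

# Hypothetical motif W: one dict of nucleotide probabilities per position.
W = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Hypothetical uniform background Wb.
Wb = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def llr_score(site, motif, background):
    """log2( Pr(site | motif) / Pr(site | background) ), summed per position."""
    return sum(log2(column[base] / background[base])
               for column, base in zip(motif, site))

def best_site(sequence, motif, background):
    """Slide a window over the sequence; return (best score, start position)."""
    width = len(motif)
    scored = [
        (llr_score(sequence[i:i + width], motif, background), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)

print(llr_score("ACGT", W, Wb))       # positive: the motif explains the site better
print(llr_score("TGCA", W, Wb))       # negative: the background explains it better
print(best_site("GGACGTCC", W, Wb))   # best window starts at position 2
```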
How good is a motif?
Find motif matches in DNA
Typically, the gene closest to the motif match is designated as the TF’s target
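The nearest-gene rule can be sketched as follows, assuming that each gene is represented by a single start coordinate on the same chromosome as the motif match. Gene names and coordinates are invented, and strand and gene length are ignored.

```python
# Hypothetical gene start coordinates on one chromosome.
genes = {"geneA": 1000, "geneB": 5000, "geneC": 12000}

def nearest_gene(site_position, gene_starts):
    """Name of the gene whose start is closest to the motif match."""
    return min(gene_starts, key=lambda g: abs(gene_starts[g] - site_position))

for position in (100, 4200, 9000):
    print(position, nearest_gene(position, genes))
```

This is a heuristic only: a match at position 9000 is assigned to geneC here, although nothing in the sequence itself shows that geneC is the gene being regulated.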
Finding the TF target gene
• So, what to do with the motif now?
We assumed that we have an experimental characterization of a TF’s binding specificity (the motif)
What if we don’t?
We can try computational motif discovery
Motif discovery
Motif discovery – Option 1
Try to find the motif given the promoter regions of the five genes G1, G2, … G5
Motif discovery – Option 2
Idea: Find a motif with many (significantly more than expected by chance) matches in the given sequences
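A toy version of this idea, sketched under strong simplifying assumptions: motifs are exact k-mers (no degeneracy), and "expected by chance" is estimated from a uniform background. The promoter sequences and function names are hypothetical, and this is not one of the published motif discovery algorithms.

```python
from collections import Counter

# Hypothetical promoter regions of five co-regulated genes G1 ... G5.
promoters = [
    "AATGACGTCC",
    "GGTGACGTAA",
    "CTTGACGTTG",
    "TGACGTACGA",
    "CCATGACGTA",
]

def kmers_in(sequence, k):
    """Set of all distinct k-mers occurring in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def most_shared_kmer(sequences, k):
    """k-mer found in the largest number of sequences, with that number."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers_in(seq, k))   # each k-mer counted once per sequence
    return max(counts.items(), key=lambda item: (item[1], item[0]))

def expected_sequences_with_kmer(sequences, k):
    """Expected number of sequences containing one fixed k-mer by chance,
    assuming uniform base frequencies and independent windows."""
    p = 0.25 ** k
    return sum(1 - (1 - p) ** (len(s) - k + 1) for s in sequences)

kmer, observed = most_shared_kmer(promoters, 6)
print(kmer, observed, expected_sequences_with_kmer(promoters, 6))
```

Here TGACGT occurs in all five sequences, while far fewer than one occurrence would be expected by chance, so it is a candidate motif.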
Motif discovery – some algorithms
Motif discovery – some tools
Is the network encoded in the DNA?
TFs bind to specific motifs
It should be possible to predict TF target genes by reading the DNA
Is this really so simple?
• For most TFs the binding site is not known
• Since TFBS are degenerate, it is hard to predict how efficiently the TF really binds
• How far away can the binding site be from the promoter?
• Multiple TFs might compete for the same binding site
• Is the nearest gene really the target gene?
• Does the binding event have an effect at all?
• …
Does the TF binding really have an effect?
• Chromatin immuno-precipitation (ChIP)-Seq
• Overexpression or knock-down of TFs in cell lines, followed by RNA-Seq
Problem: TFs bind at many places
But is a gene indeed regulated by the binding event?
Combine motif-finding experiments with experiments that change the TF expression (perturbation experiments)
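The combination can be sketched as a simple intersection, assuming one set of genes with a binding site nearby and one table of expression changes after knock-down of the TF. All gene names, values, the threshold, and the function name are hypothetical.

```python
# Hypothetical genes with a binding site of the TF nearby
# (e.g. from motif matches or ChIP-Seq peaks).
bound_genes = {"geneA", "geneB", "geneC", "geneD"}

# Hypothetical log2 fold changes of expression after knock-down of the TF.
log2_fold_change = {"geneA": -2.1, "geneB": 0.1, "geneC": 1.8, "geneE": -3.0}

def direct_targets(bound, fold_changes, threshold=1.0):
    """Genes that are bound AND change expression when the TF is knocked down."""
    targets = {}
    for gene in bound:
        change = fold_changes.get(gene)
        if change is None or abs(change) < threshold:
            continue   # not measured, or no clear expression change
        # Expression drops without the TF -> TF activates the gene;
        # expression rises without the TF -> TF represses the gene.
        targets[gene] = "activated" if change < 0 else "repressed"
    return targets

print(direct_targets(bound_genes, log2_fold_change))
```

geneB is bound but does not respond, and geneE responds without being bound (possibly an indirect effect), so neither is reported as a direct target.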
Inferring networks from perturbations
Sachs et al. Science. 2005 308:523-9
Reverse engineering the topology of regulatory molecular biological networks can be done through the analysis of a set of perturbations. Picture: reverse engineering of the hierarchy of a cell signaling network using multiple perturbations and a statistical method called Bayesian network inference.
Inferring Networks from Time Series Microarrays
Zou M, Conzen SD. Bioinformatics. 2005 21(1):71-9.
Regulatory interactions can also be inferred directly from data = reverse engineering of biological pathways/networks from data. In the example above time-series expression data is used to infer a directed and signed graph based on delayed correlations.
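The principle of delayed correlations can be illustrated with a small sketch. It is not the method of the cited paper: it assumes equally spaced time points, a single fixed lag, and plain Pearson correlation, and all data, thresholds, and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def delayed_correlation(regulator, target, lag):
    """Correlate the regulator at time t with the target at time t + lag."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])

def infer_edge(name_a, series_a, name_b, series_b, lag=1, threshold=0.8):
    """Return (source, target, sign) for the stronger delayed correlation,
    or None if neither direction reaches the threshold."""
    a_to_b = delayed_correlation(series_a, series_b, lag)
    b_to_a = delayed_correlation(series_b, series_a, lag)
    if abs(a_to_b) >= abs(b_to_a):
        source, target, r = name_a, name_b, a_to_b
    else:
        source, target, r = name_b, name_a, b_to_a
    if abs(r) < threshold:
        return None
    return (source, target, "+" if r > 0 else "-")

# Hypothetical expression time series: the gene mirrors the TF one step later.
tf_expr = [1, 3, 2, 5, 4, 6, 3, 1]
gene_expr = [5, 9, 7, 8, 5, 6, 4, 7]
print(infer_edge("TF", tf_expr, "gene", gene_expr))   # ('TF', 'gene', '-')
```

The delay supplies the direction of the link (the cause precedes the effect), and the sign of the correlation supplies activation or repression.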
Why are tissues different from each other?
• Summary
Hierarchy (from top to bottom):
• Top layer / Kernels / Initial TFs
• Core layer
• Bottom layer / Differentiation batteries / Terminal TFs
GRNs are hierarchical
GRNs are hierarchical - Yeast
The model, based on experimental evidence in yeast, organizes TFs into three distinct layers: the top, core, and bottom layers. TFs within a layer are highly interconnected and share similar properties. TFs of the different layers regulate distinct sets of target genes. The three layers are also connected by a central skeleton, a feed-forward structure in which TFs of the top layer regulate TFs of the core layer, and TFs of the core layer regulate TFs of the bottom layer. The core layer contains the highest number of TFs and hubs and is important for propagating the signals that regulate almost all targets.
Yeast regulatory network of 13385 regulatory interactions among 4503 genes, which include 158 TFs and 4369 target genes.
GRNs are hierarchical - Development
Developmental biologists have proposed a concept that concentrates on the temporal order of events in developmental pathways. In this system, modules are classified as kernels, plug-ins, input-output switches, and differentiation batteries. Each module can be thought of as fulfilling one specific function. Kernels are the initial modules of the network and impact most other parts of the network. They are, for instance, involved in initiating the development of certain body parts. Differentiation batteries may play a role in the terminal steps of the differentiation of body parts and generally do not affect other parts of the network.
e.g. Drosophila development
GRNs are hierarchical – cell fate
Hobert O PNAS 2008;105:20067-20071
Terminal selector TFs (acting either alone or in synergistic combination) activate downstream target genes directly via terminal selector motifs and also autoregulate their own expression via those motifs. Autoregulated expression of a terminal selector is critical to maintain the differentiated features of the cell. Downstream targets of terminal selectors (X) define the differentiated properties of a neuron, such as neurotransmitter receptors, ion channels, adhesion proteins, etc. Targets may also include TFs that regulate specific “subroutines.” TFs that are induced by terminal selectors may also cooperate with terminal selector proteins in a feed-forward loop configuration to jointly control specific terminal genes.