computational systems biology: from the generation of testable hypothesis to uncovering organizing...
TRANSCRIPT
Computational systems biology: from the generation of testable hypothesis to uncovering organizing principles
in living systems
NCBI, NLM, National Institutes of Health
M. Madan Babu, PhD
Explosion of information about living systems
Major Challenge – Integration of information
Generate experimentally testable hypotheses
Uncover organizing principles of specific processes at the systems level
Expression: 5,000 different conditions, 20 organisms (ArrayExpress, SMD, GEO)
Interaction: 100,000 interactions, 5 organisms (BIND, DIP, publications)
Structure: 33,000 structures from 300 organisms (PDB, MSD)
Sequence: 45,000,000 sequences from 160,000 organisms (EBI, NCBI)
Integration of data to uncover general organizing principles
Integration of gene expression data reveals dynamics in transcriptional networks
Integration of data to generate testable hypotheses
Sequence, structure, expression and interaction data provide convincing support
Regulation in Biological Systems
Introduction to transcriptional regulatory networks
Discovery of sequence specific transcription factors in the malarial parasite
[Figure: Plasmodium life cycle in the mosquito and human hosts, showing liver and RBC stages. Source: www.cdc.gov]
Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription factors in Plasmodium
5300 genes with over 700 metabolic enzymes
Extensive complement of chromosomal regulatory proteins
Extensive complement signaling proteins (GTPases, kinases)
Large number of genes: genes need to be regulated
Complex life cycle: genes need to be regulated
Alternative regulatory mechanisms: chromatin-level regulation, post-translational modification, RNA-based regulation
Undetected transcription factors: distantly related or unrelated to known DNA-binding domains
Possible explanations for the paradoxical observation
Proteome of Plasmodium
Profiles & HMMs of known DBDs
bZIP, Homeo, MADS, AT-hook, Forkhead, ARID
PF14_0633: AT-hook + ?
SEG
Uncharacterized globular domain, ~60 aa
Characterization of the globular domain – sequence analysis I
Globular domain searched against the non-redundant database
Lineage-specific expansion in Apicomplexa: Plasmodium falciparum, Plasmodium vivax, Cryptosporidium parvum, Theileria annulata, Cryptosporidium hominis
Profiles + HMM of this region searched against the non-redundant database
Hits: floral homeotic protein Q (Triticum); 49L, an endonuclease (X. oryzae phage Xp10)
Globular region maps to the AP2 DNA-binding domain
AP2 DNA-binding domain from D. psychrophila DP2593 searched against the non-redundant database
Hits: MAL6P1.287 (Plasmodium falciparum); Cgd6_1140/Chro.60146 (Cryptosporidium)
AP2 DNA-binding domain maps to the globular region
Characterization of the globular domain – sequence analysis II
Multiple sequence alignment of all globular domains
JPRED/PHD
Sequence of secondary structure elements is similar to that of the AP2 DNA-binding domain
Homologs of the conserved globular domain constitute a novel family of the AP2 DNA-binding domain
S1 S2 S3 H1
Characterization of the globular domain – structural analysis I
A. thaliana ethylene response factor (ATERF1, 1gcc, NMR structure)
Binds GC-rich sequences
Predicted SS of ApiAP2: S1 S2 S3 H1
SS of ATERF1: S1 S2 S3 H1
12 residues show a strong pattern of conservation, and these are involved in key stabilizing hydrophobic interactions that determine the path of the backbone in the three strands and helix of the AP2 domain
Core fold of the ApiAP2 domain will be similar to the plant AP2 DNA-binding domain
Characterization of the globular domain – structural analysis II
[Figure: protein-DNA contacts. Residues labelled: Y186, R147, K156, T175, R170, W154, R152, R150, E160, W172, W162. Bases labelled: G5, C6, C7, G21, G20, G18, G17]
R152 --- G5 (oxo group)  |  D/N --- A (amino group)
R150 --- G20 (oxo group)  |  S/T --- A (amino group)
Changes in base-contacting residues suggest binding to AT-rich sequence
S2 S3
Charged residues in the insert may contact multiple phosphate groups to provide affinity
ApiAP2 domain binds DNA in a sequence-specific manner
RBC infection & merozoite burst
Characterization of the globular domain – expression analysis I
[Figure: complex life cycle in the mosquito and human hosts, showing liver and RBC stages. Source: www.cdc.gov]
Intra-erythrocyte developmental cycle (DeRisi Lab)
mRNA expression profiling using microarray (sorbitol synchronization)
Characterization of the globular domain – expression analysis II
Co-expressed genes
[Figure: average expression profile of all genes, time points 0 to 46]
Ring stage
Trophozoite stage
Early schizont stage
Schizont stage
22 transcription factors (time points 0 to 46)
Striking expression pattern in specific developmental stages suggests that they could mediate transcriptional regulation of stage specific genes
Characterization of the globular domain – interaction analysis I
Protein interaction network of P. falciparum
Proteins (1,267)
Physical interactions (2,846); LaCount et al., Nature (2005)
Modified Y2H: Gal4 DBD + protein + auxotrophic gene
RNA isolated from mixed stages of the intra-erythrocyte developmental cycle
Guilt by association
Function of interacting neighbors provides clues about the function of the protein
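The guilt-by-association idea can be sketched as a tally of the annotations carried by a protein's direct interaction partners. This is a minimal illustration only: the protein identifiers and annotations below are hypothetical toy values, not entries from the LaCount et al. dataset, and partners without an annotation are simply counted as "hypothetical".

```python
from collections import Counter


def neighbor_annotations(interactions, annotations, protein):
    """Tally the annotations of a protein's direct interaction partners."""
    neighbors = set()
    for a, b in interactions:
        if a == protein and b != protein:
            neighbors.add(b)
        elif b == protein and a != protein:
            neighbors.add(a)
    # Partners with no known annotation are grouped as "hypothetical".
    return Counter(annotations.get(n, "hypothetical") for n in neighbors)


# Toy data (hypothetical identifiers, for illustration only).
interactions = [
    ("apiap2_x", "p1"),
    ("p2", "apiap2_x"),
    ("apiap2_x", "p3"),
    ("p3", "p4"),
]
annotations = {"p1": "chromatin", "p2": "chromatin", "p4": "glycolysis"}

tally = neighbor_annotations(interactions, annotations, "apiap2_x")
print(tally.most_common())
```

An annotation that dominates the tally is a hint about the query protein's role, not proof of it; the talk treats it as one line of evidence alongside sequence, structure and expression.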
Characterization of the globular domain – interaction analysis II
Protein interaction network of P. falciparum
Guilt by association supports the role of ApiAP2 proteins in regulation of gene expression
ApiAP2 proteins (13)
Chromatin proteins (8)
Interacting partners: 50% hypothetical; nucleosome assembly; HMG protein; glycolytic enzymes; antigenic proteins; have a PPint domain
MAL8P1.153 (ES)
PFD0985w (S)
PF10_0075 (T)
PF07_0126 (R)
Network of ApiAp2 proteins (97 interactions, 93 proteins)
Gcn5
Sequence Structure Expression Interaction
Conclusion - I
Integration of different types of experimental data allowed us to discover potential transcription factors in the Plasmodium genome
Integration of data can generate experimentally testable hypotheses
Balaji S, Madan Babu M, Lakshminarayan Iyer, Aravind L, Nucleic Acids Research (2005)
Integration of data to uncover general organizing principles
Integration of gene expression data reveals dynamics in transcriptional networks
Integration of data to generate testable hypotheses
Sequence, structure, expression and interaction data provide convincing support
Regulation in Biological Systems
Introduction to biological networks & transcriptional regulatory networks
Discovery of sequence specific transcription factors in the malarial parasite
Networks in Biology
Nodes                                Links                        Interaction         Network
Proteins                             Physical interaction         Protein-Protein     Protein interaction
Metabolites                          Enzymatic conversion         Protein-Metabolite  Metabolic
Transcription factor, target genes   Transcriptional interaction  Protein-DNA         Transcriptional
Structure of the transcriptional regulatory network
Scale-free network (global level): all transcriptional interactions in a cell (Albert & Barabasi)
Motifs (local level): patterns of interconnections (Uri Alon & Rick Young)
Basic unit (components): transcriptional interaction between a transcription factor and a target gene
Madan Babu M, Luscombe N, Aravind L, Gerstein M & Teichmann SA, Current Opinion in Structural Biology (2004)
Properties of transcriptional networks
Local level: transcriptional networks are made up of motifs, which perform specific information-processing tasks
Global level: transcriptional networks are scale-free, which confers robustness on the system
Transcriptional networks are made up of motifs
Network motif: “Patterns of interconnections that recur at different parts and with specific information processing task”
Single input motif
- Co-ordinates expression
- Enforces order in expression
- Quicker response
Example: ArgR; ArgD, ArgE, ArgF
Multiple input motif
- Integrates different signals
- Quicker response
Example: TrpR, TyrR; AroM, AroL
Feed-forward motif
- Responds to persistent signal
- Filters noise
Example: Crp; AraC; AraBAD
Shen-Orr et al., Nature Genetics (2002) & Lee et al., Science (2002)
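The motif definitions above can be made concrete with a small search over a directed edge list. The sketch below is illustrative only: the edge directions for the slide's example genes are assumptions (regulator listed first, target second), and the brute-force search is a toy approach, not the method used in the cited papers.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return (x, y, z) triples where x->y, y->z and x->z all exist."""
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )


def single_input_motifs(edges):
    """Map each regulator to the targets (two or more) it alone regulates."""
    regulators = {}
    for tf, tg in set(edges):
        regulators.setdefault(tg, set()).add(tf)
    sole = {}
    for tg, tfs in regulators.items():
        if len(tfs) == 1:
            sole.setdefault(next(iter(tfs)), set()).add(tg)
    return {tf: sorted(tgs) for tf, tgs in sole.items() if len(tgs) > 1}


# Assumed (regulator, target) edges built from the slide's example genes.
edges = [
    ("Crp", "AraC"), ("Crp", "AraBAD"), ("AraC", "AraBAD"),
    ("ArgR", "ArgD"), ("ArgR", "ArgE"), ("ArgR", "ArgF"),
    ("TrpR", "AroM"), ("TrpR", "AroL"), ("TyrR", "AroM"), ("TyrR", "AroL"),
]

ffl = feed_forward_loops(edges)
sim = single_input_motifs(edges)
print(ffl)
print(sim)
```

Checking all ordered triples is cubic in the number of nodes, which is fine for a toy network but would need a neighbour-based search for a genome-scale one.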
$N(k) \propto \frac{1}{k^{\gamma}}$
Scale-free structure
Presence of few nodes with many links and many nodes with few links
Transcriptional networks are scale-free
Scale free structure provides robustness to the system
Albert & Barabasi, Rev Mod Phys (2002)
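As a rough illustration of what N(k) means for a regulatory network, the sketch below counts how many regulators have k outgoing links. The network is a hypothetical toy example with one hub and three low-degree regulators; it is far too small to test for a power law and only shows how the distribution would be tabulated.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: N(k)}, the number of regulators with k outgoing links."""
    out_degree = Counter(tf for tf, _ in set(edges))
    return dict(sorted(Counter(out_degree.values()).items()))


# Toy network (hypothetical names): one hub and three low-degree regulators.
edges = [("hub", f"g{i}") for i in range(1, 7)]
edges += [("tfA", "g1"), ("tfB", "g2"), ("tfC", "g3")]

dist = degree_distribution(edges)
print(dist)
```

On real data, plotting N(k) against k on log-log axes would show whether the counts fall roughly on a straight line, which is the usual informal check for scale-free behaviour.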
Scale-free networks exhibit robustness
Robustness – The ability of complex systems to maintain their function even when the structure of the system changes significantly
Tolerant to random removal of nodes (mutations)
Vulnerable to targeted attack of hubs (mutations) – Drug targets
Hubs are crucial components in such networks
Haiyuan Yu et al., Trends in Genetics (2004)
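The contrast between removing a random node and attacking a hub can be sketched by measuring the largest connected component before and after removal. This is a toy illustration on a hypothetical eight-node network, with links treated as undirected; it is not a reproduction of the analyses in the cited papers.

```python
def largest_component(nodes, edges):
    """Size of the largest connected component (links treated as undirected)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        best = max(best, size)
    return best


def size_after_removal(nodes, edges, victim):
    """Largest component once one node and its links are removed."""
    kept = [n for n in nodes if n != victim]
    return largest_component(kept, edges)


# Toy network (hypothetical names): a hub, a second TF and six target genes.
nodes = ["hub", "tf2"] + [f"g{i}" for i in range(1, 7)]
edges = [("hub", f"g{i}") for i in range(1, 6)] + [("hub", "tf2"), ("tf2", "g6")]

intact = largest_component(nodes, edges)
after_leaf = size_after_removal(nodes, edges, "g1")
after_hub = size_after_removal(nodes, edges, "hub")
print(intact, after_leaf, after_hub)
```

Removing a peripheral gene leaves the rest of the toy network connected, while removing the hub fragments it, which mirrors the tolerance and vulnerability stated on the slide.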
Summary I - Introduction
Transcriptional networks are made up of motifs that have specific information-processing tasks
Transcriptional networks are scale-free, which confers robustness on such systems, with hubs assuming importance
Madan Babu M, Luscombe N et al., Current Opinion in Structural Biology (2004)
Are there differences in the sub-networks under different conditions?
Cell cycle
Sporulation
Stress
Static network
Across all cellular conditions
Dynamic nature of the regulatory network in yeast
How are the networks used under different conditions?
Dataset - gene regulatory network in yeast
3,962 genes (142 TFs + 3,820 TGs), 7,074 regulatory interactions
Individual experiments (TRANSFAC DB + Kepes dataset): 288 genes + 477 genes; 356 interactions + 906 interactions
ChIP-chip experiments (Snyder lab + Young lab): 1,560 genes + 2,416 genes; 2,124 interactions + 4,358 interactions
Integrating gene regulatory network with expression data
142 TFs, 3,820 TGs, 7,074 interactions
[Figure legend: transcription factors and target genes; 1, 2, 3, 4, 5 conditions]
142 TFs, 1,808 TGs, 4,066 interactions
Gene expression data for 5 cellular conditions: cell cycle, sporulation, DNA damage, diauxic shift, stress
Back-tracking method to find active sub-networks
Gene regulatory network
Identify differentially regulated genes
Find TFs that regulate the genes
Find TFs that regulate these TFs
Active sub-network
Conditions: DNA damage, cell cycle, sporulation, diauxic shift, stress
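The three back-tracking steps listed above can be sketched as two rounds of walking upstream from the differentially regulated genes. This is a simplified reading of the slide, under the assumption that an interaction is kept whenever it is reached by back-tracking; the published method may apply further criteria, and all gene names here are hypothetical.

```python
def active_subnetwork(edges, differential_genes):
    """Back-track from differentially regulated genes to their regulators.

    Round 1 collects the TFs that regulate the genes; round 2 collects the
    TFs that regulate those TFs. Returns the interactions that were used.
    """
    regulators_of = {}
    for tf, tg in edges:
        regulators_of.setdefault(tg, set()).add(tf)

    active_edges = set()
    frontier = set(differential_genes)
    for _ in range(2):
        next_frontier = set()
        for gene in frontier:
            for tf in regulators_of.get(gene, ()):
                active_edges.add((tf, gene))
                next_frontier.add(tf)
        frontier = next_frontier
    return sorted(active_edges)


# Toy static network as (regulator, target) pairs; names are hypothetical.
edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3"), ("TF2", "g4"),
    ("TF3", "TF1"), ("TF4", "TF3"),
]
active = active_subnetwork(edges, {"g1", "g3"})
print(active)
```

In the toy run, interactions pointing at genes that are not differentially regulated (g2, g4) are left out, and TF4 is not reached because the sketch stops after two upstream rounds.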
Active sub-networks: How different are they ?
Multi-stage processes
Binary processes
Network motifs
Milo et al. (2002), Lee et al. (2002)
Single Input Motif (SIM) – 23%
Feed-forward Motif (FF) – 27%
Multi-Input Motifs (MIM) – 50%
Sub-networks : Network motifs
Network motifs are used preferentially in the different cellular conditions
- Do different proteins become hubs under different conditions?
- Is it the same protein that acts as a regulatory hub?
Cell cycle Sporulation Diauxic shift DNA damage Stress
Condition specific networks are scale-free
Regulatory hubs change with conditions
Cluster TFs according to the number of target genes active in each condition
Different TFs become key regulators in different conditions
TF     CC    SP    DS    DD    SR
Swi6   250   45    20    30    15
(CC = cell cycle, SP = sporulation, DS = diauxic shift, DD = DNA damage, SR = stress)
Hubs regulate other hubs to initiate cellular events
Suggests a structure which transfers weight between hubs to trigger cellular events
Network Parameters - Connectivity
Outgoing connections = 49.8
on average, each TF regulates ~50 genes
Changes
Incoming connections = 2.1
on average, each gene is regulated by ~2 TFs
Remains constant
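Average connectivity can be computed directly from an edge list of (regulator, target) pairs. The sketch below is illustrative: the toy edges are hypothetical, and the denominators (regulators with at least one target, genes with at least one regulator) are an assumption about how the averages are taken. As a check on the arithmetic, the outgoing value of 49.8 is consistent with 7,074 interactions divided by 142 TFs.

```python
def average_connectivity(edges):
    """Return (mean outgoing links per TF, mean incoming links per target)."""
    edges = set(edges)
    tfs = {tf for tf, _ in edges}
    targets = {tg for _, tg in edges}
    return len(edges) / len(tfs), len(edges) / len(targets)


# Toy network (hypothetical names): 5 interactions, 2 TFs, 4 target genes.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF2", "g3"), ("TF2", "g4")]
avg_out, avg_in = average_connectivity(edges)
print(avg_out, avg_in)

# Arithmetic check against the dataset sizes quoted earlier in the talk.
static_out = round(7074 / 142, 1)
print(static_out)
```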
Network parameters : Connectivity
Binary: quick, large-scale turnover of genes
Multi-stage: controlled, ticking over of genes at different stages
• “Binary conditions”: greater connectivity
• “Multi-stage conditions”: lower connectivity
Number of intermediate TFs until final target
Path length
1 intermediate TF = 1
Indication of how immediate a regulatory response is
Average path length = 4.7
Network Parameters – Path length
Starting TF
Final target
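Following the definition above, path length can be sketched as the number of intermediate TFs on the shortest regulatory route from a starting TF to a final target. The breadth-first search below is a minimal illustration on a hypothetical toy network; how the talk averages this quantity over the whole network is not specified here.

```python
from collections import deque


def intermediate_tfs(edges, start, target):
    """Intermediate regulators on the shortest path start -> target, or None."""
    successors = {}
    for tf, tg in edges:
        successors.setdefault(tf, set()).add(tg)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # A path with n links passes through n - 1 intermediate nodes.
            return max(hops[node] - 1, 0)
        for nxt in successors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return None


# Toy network (hypothetical names), as (regulator, target) pairs.
edges = [("start_tf", "mid_tf"), ("mid_tf", "final_gene"), ("start_tf", "direct_gene")]
one_step = intermediate_tfs(edges, "start_tf", "final_gene")
direct = intermediate_tfs(edges, "start_tf", "direct_gene")
unreachable = intermediate_tfs(edges, "mid_tf", "direct_gene")
print(one_step, direct, unreachable)
```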
• “Binary conditions”: shorter path length, “faster”, direct action
• “Multi-stage conditions”: longer path length, “slower”, indirect action; intermediate TFs regulate different stages
[Figure panels: Binary, Multi-stage]
Network parameters : Path length
Clustering coefficient = existing links / possible links = 1/6 = 0.17
Measure of inter-connectedness of the network
Average coefficient = 0.11
Example: 4 neighbours, 6 possible links, 1 existing link
Network Parameters – Clustering coefficient
Ratio of existing links to maximum number of links for neighboring nodes
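The ratio can be worked through for a single node. The sketch below treats links as undirected, which is an assumption, and uses a hypothetical toy node with four neighbours and one link among them, matching the 1/6 example.

```python
from itertools import combinations


def clustering_coefficient(edges, node):
    """Existing links among a node's neighbours divided by possible links."""
    links = {frozenset(e) for e in edges if e[0] != e[1]}
    neighbours = {n for link in links if node in link for n in link if n != node}
    possible = len(neighbours) * (len(neighbours) - 1) // 2
    if possible == 0:
        return 0.0
    existing = sum(
        1 for pair in combinations(neighbours, 2) if frozenset(pair) in links
    )
    return existing / possible


# Toy node "X" with four neighbours and one link between two of them.
edges = [("X", "a"), ("X", "b"), ("X", "c"), ("X", "d"), ("a", "b")]
coefficient = clustering_coefficient(edges, "X")
print(round(coefficient, 2))
```

Four neighbours give 4 x 3 / 2 = 6 possible links, so one existing link yields 1/6, or 0.17 after rounding.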
• “Binary conditions”: smaller coefficients, less TF-TF inter-regulation
• “Multi-stage conditions”: larger coefficients, more TF-TF inter-regulation
[Figure panels: Binary, Multi-stage]
Network parameters: Clustering coefficient
Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently
Multi-stage conditions
• fewer target genes
• longer path lengths
• more inter-regulation between TFs
Binary conditions
• more target genes
• shorter path lengths
• less inter-regulation between TFs
Implications
First overview of the dynamics of the transcriptional regulatory network of a eukaryote
Key regulatory hubs identified under different conditions could serve as good drug targets
Provides insights into engineering regulatory interactions
Methods developed to reconstruct and compare active networks are generically applicable
Conclusions - II
Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently
Luscombe N, Madan Babu M et al., Nature (2004)
Network motifs are preferentially used under the different cellular conditions, and different proteins act as regulatory hubs in different cellular conditions
Integration of data can uncover organizing principles in living systems