Computational systems biology: from the generation of testable hypotheses to uncovering organizing principles in living systems

M. Madan Babu, PhD

NCBI, NLM, National Institutes of Health

Post on 17-Jan-2016


Explosion of information about living systems

Major challenge – Integration of information

- Generate experimentally testable hypotheses
- Uncover organizing principles of specific processes at the systems level

Sequence: 45,000,000 sequences from 160,000 organisms (EBI, NCBI)

Structure: 33,000 structures from 300 organisms (PDB, MSD)

Expression: 5,000 different conditions in 20 organisms (ArrayExpress, SMD, GEO)

Interaction: 100,000 interactions in 5 organisms (BIND, DIP, publications)

Integration of data to uncover general organizing principles

Integration of gene expression data reveals dynamics in transcriptional networks

Integration of data to generate testable hypotheses

Sequence, structure, expression and interaction data provide convincing support

Regulation in Biological Systems

Introduction to transcriptional regulatory networks

Discovery of sequence specific transcription factors in the malarial parasite

[Figure: life cycle of the malarial parasite in the mosquito and human hosts (liver and RBC stages). Source: www.cdc.gov]

Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription factors in Plasmodium

5300 genes with over 700 metabolic enzymes

Extensive complement of chromosomal regulatory proteins

Extensive complement of signaling proteins (GTPases, kinases)

Large number of genes and a complex life cycle: genes need to be regulated

Possible explanations for the paradoxical observation

Alternative regulatory mechanisms
- Chromatin-level regulation
- Post-translational modification
- RNA-based regulation

Undetected transcription factors
- Distantly related or unrelated to known DNA-binding domains

Proteome of Plasmodium + profiles & HMMs of known DBDs (bZIP, Homeo, MADS, AT-hook, Forkhead, ARID)

Hit: PF14_0633 (+ ?)

Regions found in PF14_0633: AT-hook, SEG, uncharacterized globular domain (~60 aa)

Characterization of the globular domain – sequence analysis I

Search against the non-redundant database: lineage-specific expansion in Apicomplexa
- Plasmodium falciparum
- Plasmodium vivax
- Cryptosporidium parvum
- Cryptosporidium hominis
- Theileria annulata

Profiles + HMM of this region against the non-redundant database:
- Floral homeotic protein Q (Triticum)
- 49L, an endonuclease (X. oryzae phage Xp10)

Globular region maps to AP2 DNA-binding domain

AP2 DNA-binding domain from D. psychrophila DP2593 against the non-redundant database:
- MAL6P1.287 (Plasmodium falciparum)
- Cgd6_1140/Chro.60146 (Cryptosporidium)

AP2 DNA-binding domain maps to the globular region

Characterization of the globular domain – sequence analysis II

Multiple sequence alignment of all globular domains, with secondary structure prediction (JPRED/PHD)

Predicted secondary structure elements: S1 S2 S3 H1

Order of secondary structure elements is similar to the AP2 DNA-binding domain

Homologs of the conserved globular domain constitute a novel family of the AP2 DNA-binding domain

Characterization of the globular domain – structural analysis I

A. thaliana ethylene response factor (ATERF1, 1gcc, NMR structure)

Binds GC-rich sequences

Predicted SS of ApiAP2: S1 S2 S3 H1

SS of ATERF1: S1 S2 S3 H1

12 residues show a strong pattern of conservation, and these are involved in key stabilizing hydrophobic interactions that determine the path of the backbone in the three strands and helix of the AP2 domain

Core fold of the ApiAP2 domain will be similar to the plant AP2 DNA-binding domain

Characterization of the globular domain – structural analysis II

[Figure: protein-DNA contacts of the AP2 domain. Labeled residues: R147, R150, R152, W154, K156, E160, W162, R170, W172, T175, Y186. Labeled bases: G5, C6, C7, G17, G18, G20, G21.]

Base-contacting residue changes:
R152 --- G5 (oxo group) -> D/N --- A (amino group)
R150 --- G20 (oxo group) -> S/T --- A (amino group)

Changes in base-contacting residues suggest binding to AT-rich sequence

Charged residues in the insert (S2 S3) may contact multiple phosphate groups to provide affinity

ApiAP2 domain binds DNA in a sequence-specific manner

Characterization of the globular domain – expression analysis I

[Figure: complex life cycle of Plasmodium in the mosquito and human hosts (liver stage; RBC infection & merozoite burst). Source: www.cdc.gov]

Intra-erythrocyte developmental cycle (DeRisi Lab): mRNA expression profiling using microarray (sorbitol synchronization)

Characterization of the globular domain – expression analysis II

[Figure: average expression profile of all genes and of co-expressed genes across time points 0 to 46, covering the ring, trophozoite, early schizont and schizont stages. A second panel shows 22 transcription factors over the same time points.]

Striking expression patterns in specific developmental stages suggest that they could mediate transcriptional regulation of stage-specific genes
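A minimal sketch of how co-expression between a candidate transcription factor and other genes could be scored. The slide does not say which similarity measure was used; Pearson correlation is assumed here as one common choice, and the profiles and names (`tf_profile`, `gene_a`, `gene_b`) are hypothetical toy values, not microarray data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def peak_time_point(profile):
    """Index of the time point with the highest expression."""
    return max(range(len(profile)), key=profile.__getitem__)

# Hypothetical profiles over six time points.
tf_profile = [0.1, 0.2, 0.9, 1.5, 0.8, 0.2]
gene_a = [0.2, 0.4, 1.8, 3.0, 1.6, 0.4]   # same shape as the TF, scaled by 2
gene_b = [1.5, 0.9, 0.2, 0.1, 0.3, 0.8]   # peaks early instead

r_a = pearson(tf_profile, gene_a)   # close to +1: co-expressed
r_b = pearson(tf_profile, gene_b)   # negative: anti-correlated
```

Genes whose correlation with the factor is high and whose peak falls in the same stage would be candidates for stage-specific regulation by that factor.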

Characterization of the globular domain – interaction analysis I

Protein interaction network of P. falciparum (LaCount et al., Nature (2005))

Proteins (1267)

Physical interactions (2846)

Modified Y2H: Gal4 DBD + protein + auxotrophic gene

RNA isolated from mixed stages of the intra-erythrocyte developmental cycle

Guilt by association

Function of interacting neighbors provides clues about the function of the protein
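Guilt by association can be sketched as counting the annotations of a protein's direct interaction partners. This is an illustrative toy under stated assumptions: the protein names and annotations are placeholders (not P. falciparum identifiers), and unannotated partners are labeled "hypothetical".

```python
from collections import Counter

def neighbor_annotations(interactions, annotations, protein):
    """Count the annotations of the direct interaction partners of `protein`."""
    partners = set()
    for a, b in interactions:
        if a == protein and b != protein:
            partners.add(b)
        elif b == protein and a != protein:
            partners.add(a)
    counts = Counter()
    for partner in partners:
        # Partners without a known function are counted as "hypothetical".
        counts[annotations.get(partner, "hypothetical")] += 1
    return counts

# Hypothetical undirected interaction list and annotation table.
interactions = [("X", "P1"), ("X", "P2"), ("P3", "X"), ("P1", "P4")]
annotations = {"P1": "chromatin", "P2": "chromatin", "P4": "glycolysis"}

counts = neighbor_annotations(interactions, annotations, "X")
top_annotation, top_count = counts.most_common(1)[0]
```

Here the most frequent annotation among the neighbors of `X` is the functional clue; a real analysis would also need to check whether that enrichment is statistically meaningful.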

Characterization of the globular domain – interaction analysis II

Protein interaction network of P. falciparum

Network of ApiAP2 proteins (97 interactions, 93 proteins)

ApiAP2 proteins (13)

Chromatin proteins (8)

Labeled nodes: MAL8P1.153 (ES), PFD0985w (S), PF10_0075 (T), PF07_0126 (R), Gcn5

Interacting proteins:
- 50% hypothetical
- Nucleosome assembly
- HMG protein
- Glycolytic enzymes
- Antigenic proteins
- Have a PPint domain

Guilt by association supports a role for ApiAP2 proteins in the regulation of gene expression

Sequence Structure Expression Interaction

Conclusion - I

Integration of different types of experimental data allowed us to discover potential transcription factors in the Plasmodium genome

Integration of data can generate experimentally testable hypotheses

Balaji S, Madan Babu M, Lakshminarayan Iyer, Aravind L, Nucleic Acids Research (2005)

Integration of data to uncover general organizing principles

Integration of gene expression data reveals dynamics in transcriptional networks

Integration of data to generate testable hypotheses

Sequence, structure, expression and interaction data provide convincing support

Regulation in Biological Systems

Introduction to biological networks & transcriptional regulatory networks

Discovery of sequence specific transcription factors in the malarial parasite

Networks in Biology

Network               Nodes                                 Links                         Interaction
Protein interaction   Proteins                              Physical interaction          Protein-Protein
Metabolic             Metabolites                           Enzymatic conversion          Protein-Metabolite
Transcriptional       Transcription factor, target genes    Transcriptional interaction   Protein-DNA

Structure of the transcriptional regulatory network

Basic unit (components): transcriptional interaction between a transcription factor and a target gene

Motifs (local level): patterns of interconnections (Uri Alon & Rick Young)

Scale-free network (global level): all transcriptional interactions in a cell (Albert & Barabasi)

Madan Babu M, Luscombe N, Aravind L, Gerstein M & Teichmann SA, Current Opinion in Structural Biology (2004)

Properties of transcriptional networks

Local level: Transcriptional networks are made up of motifs which perform information processing tasks

Global level: Transcriptional networks are scale-free conferring robustness to the system

Transcriptional networks are made up of motifs

Network motif: “Patterns of interconnections that recur at different parts and with specific information processing task”

Single input motif
- Co-ordinates expression
- Enforces order in expression
- Quicker response
Example: ArgR -> ArgD, ArgE, ArgF

Multiple input motif
- Integrates different signals
- Quicker response
Example: TrpR, TyrR -> AroM, AroL

Feed-forward motif
- Responds to persistent signal
- Filters noise
Example: Crp -> AraC -> AraBAD, Crp -> AraBAD

Shen-Orr et al., Nature Genetics (2002) & Lee et al., Science (2002)
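As an illustration of what finding a motif means on a regulatory graph, the sketch below enumerates feed-forward loops and single-input targets in a list of (TF, target) edges. It is a brute-force toy, not the procedure of the cited papers; the edge list is assumed from the gene names on the slide.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return all (x, y, z) with edges x->y, y->z and x->z among distinct nodes."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )

def single_input_targets(edges):
    """Map each TF to the targets that have no other regulator."""
    regulators = {}
    for tf, target in edges:
        regulators.setdefault(target, set()).add(tf)
    result = {}
    for target, tfs in regulators.items():
        if len(tfs) == 1:
            (tf,) = tfs
            result.setdefault(tf, set()).add(target)
    return result

# Assumed wiring, for illustration only.
edges = [
    ("Crp", "AraC"), ("AraC", "AraBAD"), ("Crp", "AraBAD"),
    ("ArgR", "ArgD"), ("ArgR", "ArgE"), ("ArgR", "ArgF"),
]

ffl = feed_forward_loops(edges)
sim = single_input_targets(edges)
```

Checking every ordered triple is cubic in the number of nodes, which is fine for a toy but would need a smarter enumeration on a genome-scale network.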

Transcriptional networks are scale-free

Scale-free structure: $N(k) \propto \frac{1}{k^{\gamma}}$

Presence of few nodes with many links and many nodes with few links

Scale-free structure provides robustness to the system

Albert & Barabasi, Rev Mod Phys (2002)
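The quantity N(k), the number of nodes with exactly k links, can be tabulated directly from an edge list. A minimal sketch on a hypothetical toy network (one hub and a few sparsely connected nodes); a real analysis would fit the tail of this distribution on a log-log scale.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: N(k)}, the number of nodes with exactly k links."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(Counter(degree.values()))

# Hypothetical toy network: one hub linked to five nodes, plus one extra link.
edges = [("hub", f"n{i}") for i in range(5)] + [("n0", "n1")]
dist = degree_distribution(edges)
```

Even in this tiny example the pattern from the slide is visible: three nodes have a single link, two have two, and only the hub has five.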

Scale-free networks exhibit robustness

Robustness – The ability of complex systems to maintain their function even when the structure of the system changes significantly

Tolerant to random removal of nodes (mutations)

Vulnerable to targeted attack of hubs (mutations) – Drug targets

Hubs are crucial components in such networks

Haiyuan Yu et al., Trends in Genetics (2004)
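The contrast between random failure and targeted attack can be illustrated by removing one node and measuring the largest connected component that remains. This is a sketch on a hypothetical two-hub toy network, with links treated as undirected; the function names are illustrative.

```python
def largest_component(nodes, edges):
    """Size of the largest connected component among `nodes` (undirected)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        # Edges touching a removed node are ignored.
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

def size_after_removal(nodes, edges, victim):
    """Largest component after deleting `victim` and its links."""
    kept = [node for node in nodes if node != victim]
    return largest_component(kept, edges)

# Hypothetical network: two linked hubs, each with four peripheral nodes.
edges = ([("h1", "h2")]
         + [("h1", f"a{i}") for i in range(4)]
         + [("h2", f"b{i}") for i in range(4)])
nodes = sorted({node for edge in edges for node in edge})

intact = largest_component(nodes, edges)
after_leaf = size_after_removal(nodes, edges, "a0")   # peripheral node lost
after_hub = size_after_removal(nodes, edges, "h1")    # hub lost
```

Losing a peripheral node barely changes the network, while losing a hub fragments it, which is the intuition behind treating hubs as candidate drug targets.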

Summary I - Introduction

Transcriptional networks are made up of motifs that have specific information processing tasks

Transcriptional networks are scale-free, which confers robustness to such systems, with hubs assuming importance

Madan Babu M, Luscombe N et al., Current Opinion in Structural Biology (2004)

Dynamic nature of the regulatory network in yeast

Static network: across all cellular conditions

Condition-specific sub-networks: cell cycle, sporulation, stress

Are there differences in the sub-networks under different conditions?

How are the networks used under different conditions?

Dataset - gene regulatory network in Yeast

Total: 3,962 genes (142 TFs + 3,820 TGs), 7,074 regulatory interactions

Individual experiments
- TRANSFAC DB: 288 genes, 356 interactions
- Kepes dataset: 477 genes, 906 interactions

ChIP-chip experiments
- Snyder lab: 1,560 genes, 2,124 interactions
- Young lab: 2,416 genes, 4,358 interactions

Integrating gene regulatory network with expression data

Gene regulatory network: 142 TFs, 3,820 TGs, 7,074 interactions

Gene expression data for 5 cellular conditions: cell cycle, sporulation, DNA damage, diauxic shift, stress

After integration: 142 TFs, 1,808 TGs, 4,066 interactions

[Figure legend: transcription factors and target genes, marked by number of conditions (1 to 5)]

Back-tracking method to find active sub-networks

Gene regulatory network
-> Identify differentially regulated genes
-> Find TFs that regulate the genes
-> Find TFs that regulate these TFs
-> Active sub-network
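The back-tracking steps can be sketched as a walk up the regulatory graph from the differentially regulated genes. This is an illustrative reading of the slide, not the published implementation: it repeats the "find the regulators" step until no new TFs appear (the slide lists two rounds), and all gene and TF names are hypothetical.

```python
def active_subnetwork(edges, changed_genes):
    """Back-track from changed genes to their direct and indirect regulators.

    edges: iterable of (tf, target) pairs from the full regulatory network.
    changed_genes: genes that are differentially regulated in one condition.
    Returns (active_nodes, active_edges).
    """
    active_nodes = set(changed_genes)
    active_edges = set()
    frontier = set(changed_genes)
    while frontier:
        next_frontier = set()
        for tf, target in edges:
            if target in frontier and (tf, target) not in active_edges:
                active_edges.add((tf, target))
                if tf not in active_nodes:
                    active_nodes.add(tf)
                    next_frontier.add(tf)   # look for its regulators next round
        frontier = next_frontier
    return active_nodes, active_edges

# Hypothetical full network and one condition's differentially regulated genes.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3"),
         ("TF3", "TF1"), ("TF4", "g4")]
changed = {"g1", "g2"}

nodes, links = active_subnetwork(edges, changed)
```

Running this once per condition, each with its own set of changed genes, would give one sub-network per condition that can then be compared.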

Active sub-networks: How different are they?

Multi-stage processes: cell cycle, sporulation

Binary processes: diauxic shift, DNA damage, stress

Sub-networks: Network motifs

Network motifs (Milo et al. (2002), Lee et al. (2002))

Single Input Motif (SIM) – 23%

Feed-forward Motif (FF) – 27%

Multi-Input Motif (MIM) – 50%

Network motifs are used preferentially in the different cellular conditions

Regulatory hubs change with conditions

- Is it the same protein that acts as a regulatory hub?
- Do different proteins become hubs under different conditions?

Condition-specific networks are scale-free (cell cycle, sporulation, diauxic shift, DNA damage, stress)

Cluster TFs according to the number of target genes active in each condition

TF     CC    SP   DS   DD   SR
Swi6   250   45   20   30   15

(CC: cell cycle, SP: sporulation, DS: diauxic shift, DD: DNA damage, SR: stress)

Different TFs become key regulators in different conditions

Hubs regulate other hubs to initiate cellular events

Suggests a structure which transfers weight between hubs to trigger cellular events
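Identifying the regulatory hub of each condition amounts to counting active target genes per TF in each active sub-network. A minimal sketch with hypothetical TF and gene names; ties are broken alphabetically, which is an arbitrary choice made here.

```python
from collections import Counter

def hubs_by_condition(active_edges):
    """Map each condition to the TF with the most active target genes.

    active_edges: {condition: [(tf, target), ...]} for each active sub-network.
    """
    hubs = {}
    for condition, edges in active_edges.items():
        out_degree = Counter(tf for tf, _ in set(edges))
        # sorted() first, so ties go to the alphabetically first TF.
        hubs[condition] = max(sorted(out_degree), key=out_degree.get)
    return hubs

# Hypothetical active sub-networks for two conditions.
active_edges = {
    "cell cycle": [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g1")],
    "stress": [("TF_B", "g3"), ("TF_B", "g4"), ("TF_A", "g3")],
}

hubs = hubs_by_condition(active_edges)
```

In this toy, a different TF comes out on top in each condition, mirroring the observation that hubs change with conditions.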

Network Parameters

Connectivity

Path length

Clustering coefficient

Network Parameters – Connectivity

Outgoing connections = 49.8: on average, each TF regulates ~50 genes (changes)

Incoming connections = 2.1: on average, each gene is regulated by ~2 TFs (remains constant)
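Average connectivity follows directly from the edge list: the number of interactions divided by the number of TFs (outgoing) or by the number of regulated genes (incoming). The sketch below uses a hypothetical toy network. As a check on the arithmetic, the dataset figures quoted earlier (7,074 interactions, 142 TFs) give 7,074 / 142 ≈ 49.8, which agrees with the outgoing value on the slide.

```python
def average_connectivity(edges):
    """Return (average outgoing per TF, average incoming per regulated gene)."""
    unique_edges = set(edges)
    tfs = {tf for tf, _ in unique_edges}
    targets = {target for _, target in unique_edges}
    return len(unique_edges) / len(tfs), len(unique_edges) / len(targets)

# Hypothetical toy network: T1 regulates three genes, T2 shares one of them.
edges = [("T1", "g1"), ("T1", "g2"), ("T1", "g3"), ("T2", "g3")]
out_avg, in_avg = average_connectivity(edges)

# Arithmetic check against the dataset totals quoted in the slides.
slide_out_avg = round(7074 / 142, 1)
```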

Network parameters: Connectivity

• “Binary conditions”: greater connectivity. Quick, large-scale turnover of genes
• “Multi-stage conditions”: lower connectivity. Controlled, ticking over of genes at different stages

Network Parameters – Path length

Path length: number of intermediate TFs until final target

Example: starting TF -> 1 intermediate TF -> final target, path length = 1

Indication of how immediate a regulatory response is

Average path length = 4.7
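Following the definition on the slide (path length as the number of intermediate TFs between a starting TF and its final target), a shortest-path search over the directed regulatory graph gives the value for any pair. This is a minimal sketch with hypothetical names; whether the original analysis used shortest paths is an assumption.

```python
from collections import deque

def intermediate_tfs(edges, start, target):
    """Fewest intermediate TFs on a directed path start -> ... -> target.

    Returns None if the target cannot be reached from the start.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # A path with n regulatory steps has n - 1 intermediate TFs.
            return max(steps[node] - 1, 0)
        for neighbour in adjacency.get(node, []):
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                queue.append(neighbour)
    return None

# Hypothetical cascade: TF_A regulates TF_B, which regulates gene_X.
edges = [("TF_A", "TF_B"), ("TF_B", "gene_X"), ("TF_A", "gene_Y")]

via_one = intermediate_tfs(edges, "TF_A", "gene_X")
direct = intermediate_tfs(edges, "TF_A", "gene_Y")
unreachable = intermediate_tfs(edges, "TF_B", "gene_Y")
```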

Network parameters: Path length

• “Binary conditions”: shorter path length. “Faster”, direct action
• “Multi-stage conditions”: longer path length. “Slower”, indirect action; intermediate TFs regulate different stages

Network Parameters – Clustering coefficient

Ratio of existing links to maximum number of links for neighboring nodes

Measure of inter-connectedness of the network

Example: 4 neighbours, 6 possible links, 1 existing link; clustering coefficient = existing links / possible links = 1/6 = 0.17

Average coefficient = 0.11
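The worked example on the slide (4 neighbours, 6 possible links, 1 existing link) can be reproduced with a short function. A minimal sketch, assuming links are treated as undirected and using hypothetical node names.

```python
from itertools import combinations

def clustering_coefficient(edges, node):
    """Existing links among the neighbours of `node` / possible links among them."""
    links = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    neighbours = {n for link in links if node in link for n in link if n != node}
    k = len(neighbours)
    if k < 2:
        return 0.0   # fewer than two neighbours: no possible links
    possible = k * (k - 1) // 2
    existing = sum(
        1 for a, b in combinations(sorted(neighbours), 2)
        if frozenset((a, b)) in links
    )
    return existing / possible

# Node X has four neighbours (A, B, C, D); only A and B are linked to each other.
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"), ("A", "B")]
coefficient = clustering_coefficient(edges, "X")   # 1 existing / 6 possible
```

Averaging this value over all nodes gives a network-level coefficient of the kind quoted on the slide.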

Network parameters: Clustering coefficient

• “Binary conditions”: smaller coefficients, less TF-TF inter-regulation
• “Multi-stage conditions”: larger coefficients, more TF-TF inter-regulation

Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently

Multi-stage conditions
• fewer target genes
• longer path lengths
• more inter-regulation between TFs

Binary conditions
• more target genes
• shorter path lengths
• less inter-regulation between TFs

Implications

First overview of the dynamics of the transcriptional regulatory network of a eukaryote

Key regulatory hubs identified under different conditions can serve as good drug targets

Provides insights into engineering regulatory interactions

Methods developed to reconstruct and compare active networks are generically applicable

Conclusions - II

Sub-networks have evolved both their local structure and global structure to respond to cellular conditions efficiently

Network motifs are preferentially used under the different cellular conditions, and different proteins act as regulatory hubs in different cellular conditions

Luscombe N, Madan Babu M et al., Nature (2004)

Integration of data can uncover organizing principles in living systems

Acknowledgements

Balaji, Lakshminarayan, Aravind

National Center for Biotechnology Information, National Institutes of Health

MRC Laboratory of Molecular Biology

Nick Luscombe, Sarah Teichmann, Haiyuan Yu, Mike Snyder, Mark Gerstein