Copyright GeneGo 2000-2003
CONFIDENTIAL
Systems Biology for Drug Discovery
Building and using protein interaction networks: industry perspective
Andrej Bugrim, GeneGo, Inc.
Topics
• Annotation process and collecting network content for industrial-type applications
• Biological and disease ontologies – how to improve and use them in functional analysis
• Tools: utilizing network data in pharmaceutical R&D
Multi-level understanding of human biology
Levels: phenotype; cell process / network; protein
Relation types: causative relations; mechanistic relations
Disease-centered knowledge base in MetaMiner (Oncology example)
• General BC (breast cancer) schema: causative BC models; BC-perturbed cell processes
• Compare: other cancers chosen by the Consortium
GG (GeneGo) annotation team:
• Disease group: causative disease associations at the DNA, RNA and protein levels
• Network group: protein-protein, protein-DNA and protein-RNA interactions
• Chemistry group: ligand-receptor interactions (drugs, leads, hits)
• Specialty group: biomarkers
Content
Three interaction domains in MetaCore
• Ligands (metabolites, peptides, xenobiotics) and membrane receptors: 1,600 drugs with targets; 4,100 endogenous metabolites; >21,000 ligand-receptor interactions; 850 GPCRs and other membrane receptors; 110 nuclear hormone receptors
• Signal transduction (G proteins, second messengers, kinases, phosphatases) and transcription factors: 172K manually curated physical signaling interactions; 538 canonical maps; 42,000 13-step canonical signal transduction pathways; 924 human transcription factors; 6,000 target genes
• Core effect (metabolic pathways and metabolites): 11,300 metabolic reactions; 116 fine metabolic maps; 4,100 endogenous metabolites
MetaBase Content Overview
– Database
• Chemical compounds: 580,000
• Drugs: 8,590
• Chemical reactions: 35,600
• Metabolic networks: 251
– Network
• Proteins + genes: 13,402
• Transcription factors: 924
• Chemical compounds: 26,000
• Drugs: 2,740
• Endogenous compounds: 4,100
• Proteins linked to drugs: 2,711
• Reactions: 5,330
• Small-molecule ligands for human receptors: 3,510
• Blockers for ion channels: 629
• PubMed journals: 3,100
• PubMed articles: 81,400
• Total number of interactions: 177,000
– Content
• GeneGo regulatory networks: 120
• GeneGo disease networks: 88
• Maps: 538
• Regulatory maps: 325
• Metabolic maps: 116
• Traditional metabolic maps (EC): 97
• Diseases: 4,920
MetaBase content by type
Compounds:
• Endogenous compounds: 4,100
• Drugs: 8,590
• Drug metabolites: 3,422
• Compounds in reactions: 15,700
• Compounds in network: 25,662
• Compounds with structures: 27,418
• Reaction substrates with kinetic data: 3,580
Database:
• Genes: 137,500 total (human: 38,700)
• Human proteins: 14,570
• Chemical compounds: 580,000
• Metabolic reactions: 35,600
Network interactions
All interactions are taken from articles indexed in PubMed (3,100 journals; 81,400 articles).
Manually curated interactions: 172,787 (percentages below are of this total)
• Signalling interactions: 137,297 (79%)
  – Protein-protein: 87,675 (51%)
  – Small molecule-protein: 42,383 (26%)
  – Y2H "interactome": 2,370 (1%)
  – Logical relations: 1,934 (1%)
  – With microRNA: 1,620 (1%)
  – ChIP-chip: 980 (1%)
  – With virus proteins: 335 (<1%)
• Metabolic reactions: 35,490 (21%)
Protein-protein interactions by type:
• Activation/inhibition via binding: 43,079 (52%)
• Regulation of transcription: 15,725 (21%)
• Influence on expression: 10,120 (14%)
• Covalent modification: 5,967 (8%)
• Unspecified regulation: 3,990 (5%)
Small molecule-protein interactions by type:
• Binding to receptors: 14,497 (34%)
• Regulation of enzymes: 8,898 (21%)
• Binding to kinases: 6,984 (16%)
• Regulation of other proteins: 6,218 (15%)
• Regulation of transporters: 5,786 (14%)
Type of interactions in network
Effects: activation; inhibition; unspecified
Direct interactions, by mechanism: phosphorylation; dephosphorylation; other type of covalent modification; binding; transport; cleavage; transcription regulation; transformation; catalysis; competition
Indirect interactions, by mechanism: influence on expression; unspecified
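The edge vocabulary above can be captured in a small data model. The sketch below is an illustration only: the names `Effect`, `Interaction` and `is_direct` are hypothetical and do not reflect MetaCore's actual schema; the mechanism lists are copied from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    UNSPECIFIED = "unspecified"


# Mechanisms as listed on the slide, split by direct vs. indirect interaction.
DIRECT_MECHANISMS = {
    "phosphorylation", "dephosphorylation", "other covalent modification",
    "binding", "transport", "cleavage", "transcription regulation",
    "transformation", "catalysis", "competition",
}
INDIRECT_MECHANISMS = {"influence on expression", "unspecified"}


@dataclass(frozen=True)
class Interaction:
    """One directed network edge (hypothetical model, not MetaCore's schema)."""
    source: str
    target: str
    effect: Effect
    mechanism: str

    def __post_init__(self):
        # Reject mechanisms outside the slide's vocabulary.
        if self.mechanism not in DIRECT_MECHANISMS | INDIRECT_MECHANISMS:
            raise ValueError(f"unknown mechanism: {self.mechanism!r}")

    @property
    def is_direct(self) -> bool:
        return self.mechanism in DIRECT_MECHANISMS


edge = Interaction("JAK1", "STAT1", Effect.ACTIVATION, "phosphorylation")
print(edge.is_direct)  # True
```

Keeping direct and indirect mechanisms in separate sets mirrors the slide's two columns, so an edge's directness follows from its mechanism rather than being stored twice.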
Distribution of interactions by mechanism
• Binding: 48%
• Transcription regulation: 15%
• Influence on expression: 12%
• Catalysis: 8%
• Unspecified: 6.4%
• Phosphorylation: 4.1%
• Cleavage: 2%
• Transport: 2%
• Transformation: 1%
• Covalent modification: 1%
• Dephosphorylation: 0.5%
• Competition: 0.1%
Network objects (total number of nodes: 40,229)
• Proteins: 13,406
  – Enzymes: 2,910
  – Kinases: 626
  – Phosphatases: 137
  – Proteases: 352
  – Transcription factors: 924
  – Membrane receptors: 764
  – Nuclear hormone receptors: 110
  – Receptor ligands: 640
  – Transporters: 804
  – Ion channels: 217
  – Other: 5,922
• Chemical compounds: 25,662
  – Xenobiotic compounds: 15,955
  – Endogenous compounds: 4,010
  – Drugs: 2,741
  – Metabolites of xenobiotics: 1,924
  – Drug metabolites: 1,032
• Metabolic reactions: 5,353
Proteins: distribution by tissue & localization
[Figure: Proteins: distribution by tissue. Bar chart of protein counts (axis 0 to 10,000) for adrenal gland, brain, colon, heart, kidney, liver, lung, mammary gland, marrow, ovary, pancreas, placenta, prostate, retina, salivary gland, skin, spinal cord, spleen, testes, thymus, thyroid, tonsil, trachea, upper GI tract and uterus, plus the set of proteins common to all these tissues.]
[Figure: Proteins: distribution by cell compartment. Bar chart of protein counts on a log scale (1 to 100,000) for lysosome, actin cytoskeleton, cytoskeleton, proteinaceous extracellular matrix, Golgi apparatus, intracellular, endoplasmic reticulum, cytosol, membrane, soluble fraction, mitochondrion, extracellular space, membrane fraction, extracellular region, integral to membrane, plasma membrane, cytoplasm, integral to plasma membrane, nucleus and unspecified.]
Molecular functions in Database
• Binding: 8,503 (46%)
• Catalytic activity: 4,086 (23%)
• Signal transducer activity: 2,535 (13%)
• Transcription regulator activity: 1,396 (7%)
• Transporter activity: 1,078 (6%)
• Enzyme regulator activity: 599 (3%)
• Structural molecule activity: 459 (2%)
• Motor activity: 77 (<1%)
• Translation regulator activity: 75 (<1%)
• Antioxidant activity: 51 (<1%)
• Chaperone regulator activity: 11 (<1%)
• Chemoattractant activity: 8 (<1%)
• Chemorepellent activity: 3 (<1%)
Endogenous compounds (4,100 total)
Endogenous compounds by class: lipids 43%; other 19%; carbohydrates 15%; peptides 10%; vitamins/co-factors 6%; fatty acids 5%; steroids 4%; nucleotides 2%
• 3,070 endogenous compounds involved in metabolic reactions: 6,819 reactions with endogenous compounds only
• 751 endogenous ligands for 498 receptors, with 2,455 interactions
• 4,000 (98%) of endogenous compounds in the network
• 15,962 network interactions with endogenous metabolites
• 3,600 compounds with structures and empirical (brutto) formulas; the other 700 are "generic", containing acyl, alkyl and other variable groups
Network and pathway statistics in GeneGo
• >40,000 nodes
• ~177,000 edges
• Average node degree: 3.77
• 241 million shortest paths
• Average shortest path length: 5.3811
• 42,000 13-step canonical signal transduction pathways
• 200 canonical metabolic pathways: major metabolic fluxes such as glycolysis or the TCA cycle
• 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
[Diagram: two enzyme-catalyzed reactions (Enzyme1 with reaction1, Enzyme2 with reaction2) linked through a shared metabolite]
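The sketch below makes two of these statistics concrete on a toy graph: edges per node, and the mean length of all shortest paths found by breadth-first search. It assumes a directed, unweighted network and counts each reachable ordered pair once. The slide does not say how GeneGo defines node degree or counts its 241 million shortest paths, so the function names and definitions here are illustrative assumptions.

```python
from collections import deque


def average_out_degree(edges):
    """Edges per node in a directed graph (mean out-degree).
    Assumption: the slide does not define its degree measure."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) / len(nodes)


def shortest_path_stats(edges):
    """Run BFS from every node of a directed, unweighted graph.

    Returns (number of ordered node pairs connected by a path,
    mean shortest-path length over those pairs)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    pairs = 0
    total_length = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        pairs += len(dist) - 1              # exclude the start node itself
        total_length += sum(dist.values())  # the start node contributes 0
    return pairs, (total_length / pairs if pairs else 0.0)


# Toy network: receptor R -> kinase K -> transcription factor TF -> gene G,
# plus a shortcut R -> TF.
toy = [("R", "K"), ("K", "TF"), ("TF", "G"), ("R", "TF")]
print(average_out_degree(toy))   # 1.0
print(shortest_path_stats(toy))  # (6, 1.3333333333333333)
```

All-pairs BFS costs O(N·E), which is manageable for a network of about 40,000 nodes and 177,000 edges but is usually run offline rather than per query.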
Pathways in regulatory network
[Diagram: example pathways through the regulatory network, with kinases, phosphorylation (+P) steps and transcription factors (Tr)]
Start: TMR (transmembrane receptor) → TF (transcription factor) → End: target genes
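A receptor-to-target pathway of this kind can be enumerated with a depth-limited search. The sketch below is a hypothetical illustration, not GeneGo's algorithm. It lists simple directed paths that start at a receptor node, pass through at least one transcription factor and end at a target gene. The 13-step cap is borrowed from the canonical-pathway figure on the statistics slide, and the network is made up.

```python
def receptor_to_target_paths(adjacency, receptors, tfs, targets, max_steps=13):
    """Enumerate simple directed paths that start at a receptor, pass through
    at least one transcription factor and end at a target gene.
    max_steps caps the number of edges (13 echoes the slide's pathway length)."""
    found = []

    def extend(path, seen_tf):
        node = path[-1]
        if node in targets and seen_tf:
            found.append(list(path))
        if len(path) - 1 == max_steps:
            return
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # keep paths simple (no revisits)
                path.append(nxt)
                extend(path, seen_tf or nxt in tfs)
                path.pop()

    for receptor in receptors:
        extend([receptor], receptor in tfs)
    return found


# Hypothetical toy network; node names are placeholders, not curated data.
network = {
    "TMR": ["kinase"],
    "kinase": ["TF1", "Z"],
    "TF1": ["geneA", "geneB"],
    "Z": ["geneA"],
}
for path in receptor_to_target_paths(network, {"TMR"}, {"TF1"}, {"geneA", "geneB"}):
    print(" -> ".join(path))
```

The route TMR -> kinase -> Z -> geneA is not reported, because it never passes through a transcription factor.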
Ontologies
Mixed ontologies
Knowledge base (ontologies)
By genre: Drama, Action, Romance, Horror, Foreign
By director: Lynch, Tarantino, Leone, Stone, Antonioni
By actor: Pitt, Nicholson, Depp, Redford, Damon
By year: 2007, 2006, 2005, 2004, 2003
• How do you compare "action" movies vs. Tarantino movies vs. 2003 movies? They are incomparable, because they belong to different categories.
Likewise in biology: molecular pathway vs. cellular process vs. disease vs. metabolic process
Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology
Enrichment in GO and GeneGo processes
Data: 4 samples from 4 patients; disease and normal tissue from the same patients; Affymetrix U133A arrays
GO processes:
• Resolution: list of proteins
• No connections between proteins
• No signaling/effect within process
GeneGo process networks:
• Resolution: interactions between proteins
• Connections between all proteins in the folder
• Clear signaling path and effect within process
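The slides do not state which statistic produces the enrichment p-values. A common choice for gene-set enrichment is the one-sided hypergeometric test, sketched below under that assumption. Given N genes on the array, K genes in a process, and n differentially expressed genes, it returns the probability of seeing at least x process genes among the n by chance. The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, set_size, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(N=total_genes, K=set_size,
    n=n_selected): the chance that n_selected randomly chosen genes include
    at least n_overlap members of the process."""
    upper = min(set_size, n_selected)
    hits = sum(
        comb(set_size, k) * comb(total_genes - set_size, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    # Integer arithmetic until the final division keeps the tail exact.
    return hits / comb(total_genes, n_selected)


# 10 genes, 4 in the process, 3 selected, all 3 in the process: 4/120 = 1/30.
print(enrichment_p_value(10, 4, 3, 3))
```

The same test applies whether the gene set comes from a GO process or from a GeneGo process network. Only the set membership changes.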
Inflammation
• Genes from GO process "Inflammatory response": 231 (in networks: 152; not in networks: 79)
• Genes from GO process "Immune response": 446 (in networks: 247; not in networks: 199)
• Genes from GO processes "Inflammatory response" and "Immune response" combined: 613 (in networks: 345; not in networks: 268)
• Genes in 15 process networks: 1,642
• Genes added to networks (not in either GO process): 1,297
Diseases
• 4,881 diseases, based on MeSH; 38,709 human genes in total
• Diseases linked to genes: 1,630 (34%); diseases with no gene links: 3,251 (66%)
• Human genes linked to diseases: 6,318 (17%); human genes not linked to diseases: 32,391 (83%)
• 6,318 genes are linked to 1,630 diseases
• 21,264 unique articles indexed in PubMed
Disease tree – Neoplasms by Site
Drug toxicity tree
Folders from MeSH; folders created at GeneGo based on reviews
38 drug-induced pathological processes
Gene-Disease connections in public domain and GeneGo
The public domain does not have structured information about disease connectivity (by clinical classification) or about causative relations with genes and proteins:
• GENE: only citations with the disease name; low trust
• MeSH: only a hierarchical disease tree
• OMIM: only genetic information (mutations, SNPs); no expression, no protein activity or localization
GeneGo:
• Hierarchically structured disease classification: 4,888 diseases
• Genes associated with diseases: 6,429
• Cited articles: 33,792
Content. Cancer maps and networks. Breast cancer: general scheme
Angiogenesis in tumor growth
Fine metabolic differences between rodents and human
[Diagram comparing unique genes in human vs. mouse and rat. Cases shown: unique genes (there are no human orthologs for protein A); unique genes and orthologs catalyse one reaction; unique genes catalyse unique reactions; orthologs catalyse different reactions. Gene counts shown: 141 mouse and 74 rat genes; 9 mouse and 2 rat genes; 1 mouse gene and 1 rat gene.]
Tools
Data analysis workflow in MetaDiscovery suite
• Inputs: molecular biology data (HTS, HCS); metabolites; structures (SDF, MOL); ISIS DB; custom interaction data (Y2H, pull-down, co-expression, annotation)
• Customization tools: PathwayEditor, MetaLink, MapEditor (custom maps, networks, pathways)
• MetaCore/MetaDrug platform:
  – P-value scoring against ontologies: GO processes, GeneGo processes, canonical pathways, metabolic networks, diseases, toxicities
  – Cross-experiment comparison: time series, multi-patient cohorts, multiple logical operations, complete report
  – Network alignment: multiple algorithms, sub-network queries
  – Signature networks: diseases, drug response
• Export to modeling software (CellDesigner, Virtual Cell) via SBML, BioPAX
• Outputs for medicinal chemistry: indications, toxicities, off-site effects
• Outputs for biology: biomarkers, pathway-based targets
MetaCore™ Platform
• Oracle-based database: curated interactions from the literature
• Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
• Network building tools
• Visualization tools
• Logical operations module
• Pathway editor
• Statistics for pathways, processes, networks
Networks of protein interactions
– Dynamic; built “on the fly”
– Exploratory tool
– Build new pathways for genes of interest
Pathways Integration
Interactive, static maps
– 550 maps
– Signaling, regulation, metabolism, diseases
– Backbone of formalized “state of the art” in the field
Choose direction and checkpoints within network building page
Example: from histamine, through histamine H1 receptor, to actin
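A "from / through / to" query can be approximated by chaining two shortest-path searches: source to checkpoint, then checkpoint to target. The sketch below is an assumption about how such a query could work, not MetaCore's network-building algorithm. The function names are hypothetical, and the intermediate nodes `X` and `Y` are placeholders, not curated interactions.

```python
from collections import deque


def bfs_path(adjacency, start, goal):
    """Shortest directed path from start to goal as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def path_via_checkpoint(adjacency, source, checkpoint, target):
    """Shortest source->checkpoint path joined to shortest checkpoint->target."""
    first = bfs_path(adjacency, source, checkpoint)
    second = bfs_path(adjacency, checkpoint, target)
    if first is None or second is None:
        return None
    return first + second[1:]  # drop the duplicated checkpoint


# Toy graph: the unconstrained shortest route bypasses the receptor.
graph = {
    "histamine": ["histamine H1 receptor", "X"],
    "histamine H1 receptor": ["Y"],
    "X": ["actin"],
    "Y": ["actin"],
}
print(bfs_path(graph, "histamine", "actin"))
print(path_via_checkpoint(graph, "histamine", "histamine H1 receptor", "actin"))
```

The checkpoint forces the route through the receptor even though a shorter route exists. The two halves are searched independently, so the joined path can in principle revisit a node.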
False discovery rate filter: once the threshold is set (0.01 in this example) and applied, non-significant bars become semi-transparent
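The slide does not name the FDR procedure. The Benjamini-Hochberg step-up procedure is a standard choice and is sketched below under that assumption. Bars whose p-values are not flagged would be the ones rendered semi-transparent. The p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Benjamini-Hochberg step-up procedure.
    Returns one boolean per p-value: True if significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Largest rank k with p_(k) <= (k / m) * alpha; every rank up to k passes.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff
    return significant


# Hypothetical p-values for four pathway bars, filtered at FDR 0.01 as on the slide.
print(benjamini_hochberg([0.001, 0.004, 0.03, 0.2], alpha=0.01))  # [True, True, False, False]
```

Because it is a step-up procedure, a p-value that misses its own rank threshold still passes if a larger p-value further down the list meets its threshold.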
New customization modules
• MapEditor: custom maps synchronized with the MC/MD (MetaCore/MetaDrug) database
  – Draw pathway maps from scratch
  – Transform gene lists into networks, and networks into pathway maps
  – Edit MetaCore’s canonical maps
  – View and score your maps within the context of canonical maps
  – Map experimental data on custom maps
• MetaLink: overlaying custom interactions
  – Import custom interactions (Y2H, co-expression, pull-down, etc.)
  – Visualize them using GeneGo network building algorithms
  – Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions
• PathwayEditor: annotation technology transfer, at the database level
  – Custom annotation of interactions, compounds, diseases and metabolism within the framework of GeneGo’s internal annotation system
  – Use the annotation forms, workflows and QC system developed at GeneGo
  – Novel objects are imported and integrated with pre-existing data in MetaCore
Adding Localizations
Additional localizations can be added
Your NEW map is now an interactive part of MetaCore
Users can visualize their experimental data on the new map
Mapping interaction sets on networks
Resulting direct interactions network:
• Pink interactions are from the uploaded links file; mouse over an interaction to see the uploaded weight value
• Blue interactions are in both the links file and the MetaCore database
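Overlaying uploaded interactions comes down to a set comparison between the links file and the database. In the minimal sketch below, links present in both correspond to the blue edges on the slide and upload-only links to the pink ones. The weight column is carried along so it can be shown on mouse-over. The function name and data are hypothetical, and edges are assumed to be directed.

```python
def split_uploaded_links(uploaded, database_edges):
    """uploaded: {(source, target): weight} from a custom links file.
    database_edges: set of (source, target) pairs already in the database.
    Returns (links also in the database, links found only in the upload).
    Assumption: edges are directed, so (A, B) and (B, A) are different."""
    in_both, upload_only = {}, {}
    for edge, weight in uploaded.items():
        bucket = in_both if edge in database_edges else upload_only
        bucket[edge] = weight  # keep the uploaded weight for display
    return in_both, upload_only


uploaded = {("A", "B"): 0.9, ("B", "C"): 0.4}
database = {("A", "B"), ("C", "D")}
both, only_uploaded = split_uploaded_links(uploaded, database)
print(both)           # {('A', 'B'): 0.9}
print(only_uploaded)  # {('B', 'C'): 0.4}
```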
Algorithms
Old and new ways to analyze data
Current way of analysis: all significance calculations are done before mapping onto the network.
Full data tables → statistical procedures (fold-change and p-value thresholds, either in MetaCore or in 3rd-party tools) → sets of genes → connect them on the network one way or another (too many choices, no clear way to choose)
New way of analysis: significance calculations follow the mapping onto the network.
Full data tables → apply to the global network → statistical procedures in MetaCore based on concurrent analysis of expression profiles and connectivity → sets of network modules
Samples are analyzed in a pathway’s expression space
          Sample 1   Sample 2   Sample 3   Sample 4
Gene 1    1          4          3          2
Gene 2    4          2          7          6
Gene 3    2          9          3          8
Gene 4    2          5          4          2
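Treating each sample as a point whose coordinates are the expression values of the pathway’s genes lets samples be compared with an ordinary distance. The sketch below uses the numbers from the table with Euclidean distance. The choice of metric is an assumption; correlation-based distances are an equally common choice.

```python
from itertools import combinations
from math import dist

# Table from the slide: rows are the pathway's genes, columns are samples.
expression = {
    "Gene 1": [1, 4, 3, 2],
    "Gene 2": [4, 2, 7, 6],
    "Gene 3": [2, 9, 3, 8],
    "Gene 4": [2, 5, 4, 2],
}

# Transpose: each sample becomes a point in 4-dimensional "pathway space".
samples = list(zip(*expression.values()))

for (i, a), (j, b) in combinations(enumerate(samples, start=1), 2):
    print(f"Sample {i} vs Sample {j}: {dist(a, b):.2f}")
```

With these values, Samples 1 and 3 are the closest pair (distance sqrt(18), about 4.24), and Samples 1 and 2 are the farthest apart (sqrt(71), about 8.43).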
Network signatures for compound effects
[Figure panels: Mestranol; Tamoxifen; Phenobarbital; Phenobarbital]
Finding topologically significant nodes
[Diagram: nodes A and B are topologically significant; node C is not. Legend: differentially expressed genes; non-differentially expressed genes.]
• 4 out of 6 nodes regulated by B are differentially expressed: more than a random share, so B is significant.
• Only 1 out of 6 nodes regulated by C is differentially expressed: this could be due to a random event, so C is not significant.
• In reality, the algorithm also considers nodes beyond first-degree neighbors.
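The B-versus-C reasoning can be expressed as a one-sided binomial test on a node’s first-degree targets. This is a simplified stand-in, not GeneGo’s algorithm, which, per the slide, also looks beyond first-degree neighbors. The function name is hypothetical, and the 10% background rate of differentially expressed genes is an assumed value.

```python
from math import comb


def neighbor_enrichment_p(n_neighbors, n_de_neighbors, background_de_fraction):
    """One-sided binomial tail: probability that at least n_de_neighbors of
    n_neighbors regulated nodes are differentially expressed by chance, if
    each is DE independently with probability background_de_fraction."""
    p = background_de_fraction
    return sum(
        comb(n_neighbors, k) * p**k * (1 - p) ** (n_neighbors - k)
        for k in range(n_de_neighbors, n_neighbors + 1)
    )


# Assumed background: 10% of all genes are differentially expressed.
print(neighbor_enrichment_p(6, 4, 0.1))  # node B: about 0.0013
print(neighbor_enrichment_p(6, 1, 0.1))  # node C: about 0.47
```

Under this assumption, 4 of 6 is very unlikely to occur by chance, while 1 of 6 happens almost half the time, which matches the slide’s verdicts for B and C.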
Why is JAK1 significant in this dataset?
Regulation via JAK1
• JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
• Topological significance helps to find important links in pathways that do not come up in HT screens
• Feedback loops
Regulation of lipid metabolism
• Differentially expressed genes identified by microarray and confirmed by proteomic screen
• Topologically significant nodes revealed by the new algorithm
Putting it all together: network activity inference
– Identifying causal relations between putative input and output signals
– Tracking the effects of molecular perturbation through activation/inhibition cascades
[Diagram: experimental data start and terminate a cascade; the activity of the intermediary nodes (Z) is inferred, and the intermediary nodes are scored. Legend: experimental data; inferred activity; predicted input; predicted target.]
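Sign propagation along activation/inhibition edges is one simple way to infer the activity of intermediary nodes between measured start and end points. The sketch below assumes each edge is +1 (activation) or -1 (inhibition). It scores an intermediary node by how many data-consistent cascades pass through it. The function names, nodes and scoring rule are hypothetical.

```python
def infer_cascade_activity(start_change, edge_signs):
    """Propagate an observed change at the start of a cascade (+1 up,
    -1 down) through edges marked +1 (activation) or -1 (inhibition).
    Returns the predicted change for every node, start included."""
    states = [start_change]
    for sign in edge_signs:
        states.append(states[-1] * sign)
    return states


def cascade_consistent(start_change, edge_signs, observed_end_change):
    """True if the predicted change at the terminal node matches the data."""
    return infer_cascade_activity(start_change, edge_signs)[-1] == observed_end_change


def score_intermediaries(cascades):
    """cascades: (nodes, start_change, edge_signs, observed_end_change) tuples.
    For each intermediary node, count the data-consistent cascades through it."""
    scores = {}
    for nodes, start, signs, end in cascades:
        if cascade_consistent(start, signs, end):
            for node in nodes[1:-1]:  # skip measured start and end nodes
                scores[node] = scores.get(node, 0) + 1
    return scores


# Hypothetical cascades: input up-regulated, terminal measured as down.
cascades = [
    (["IN", "X", "Y", "OUT"], +1, [+1, -1, +1], -1),  # predicts down: consistent
    (["IN", "X", "W", "OUT"], +1, [+1, +1, +1], -1),  # predicts up: inconsistent
]
print(infer_cascade_activity(+1, [+1, -1, +1]))  # [1, 1, -1, -1]
print(score_intermediaries(cascades))           # {'X': 1, 'Y': 1}
```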
Work in progress
• Finding patterns of significance (based on one experiment):
  – Significant neighborhoods
  – Significant receptors (by underlying cascade)
  – Significant transcription factors (by upstream cascade)
  – Significant interaction types (by distribution of expression at terminals)
• Finding common and different pathway modules (based on multiple samples):
  – Looking for “differential pathways”: modules that distinguish one group of samples from another
  – Finding common motifs in a group of pathway modules
• Inferring patterns of network activity:
  – Identifying causal relations between putative input and output signals
  – Tracking effects of molecular perturbation through activation/inhibition cascades
• Looking into mutual gene-process information and Bayesian inference of significance:
  – If gene G occurs only in process P, its up-/down-regulation is significant evidence for inferring P’s status
  – If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence for inferring P’s status
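The gene-process rule can be made quantitative with a simple weighting, sketched here under an explicit assumption. Each changed gene’s change is taken to be caused by exactly one of the k processes it belongs to, each equally likely, so the gene contributes 1/k to each of them. Summing over changed genes gives the expected number of changes attributable to a process. This illustrates the idea only; it is not the Bayesian model GeneGo was developing, and the function name and example are hypothetical.

```python
def process_evidence(changed_genes, process_members):
    """Score each process by summing 1/k over its changed genes, where k is the
    number of processes containing that gene (assumed uniform attribution)."""
    # Count how many processes each gene belongs to.
    membership = {}
    for genes in process_members.values():
        for gene in genes:
            membership[gene] = membership.get(gene, 0) + 1
    # A gene unique to one process contributes 1; a gene shared by k processes
    # contributes only 1/k to each of them.
    return {
        process: sum(1 / membership[g] for g in genes if g in changed_genes)
        for process, genes in process_members.items()
    }


# Hypothetical example: G occurs only in P, while H occurs in P, Q and R.
processes = {"P": {"G", "H"}, "Q": {"H", "J"}, "R": {"H"}}
print(process_evidence({"G", "H"}, processes))
```

Here P scores about 1.33, because the change in G points only to P, while Q and R each score about 0.33 from the shared gene H.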
Future products
MetaMiner Consortia for 2007
• Oncology (breast cancer, 4 other cancers)
• Metabolic diseases (type II diabetes, obesity, metabolic syndrome)
• CNS and neurodegenerative diseases
• Immunological and autoimmune diseases
MetaMiner consortia: Analytical platform for disease areas
• HTS, HCS data from cancer consortium labs
• Experimental data depository; data parsing, normalization; data analysis
• MetaMiner (Oncology) platform: cancer-relevant annotations and databases; analysis and screening of active compounds; maps for disease, processes and drug action; custom maps for projects
• Compound scoring: indications; toxicities; off-site effects
• Drug targets: divergence hubs on networks; “druggability” testing; pathway connectivity
• Biomarkers: combinations of different types (expression, secreted proteins, metabolites); convergence hubs (core effectors)
MetaTox consortium. Functional descriptors
• Enrichment by category: pathway maps; toxicity and process maps; sub-networks, modules, nodes
• Mapping on descriptors
• Indexing & scoring by toxicity category
• Predictive models