Copyright GeneGo 2000-2007
Cover Slide
MetaCore/MetaDrug portal as an enterprise systems solution for multi-site distributed OMICs research
Agenda and Evaluation criteria
• Introductions
• Introduction to GeneGo and GeneGo products
– Profitable, growing company, privately owned, no VC
– Strong science
• Science and Cancer Cell papers
• $10MM in grants
• Research services
• Ability to manipulate data
• Data quality/content
• Data management
– Data input: batch uploading, automation through the API, import from third-party tools and Data Manager
• Data export
– Output: Excel, third-party tools, EndNote, 300 dpi images, easy sharing
• Ability to influence future development and the integration of their pipelines, content and workflows
• Online help and tutorials
• Founded in 2000 by Dr Tatiana Nikolskaya
• 100+ employees and actively hiring
• Products
– MetaCore™, MetaRodent™, MetaBase™, MapEditor™, MetaLink™
– MetaDrug™, MetaKinase™
– 1-2-3 Workflow™
– iPath
• Consortiums
– MetaTox™ Consortium (FDA, Vertex, Elan)
– MetaMiner™ consortia: oncology, metabolic, cardiac, immunology and CNS diseases
• Offices (San Diego office expanding)
– St Joseph, San Diego, San Francisco, London, Moscow, Taipei, Tokyo
• Current grants and awards
– 2005: $100K NIEHS SBIR Phase I grant (toxicogenomics)
– 2005: $100K NIH SBIR Phase I grant (biomarkers)
– 2005: subcontract on a University of Michigan grant, Gil Omenn (proteomics)
– 2005-06: $750K NIGMS SBIR Phase II (systems pharmacology)
– 2006-07: $750K DOD/DARPA SBIR Phase II (proteomics)
– 2006-07: $1 million NCI SBIR Phase II (breast cancer)
– 2007: $1 million toxicogenomics Phase II
• Publications
– GeneGo peer-reviewed publications
• Cancer Cell paper on breast cancer stem cells
• Science, Johns Hopkins
• FDA paper, Nature Biotechnology
– Customer publications, presentations and posters
GeneGo Overview
GeneGo Academic & Regulatory Customers
“The manually curated content available from GeneGo is exceptional and will allow us to advance our projects,” said Dr Nat Goodman, a Senior Research Scientist at ISB. “We are initiating several research projects together that will hopefully result in publications.”
Collaborations
GeneSpring
GeneGo Environment
Enterprise solution: web client and server; North America, Europe, Asia; research, development, clinical trials
Relational database; 45 PhDs and 5 MDs
Databases: GVK, PubMed, OMIM, UniGene, ABI Body Atlas, Swiss-Prot, GeneProt, internal
Visualization: maps, networks
Content: interactions, genes, proteins, metabolites, xenobiotics, pathways, ontologies
Tools: networks, pathways, enrichment, data depository, data comparison, prioritization, statistics, algorithms, custom interactions
Experiment data: gene expression, proteomics, metabolomics, SNPs, SAGE, siRNA, ChIP-chip, HCS
Ligands: metabolites, peptides, xenobiotics
Membrane receptors
Signal transduction: G proteins, second messengers, kinases, phosphatases
Transcription factors
Core effect: metabolic pathways
Metabolites
• 6,600 drugs with targets
• 4,000 endogenous metabolites
• >20,000 ligand-receptor interactions
• 850 GPCRs and other membrane receptors
• Nuclear hormone receptors
Three interaction domains in MetaCore
>147K manually curated physical signaling interactions; 550 canonical maps
43,000 12-step canonical pathways
900 human transcription factors; 4,100 target genes
11,000 metabolic reactions
110 fine metabolic maps (unique!)
4,000 endogenous metabolites
Unique: Interactions from different domains on merged networks
Metabolic reactions, endogenous metabolites
Genes and proteins
Networks of protein interactions
– Dynamic and interactive
– Exploratory tool
– Build new pathways for genes of interest
Pathway Integration
Interactive, static maps
– 550 proprietary maps
• With thousands of canonical pathways
– Signaling, regulation, metabolism, diseases
– Backbone of the formalized “state of the art” in the field
Canonical pathways
Unique multi-step canonical interactions on networks
List of pathways (cascades from canonical maps) that occur on the network
Selected pathways are highlighted
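As a rough illustration of what "cascades from canonical maps that occur on the network" can mean, the sketch below treats each canonical pathway as an ordered list of proteins and reports the pathways whose every consecutive step is a directed edge of the network. The matching rule, the pathway names and the edges are assumptions made for illustration; the deck does not describe how MetaCore actually matches cascades.

```python
"""Sketch: list canonical cascades that occur on a network.

Assumption: a cascade "occurs" when every consecutive pair of proteins in it
is a directed edge of the network. All names and edges are toy data.
"""

Edge = tuple[str, str]


def pathways_on_network(
    network_edges: set[Edge], pathways: dict[str, list[str]]
) -> list[str]:
    """Return the names of cascades whose every step is a network edge."""
    found = []
    for name, cascade in pathways.items():
        steps = zip(cascade, cascade[1:])  # consecutive (source, target) pairs
        if all(step in network_edges for step in steps):
            found.append(name)
    return found


# Hypothetical network and cascades, for illustration only.
network = {
    ("IFNG", "IFNGR1"),
    ("IFNGR1", "JAK1"),
    ("JAK1", "STAT1"),
    ("STAT1", "IRF1"),
}
canonical = {
    "IFN-gamma cascade": ["IFNG", "IFNGR1", "JAK1", "STAT1", "IRF1"],
    "Short cascade": ["JAK1", "STAT1"],
    "Absent cascade": ["EGF", "EGFR", "GRB2"],
}

print(pathways_on_network(network, canonical))
# ['IFN-gamma cascade', 'Short cascade']
```

The returned names are the candidates a user would then select for highlighting on the network view.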
MetaRodent
New add-on module
The first rat- and mouse-specific pathway analysis platform
• Genes and proteins specific to rat and mouse
• Protein complexes specific to rat and mouse
• Species-specific canonical pathway maps
Export form
Species added to export form
Androstenedione and testosterone biosynthesis and metabolism (part 1)
Enzyme AST: gene Spm2A (rat)
Enzyme HSD3B5 – gene Hsd3b5 (mouse)
Enzyme SULT2A2
Enzyme HSD3B4
Sulfotransferase
Steroid dehydrogenase
Systems Reconstruction™ Technology
Ontologies in MetaMiner
Ontologies from GeneGo
Entities, attributes and relationships
Disease tree
Disease-tree branch
Disease page: disease description
List of genes, RNAs and proteins associated with the disease, including the
protein biomarkers list
Chemical biomarkers list
Drugs list
Gene page
Chem. page
Gene-disease link information
Disease pathway networks
Disease pathway maps
Corresponding protein network objects
Corresponding protein page
RNA page
Complex or group protein page
RNA-disease link information
Protein-disease link information
Chem-disease link information
Drug action pathway maps
Corresponding chem. network objects
Network object interactions
Disease-specific interactions list
Disease-specific interaction information
Gene, RNA, protein, compound – disease relationships in ontology
Possible defect type
DNA: modification (methylation, for example), mutation, SNP, rearrangement, amplification, locus change
RNA: alternative transcript, splice variant
Protein: post-translational modification, splice-variant isoform, amino acid exchange, alteration of interaction, alteration of localization
influence on protein activity
application (biomarker)
influence on protein activity; alteration of RNA quantity
application (biomarker)
alteration of protein activity; alteration of protein quantity
application (biomarker); application (drug target)
Compound
alteration of compound quantity
application (biomarker); application (drug)
Causative relations: cause/predisposition
risk
hypothesis; manifestation; protection; no relation
Disease Tree Structure. Example of ambiguity in MeSH
Cardiac Arrhythmia ontology in MetaMiner
Cardiac Arrhythmia: summary of gene-disease relationships
60 genes (auto)
23 genes
5 genes
22 genes
7 genes
23 approved drugs (curated)
82 genes (curated)
147 inducing compounds (auto)
Drugs in ontology: relationships for indication
Coronary Arteriosclerosis
Hypertension
Edema
Glaucoma
Seizures
Heart Failure
Angina
Epilepsy
Ovarian/Lung neoplasm
• Drug target
• Approved drugs
Transport H+ and CO(2)
Metabolic pathway
Cholestasis
Hepatotoxicity
Nephrotoxicity
Tubular toxicity
Cardiomyopathy
Anemia
Ototoxicity
Drugs in ontology: relationships for toxicity
Entity
OWL schema for disease-centered ontology in MetaMiner
Gene-Disease connections in public domain and GeneGo
GENE MeSH
• Hierarchical structure: disease classification, 4,888 diseases
• Genes associated with diseases: 6,429
• Cited articles: 33,792
The public domain has no structured information about disease connectivity (by clinical classification) or about causative relationships between diseases and genes or proteins
Only citations with disease names; low trust
Only hierarchical structure
disease tree
OMIM
Only genetic information (mutations, SNPs); no expression, protein activity or localization data
GeneGo
Drug toxicity tree
Folders from MeSH; GeneGo toxicity ontologies based on reviews
38 Drug-induced pathological processes
MetaSearch™ (included in MetaCore)
Creating an experiment based on search results
The list of genes found by the search is exported to MetaCore™ as an “experiment”
New interaction modules
• MapEditor: custom maps
– Draw pathway maps from scratch
– Transform gene lists into networks, and networks into pathway maps
– Edit MetaCore’s canonical maps
– View and score your maps within the context of canonical maps
– Map experimental data onto custom maps
• MetaLink: overlaying custom interactions
– Import custom interactions (Y2H, co-expression, pull-down, etc.)
– Visualize them using GeneGo network-building algorithms
– Score “unknown” proteins (high IP potential) by their relevance to “benchmark” networks built from MetaCore interactions
• PathwayEditor: annotation technology transfer at the database level
– Custom annotation of interactions, compounds, diseases and metabolism within the framework of GeneGo’s internal annotation system
– Use the annotation forms, workflows and QC system developed at GeneGo
– Novel objects are imported and integrated with pre-existing data in MetaCore
What is MapEditor?
BEFORE: days? weeks? (wet lab research, writing reviews)
AFTER: minutes, hours
Mac or PC
MapEditor: adding interactions
Click on the source and target objects to create an interaction
If an interaction between the objects is already present in the database, it will be listed here
Use the Link tool to create interactions between objects
MapEditor: edit and publish the new map
MapEditor: the new map is now a functional part of MetaCore
Resulting Direct Interactions network
Pink interactions come from the uploaded links file. Mouse over an interaction to see its uploaded weight value
Blue interactions are in both the links file and the MetaCore database
MetaLink: Mapping interaction sets on networks
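The color legend on this slide amounts to a set comparison between the uploaded links file and the database. A minimal sketch, assuming interactions are directed (source, target) pairs carrying a numeric weight; the links, weights and database edges are hypothetical, and how MetaLink treats direction or database-only edges is not stated in the deck.

```python
"""Sketch: color uploaded interactions by whether the database already has them.

Assumption: following the slide's legend, a link found only in the uploaded
file is "pink" and a link also present in the database is "blue". Edges are
treated as directed. All data below are hypothetical.
"""

Link = tuple[str, str, float]


def classify_links(
    uploaded: dict[tuple[str, str], float], database_edges: set[tuple[str, str]]
) -> dict[str, list[Link]]:
    """Split uploaded (source, target) -> weight links into pink and blue groups."""
    groups: dict[str, list[Link]] = {"pink": [], "blue": []}
    for (source, target), weight in sorted(uploaded.items()):
        color = "blue" if (source, target) in database_edges else "pink"
        groups[color].append((source, target, weight))  # keep weight for mouse-over
    return groups


uploaded_links = {("JAK1", "STAT1"): 0.92, ("PROT_X", "STAT1"): 0.41}
known_edges = {("JAK1", "STAT1"), ("STAT1", "IRF1")}

for color, links in classify_links(uploaded_links, known_edges).items():
    print(color, links)
# pink [('PROT_X', 'STAT1', 0.41)]
# blue [('JAK1', 'STAT1', 0.92)]
```

Keeping the weight with each link mirrors the slide's note that the uploaded weight is shown on mouse-over.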
MetaDrug™: a unique systems pharmacology platform
GeneGo, Inc.
2007
Do in silico tools REALLY help in pre-clinical discovery?
Novel compound
Wet lab: chemistry, biology
IND application needs BIOLOGICAL effects: indication, toxicity, off-target effects
Structure-based modeling
QSAR models: activity
Dry lab predictions
Pharmacophores
QSAR models: metabolism
QSAR models: toxicity
2-6 years; 10-30 m?
Clinical trials: 9 out of 10 compounds fail
Is there any other way?
Why?
A NEW paradigm!
= 1 protein at a time
• 3,400 disease-related genes
• 500+ diseases with genes
• 600 known drug targets
• 3,000 toxicity-related genes
• 700 cellular processes
• 200,000 protein interactions
Present a structure as a GROUP of proteins
Query the KNOWLEDGE base: diseases, toxicities, pathways, processes, networks
Score based on the analysis: indication, toxicity
Select the best structures
= Systems-level analysis
Many, many products
MetaDrug
IND BIOLOGICAL effects: indication, toxicity, off-target effects
= GROUPS OF PROTEINS!
Novel compounds
Structure-based modeling
QSAR models: activity
Dry lab predictions
Pharmacophores
QSAR models: metabolism
QSAR models: toxicity
Report BY and FOR the end user = chemist
= Systems-level effects
MetaDrug: applications
Hit-to-lead libraries
Lead optimization
Selection for IND
Libraries
Screening
Hits
MetaDrug: “front-load” risk
1 in 12 makes it to market*
*Source: “Merck's Recall of Rofecoxib: A Strategic Perspective”, New England Journal of Medicine, Volume 351:2147-2149
How do we do it?
Compound
Metabolites
Db similarity Db similarity
Lists of gene and protein IDs, some with numerical data
Primary targets/first indication: diseases, GeneGo processes, GO processes, drug target networks
Secondary targets/additional indications: diseases, GeneGo processes, GO processes, drug target networks
Secondary targets/off-target side effects: diseases, toxicity networks, GeneGo processes, GO processes
Primary targets/human toxicity: toxicity networks, GeneGo processes, GO processes
Summary efficacy/tox index per indication
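The deck does not say how the summary efficacy/tox index is calculated. Purely to illustrate the idea of treating a compound as a group of proteins, the sketch below assumes the compound's predicted targets are already available as a gene list, measures what fraction of each indication gene set and of a toxicity gene set those targets cover, and reports their ratio. The coverage measure, the ratio, the pseudocount and all gene sets are hypothetical choices, not MetaDrug's method.

```python
"""Sketch: a hypothetical efficacy/toxicity index per indication.

Assumptions (not taken from MetaDrug): a compound is represented by its
predicted target set; efficacy = fraction of an indication's genes hit;
toxicity = fraction of a toxicity gene set hit; index = efficacy / toxicity,
with a small pseudocount so compounds with no toxicity hits stay finite.
"""


def coverage(targets: set[str], gene_set: set[str]) -> float:
    """Fraction of gene_set hit by the compound's targets (0.0 if gene_set is empty)."""
    if not gene_set:
        return 0.0
    return len(targets & gene_set) / len(gene_set)


def efficacy_tox_index(
    targets: set[str],
    indications: dict[str, set[str]],
    toxicity_genes: set[str],
    pseudocount: float = 0.01,
) -> dict[str, float]:
    """Return the hypothetical efficacy/toxicity ratio for each indication."""
    tox = coverage(targets, toxicity_genes) + pseudocount
    return {
        name: round(coverage(targets, genes) / tox, 2)
        for name, genes in indications.items()
    }


# Toy data, for illustration only.
targets = {"PTGS1", "PTGS2", "KCNH2"}
indications = {
    "Inflammation": {"PTGS1", "PTGS2", "IL1B", "TNF"},
    "Hypertension": {"ACE", "AGTR1"},
}
toxicity_genes = {"KCNH2", "SCN5A"}

print(efficacy_tox_index(targets, indications, toxicity_genes))
# {'Inflammation': 0.98, 'Hypertension': 0.0}
```

A ratio is only one way to combine the two scores; a difference or a weighted sum would rank compounds differently.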
MetaDrug Applications
• Front-load risks
• Ranking of lead compounds
– Prioritizing hits
• Compare internal compounds with drugs on the market
• Use the knowledge base to predict metabolites for screening purposes
• HTS
• In silico tools speed up virtual screening for medicinal chemists and DMPK groups
• Identifying compound targets in synthetic lethal strains
• Prediction of side effects
• Drug-drug interactions
• Identification of targets of natural products isolated from bacteria and plants
• Identification of the exact molecular targets and mechanism of action of molecules known to be active on a whole pathway
• Comparing therapeutic drugs to endogenous compounds
Integration with MDL DiscoveryGate
Search for a compound of interest or its metabolites in MDL databases (requires access to DiscoveryGate)
Cover Slide
Data Management & Storage
OMICs Data Management system
Data IN (parsing)
Data warehousing: experiments, gene/protein/compound lists, maps
Results OUT (export): “common language” formats (Word, Excel), compatibility with third parties, compliance with regulations (FDA), EndNote for publications
Reporting, access/security
HTS, HCS
Results warehousing: workflows, network lists, networks, maps, gene/protein/compound lists
ID converter
Data processing in MetaCore/MetaDrug (MC/MD)
Data Manager data parsers
HTS, HCS
PathwayEditor, MapEditor, MetaLink
Custom interaction data: Y2H, pull-down, co-expression, annotation
Custom maps, networks, pathways
Structures: SDF, MOL; molecular bio data: HTS, HCS; metabolites; ISIS DB
General parser, metabolic parser, structure parser
Gene lists
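The structure parser accepts SDF and MOL files. As a rough idea of what reading an SD file involves, the sketch below splits records on the "$$$$" delimiter, takes the molecule name from the first header line and collects the "> <FIELD>" data items. It is a minimal illustration, not GeneGo's parser: it ignores the atom and bond blocks, assumes "\n" line endings, and the two sample records are hypothetical.

```python
"""Sketch: a minimal SD (structure-data) file reader.

Assumptions: "\n" line endings; only the molecule name and the "> <FIELD>"
data items are extracted; atom/bond blocks are skipped, not interpreted.
"""


def parse_sdf(text: str) -> list[dict[str, str]]:
    """Return one dict per record: the molecule name plus its data fields."""
    records = []
    for block in text.split("$$$$"):
        # Drop only the newline that follows the previous "$$$$" delimiter,
        # so an empty molecule-name line is not lost.
        chunk = block[1:] if block.startswith("\n") else block
        if not chunk.strip():
            continue  # e.g. the empty tail after the final delimiter
        lines = chunk.splitlines()
        record = {"name": lines[0].strip()}
        field, values = None, []
        for line in lines[1:]:
            if line.startswith(">") and "<" in line and ">" in line[line.index("<"):]:
                start = line.index("<")
                field = line[start + 1 : line.index(">", start)]  # text inside <...>
                values = []
            elif field is not None:
                if line.strip():
                    values.append(line)
                else:
                    record[field] = "\n".join(values)  # a blank line ends the item
                    field = None
        if field is not None:  # last item not followed by a blank line
            record[field] = "\n".join(values)
        records.append(record)
    return records


SAMPLE = """aspirin
  toy-sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
CMPD-1

> <SOURCE>
toy example

$$$$
caffeine
  toy-sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
CMPD-2

$$$$
"""

for rec in parse_sdf(SAMPLE):
    print(rec)
```

For real structure handling (atoms, bonds, stereochemistry) a cheminformatics toolkit would be used instead of hand parsing.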
Data format conversion
DM (Data Manager)
IN / OUT
Output: human, mouse or rat
Genomics
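Converting a gene list from one species to another is essentially a lookup through an orthology table. A minimal sketch, assuming a tiny hypothetical mouse-to-human table; unmapped symbols are reported rather than silently dropped. The deck does not say which orthology resource Data Manager's ID converter relies on.

```python
"""Sketch: convert a gene list between species through an orthology table.

Assumption: MOUSE_TO_HUMAN is a tiny hypothetical table; real conversion
would use a curated orthology resource.
"""

MOUSE_TO_HUMAN = {"Stat1": "STAT1", "Jak1": "JAK1", "Esr1": "ESR1"}


def convert(genes: list[str], table: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (converted symbols, unmapped inputs), preserving input order."""
    converted, unmapped = [], []
    for gene in genes:
        target = table.get(gene)  # None when no ortholog is listed
        if target is None:
            unmapped.append(gene)
        else:
            converted.append(target)
    return converted, unmapped


print(convert(["Jak1", "Stat1", "Fake1"], MOUSE_TO_HUMAN))
# (['JAK1', 'STAT1'], ['Fake1'])
```

Returning the unmapped symbols separately lets the user see what was lost in conversion instead of silently analyzing a shorter list.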
Unique: functional analysis steps
• Enrichment analysis for gene, protein and compound sets
– Hyper G, GSEA, GSA, etc.
– Multidimensional analysis across multiple ontologies:
• GO processes
• GeneGo (GG) processes
• Canonical pathways
• Diseases
– Export of subsets for network analysis
– Low resolution
• Network analysis
– Multiple pre-filters (species, interaction mechanisms, organelles, etc.)
– Parameters: enrichment with genes from the set, canonical pathways, specific protein classes
– Algorithms: SP, DI, AN, TFs, receptors, etc.
– Statistics: hubs, preferred pathways, etc.
– Highest resolution: individual proteins or isoforms
• Interactome analysis
– Whole-set analysis
– Over- and underconnected nodes in the dataset
• Interaction neighborhoods
• TFs, kinases, receptors, etc.
– Scoring for interactions within the set: FDR
Resolution (figure): 1,000 genes, multiple sets; 2-100 genes per entity; top 5 or 10 genes in each class; an arbitrary number of individual proteins and interactions
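The enrichment step lists Hyper G (the hypergeometric test) and FDR scoring. The sketch below shows one standard way to combine them: a one-sided hypergeometric p-value for each ontology term, followed by Benjamini-Hochberg adjustment across terms. The background size, term names and gene sets are toy assumptions, and MetaCore's exact choices (background, correction method) are not described in the deck.

```python
"""Sketch: hypergeometric enrichment with Benjamini-Hochberg FDR.

Assumptions: a fixed background of N genes containing the user's list; each
ontology term (GO process, pathway map, disease, ...) is a gene set;
p-value = P(overlap >= observed). Term names and sizes are toy data.
"""
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K in term, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(
    user_genes: set[str], terms: dict[str, set[str]], background_size: int
) -> list[tuple[str, int, float, float]]:
    """Return (term, overlap, p-value, q-value) rows sorted by p-value."""
    names = list(terms)
    overlaps = [len(user_genes & terms[t]) for t in names]
    pvals = [
        hypergeom_upper_tail(k, background_size, len(terms[t]), len(user_genes))
        for t, k in zip(names, overlaps)
    ]
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda row: row[2])


# Toy example: 20-gene background, 4-gene user list, two 5-gene terms.
background = 20
user = {"G1", "G2", "G3", "G4"}
terms = {
    "Term A": {"G1", "G2", "G3", "G9", "G10"},
    "Term B": {"G11", "G12", "G13", "G14", "G15"},
}
for row in enrich(user, terms, background):
    print(row)
```

Here Term A shares 3 of 4 user genes, giving p = 155/4845 (about 0.032); with two terms tested, BH doubles it to a q-value of about 0.064.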
GeneGo’s Data Manager as a repository and analysis suite
All data are stored and analyzed in MetaCore
We confirmed that these gene groups hold on an independent set of patients. The 295 set was run on a DIFFERENT chip, with no norms, yet the same groups of genes still worked! We were also able to find MORE genes for each group and included them in this picture. On this larger set we could see a small group of ERBB+/ESR- patients that we did not see in the 122 set (probably because that set was too small and such patients are rare). We noticed that 4 new genes from the ESR1 group (ANDR, DNALI1, XBP1, FOXA1) are always over-expressed in both ESR+ and ERBB+ samples, so we formed a separate group from them. For the same reason, the top 5 genes of the PLAU group (red: COL11A1, PLAU, PLAUR, WNT2, MMP14) were united into a separate group, leaving COL5A2 out. We then re-clustered with the 9 resulting gene groups.
Identified on 122 set:
2. ERBB2 (STARD3, GRB7, ERBB2, THRAP4, PPARBP)
3. ESR1 (GATA3, XBP1, TFF3, ACADSB, ESR1)
4. Keratin 5/17 (KRT17, TRIM29, KRT5, GABRP)
5. Keratin 8/18 (KRT18, KRT8, PP2CA, KRT8L2)
6. PLAU (COL5A2, FAP, THY1, PLAU)
7. STAT1 (STAT1, UBE2L6, TAP1, LAP3)
8. CEACAM (CEACAM6, CEACAM5, CEACAM7)
Identified on 295 set:
2. ERBB2 (AMPL, GRB7, ERBB2, THRAP4, PPARBP, LENEP)
3. ESR1 (GATA3, XBP1, TFF3, ACADSB, ESR1)
4. AR (AR, FOXA1, DNALI1)
5. Keratin 5/17 (KRT17, TRIM29, KRT5, GABRP, COL11A1)
6. Keratin 8/18 (KRT18, KRT8, PP2CA, KRT8L2)
7. PLAU (MMP14, PLAU, PLAUR, WNT2)
8. FN1 (FN1, THY1, COL5A2, BAPX1)
9. STAT1 (STAT1, UBE2L6, TAP1, LAP3, CXCL10, PSMB9)
10. CEACAM (CEACAM6, CEACAM5, CEACAM7, CEACAM3)
Sørlie 122
Sørlie 295
Export to EndNote
Easy Data Sharing
Right-click to share maps, networks and experiments
MetaSearch™ overview (included in MetaCore)
• Powerful, easy-to-use search tool
– Flexible way to extract data from the MetaCore/MetaDrug Discovery Platform (MetaDiscovery)
• Customizable Boolean queries
• Real-time communication with MetaDiscovery™ tools
• Results can be visualized and analyzed in MetaCore™ and MetaDrug™
MetaSearch: starting your query
Start your query by selecting the type of entity you want to find; in this example, maps from MetaCore™.
Step 2: adding search terms
We are looking for maps containing JAK1
Finding all maps with JAK1 AND STAT1
In the complete query we specify two criteria: maps should contain both JAK1 AND STAT1
Building nested queries
We further restrict the search to maps NOT containing Oncostatin M
Expanding your query
Then we extract all genes from these maps
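The MetaSearch walkthrough (maps containing JAK1 AND STAT1, NOT Oncostatin M, then all genes on those maps) maps naturally onto set operations. A minimal sketch, assuming each map is modeled as a set of gene symbols and Oncostatin M is represented by its gene symbol OSM; the map names and contents are hypothetical toy data, not MetaCore content, and this is not MetaSearch's query engine.

```python
"""Sketch: the MetaSearch example expressed as set operations.

Assumption: each map is a set of gene symbols; map names and contents are
hypothetical. OSM stands for Oncostatin M.
"""

maps = {
    "Map 1": {"IFNG", "IFNGR1", "JAK1", "JAK2", "STAT1"},
    "Map 2": {"OSM", "OSMR", "JAK1", "STAT1", "STAT3"},
    "Map 3": {"IL6", "IL6R", "JAK1", "STAT3"},
}


def search(maps: dict[str, set[str]], must: set[str], must_not: set[str]) -> list[str]:
    """Maps containing ALL genes in `must` AND NONE of the genes in `must_not`."""
    return sorted(
        name
        for name, genes in maps.items()
        if must <= genes and not (must_not & genes)
    )


# JAK1 AND STAT1, NOT Oncostatin M
hits = search(maps, must={"JAK1", "STAT1"}, must_not={"OSM"})
# Expand the query: collect every gene on the matching maps.
genes_on_hits = set().union(*(maps[name] for name in hits))

print(hits)  # ['Map 1']
print(sorted(genes_on_hits))  # ['IFNG', 'IFNGR1', 'JAK1', 'JAK2', 'STAT1']
```

The resulting gene set is the kind of list that could then be exported to MetaCore as an experiment.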
Updates, Maintenance and Support
• Support
– By phone
– By email
– Online and on-site discussions
• Training
– Free online training
• Every first Monday of the month: MetaCore, MetaRodent and MapEditor
• Every last Monday of the month: MetaDrug and Tox
– On-site training
• Training manuals
– MetaCore 4.5 Basic
– MetaCore 4.5 Advanced
– MetaBase
– MapEditor
– MetaDrug
• Tutorials, FAQ, new release information
– Under the Q&A tab
• Integration documentation
– Under the Integration tab
Cover Slide
Contact: Mark Hughes
Tel: +44 7786 150699
Email: [email protected]