Post on 20-Dec-2015
Copyright © 2003-2006
Ariadne Genomics, Inc.
All Rights Reserved
Molecular Networks in Mammals: Extraction from Literature and
Microarray Analysis
by Ilya Mazo, Ph.D.
It’s All About Pathways
Promise of Systems Biology
Understanding:
- Drug specificity
- Chemotherapy response
- Biomarker panels
- New target mechanisms
Building Models
- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Simplify the system by identifying components (functional modules or pathways)
- Integrate/validate with experimental data
Available Pathway Information
[Figure: PubMed abstract count by year, 1965 to 2004; vertical axis from 0 to 14 mln abstracts]
MedScan Information Extractor
Reads >1000 abstracts per minute
How does MedScan extract facts from text?
Sentence in PubMed:
“Axin binds beta-catenin and inhibits GSK-3beta.”
Identify Proteins in Dictionary (in red):
“Axin binds beta-catenin and inhibits GSK-3beta.”
Identify Interaction Type (in black):
“Axin binds beta-catenin and inhibits GSK-3beta.”
Extracted Facts:
- Axin - beta-catenin, relation: Binding
- Axin -> GSK-3beta, relation: Regulation, effect: Negative
Syntactic Layer: Noun Phrase, Verb Phrase, Noun Phrase
Semantic Layer: Protein, Protein Relations, Protein
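The extraction steps on this slide can be imitated with a toy sketch. This is not MedScan's method; it is a minimal dictionary-plus-pattern illustration that assumes a subject protein followed by one or more "verb protein" pairs. The dictionary `PROTEINS`, the verb table `VERBS`, and the function `extract_facts` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical mini-dictionary of protein names (illustration only).
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]

# Hypothetical mapping from interaction verbs to (relation type, effect).
VERBS = {
    "binds": ("Binding", None),
    "inhibits": ("Regulation", "Negative"),
    "activates": ("Regulation", "Positive"),
}

def extract_facts(sentence):
    """Return (subject, object, relation, effect) tuples for sentences of the
    form 'P1 verb P2 [and verb P3 ...]', sharing the subject across verbs."""
    # Longest names first so that a longer name is never shadowed by a shorter one.
    names = "|".join(re.escape(p) for p in sorted(PROTEINS, key=len, reverse=True))
    verbs = "|".join(VERBS)
    subject_match = re.search(rf"({names})", sentence)
    if subject_match is None:
        return []
    subject = subject_match.group(1)
    rest = sentence[subject_match.end():]
    facts = []
    for verb, obj in re.findall(rf"({verbs})\s+({names})", rest):
        relation, effect = VERBS[verb]
        facts.append((subject, obj, relation, effect))
    return facts

facts = extract_facts("Axin binds beta-catenin and inhibits GSK-3beta.")
for fact in facts:
    print(fact)
```

A regular expression stands in for the syntactic and semantic layers here; it will miss passive voice, negation, and anything beyond this one sentence shape.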
Overview of MedScan Architecture
[Diagram; labels as extracted, duplicates removed]
Modules: Preprocessor, Tokenizer, Syntactic Parser, Semantic Interpreter, Ontological interpreter, Pattern Matcher, Converter
Data: Input Text, Tagged Sentences, Sequence of Words, Sentence Structure, Semantic tree, Extracted facts, Database of relations
Resources: Protein names dictionary, Lexicon, Grammar, Extraction rules, Extraction patterns
Annotations:
- Dictionary-based: identifies proteins and small molecules
- Context-free grammar: grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
- Rule-based: rules are equivalent to ontology
Database of Pathways
>94% precision
>70% recovery
MedScan
[Transcription] [factor] {7157=p53} [activates] [apoptosis] [in] [hepatocytes]
ResNet Database
PubMed – 7 mln abstracts
1,000,000 Facts
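The bracketed sentence on this slide looks like a tagged-token format: plain words in square brackets and a recognized entity in braces as `{identifier=name}`. That reading is an assumption drawn from the single example shown. Under that assumption, a minimal parser might look like this (the names `TOKEN` and `parse_tagged` are hypothetical):

```python
import re

# Matches either {id=name} (a recognized entity) or [word] (a plain token).
# The format is inferred from the one example on the slide.
TOKEN = re.compile(r"\{(\d+)=([^}]+)\}|\[([^\]]+)\]")

def parse_tagged(sentence):
    """Split a tagged sentence into dicts with the token text and, for
    recognized entities, the numeric identifier."""
    tokens = []
    for match in TOKEN.finditer(sentence):
        if match.group(1) is not None:
            tokens.append({"text": match.group(2), "entity_id": int(match.group(1))})
        else:
            tokens.append({"text": match.group(3), "entity_id": None})
    return tokens

tagged = "[Transcription] [factor] {7157=p53} [activates] [apoptosis] [in] [hepatocytes]"
tokens = parse_tagged(tagged)
entities = [t for t in tokens if t["entity_id"] is not None]
print(entities)
```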
Extracted Information
Relation Type                  Count
Expression Control            99,361
Binding                       50,812
Protein Modification          25,368
Mol. Synthesis                99,643
Mol. Transport                48,423
Regulation                   675,539
Promoter Binding               3,661
Total: protein relations   1,002,807
1,002,807 relations (3.7 mil. findings extracted from 2005 Medline and 43 FTJ)
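As a quick arithmetic check, the per-type counts on this slide can be summed and compared with the stated total. The sketch below only restates the slide's numbers; the variable names are illustrative.

```python
# Relation counts copied from the slide.
counts = {
    "Expression Control": 99_361,
    "Binding": 50_812,
    "Protein Modification": 25_368,
    "Mol. Synthesis": 99_643,
    "Mol. Transport": 48_423,
    "Regulation": 675_539,
    "Promoter Binding": 3_661,
}

total = sum(counts.values())
# Fraction of all relations contributed by each relation type.
shares = {name: count / total for name, count in counts.items()}

print(total)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1%}")
```

The sum matches the stated total of 1,002,807, and roughly two thirds of the relations are of type Regulation.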
Build Pathway (Find Neighbors)
[Figure: network built with Find Neighbors; node labels are truncated in extraction (legible: IL2, INS, ADA, EGF); year labels 2003, 2004, 2005, 2006]
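The "Find Neighbors" operation named in the slide title can be sketched as a one-step expansion over a list of relations. This is a generic illustration, not the product's implementation; the toy `RELATIONS` list is loosely based on facts quoted elsewhere in the deck, and `find_neighbors` is a hypothetical name.

```python
# Toy relation list: (source, target, relation type). Illustration only.
RELATIONS = [
    ("Axin", "beta-catenin", "Binding"),
    ("Axin", "GSK-3beta", "Regulation"),
    ("MEK-1", "ERK", "Protein Modification"),
    ("ERK", "ELK-1", "Protein Modification"),
]

def find_neighbors(seeds, relations):
    """Return the seed nodes plus every node one relation away, together
    with the relations that connect them (direction is ignored)."""
    seeds = set(seeds)
    edges = [r for r in relations if r[0] in seeds or r[1] in seeds]
    nodes = set(seeds)
    for source, target, _ in edges:
        nodes.update((source, target))
    return nodes, edges

nodes, edges = find_neighbors(["ERK"], RELATIONS)
print(sorted(nodes))
print(edges)
```

Calling the function again with the returned nodes as seeds grows the pathway by one more step.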
Mechanistic Model of Disease
+ genes harboring DAVs associated with Type 2 Diabetes Mellitus (from Mol Cell Proteomics, Sharma et al., 2005)
ADCYAP1 LEPR
ADRB2 LECAM-1
ADRB3 NOS3
AGT NPY
APM1 NR3C1
CD38 NR3C1
FABP2 PC-1
GCGR PGC 1
GFPT PLA2G4A
GYS1 PON 1
HFE PON 2
HFE PON2
HNF1a PPAR g2
HNF4a PPP1R3
ICAM1 PTPN1
INSR RAGE
IRS 1 SOD2
IRS 2 TGF b
KCNJ11 UCP 1
KCNJ11 UCP2
Building Models
- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Identify functional modules (pathways)
- Integrate/validate with experimental data
Signaling Paths/Cascades
Physical relations
EGFR signaling including activation of Erk2 and the ELK-1 transcription factor
The MAP and ERK kinase (MEK-1) is a dual specificity kinase that phosphorylates ERK1/2 on T-E-Y.
ERK can phosphorylate and activate transcription factors such as TCF/ELK-1
Logical relations
Inferring Cascades
A simple protein classification schema and the membrane-to-nucleus signaling paradigm can be applied:
- Receptor
- Ligand
- Extracellular
- Transcription factor
- Nuclear receptor
- Effector
This allows the network to be partitioned into several hundred “signaling cascades”.
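One way to read the membrane-to-nucleus paradigm is as a path search: start at each protein classified as a receptor and follow directed relations until a transcription factor is reached. The sketch below assumes that reading. The toy network is deliberately simplified, and `CLASSES`, `EDGES`, and `cascades` are hypothetical names, not part of any described software.

```python
# Hypothetical protein classes and directed relations (simplified toy network).
CLASSES = {
    "EGF": "Ligand",
    "EGFR": "Receptor",
    "MEK-1": "Effector",
    "ERK": "Effector",
    "ELK-1": "Transcription factor",
}
EDGES = {
    "EGF": ["EGFR"],
    "EGFR": ["MEK-1"],
    "MEK-1": ["ERK"],
    "ERK": ["ELK-1"],
}

def cascades(edges, classes, start_class="Receptor", end_class="Transcription factor"):
    """Enumerate simple directed paths from every start-class protein to the
    first end-class protein reached along each branch."""
    paths = []

    def walk(node, path):
        if classes.get(node) == end_class:
            paths.append(path)
            return
        for nxt in edges.get(node, []):
            if nxt not in path:  # avoid revisiting nodes (no cycles)
                walk(nxt, path + [nxt])

    for node, cls in classes.items():
        if cls == start_class:
            walk(node, [node])
    return paths

paths = cascades(EDGES, CLASSES)
print(paths)
```

On a network with a million relations, exhaustive path enumeration would need a depth limit or other pruning; the sketch omits that for brevity.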
Regulomes as Canonical Pathways
700 inferred regulomes
200 textbook pathways
60% average overlap, P < 10^-4
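The slide does not say how the overlap was measured. One plausible definition, used here purely as an assumption, is: for each textbook pathway, take the regulome that covers the largest fraction of its members, then average those fractions. The gene sets below are placeholders and the function names are hypothetical.

```python
def best_overlap(pathway, regulomes):
    """Largest fraction of the pathway's members found in any single regulome."""
    return max(len(pathway & regulome) / len(pathway) for regulome in regulomes)

def average_overlap(pathways, regulomes):
    """Mean of the best-match overlap across all pathways."""
    return sum(best_overlap(p, regulomes) for p in pathways) / len(pathways)

# Placeholder sets standing in for inferred regulomes and textbook pathways.
regulomes = [{"A", "B", "C", "D"}, {"E", "F", "G"}]
pathways = [{"A", "B", "X", "Y"}, {"E", "F", "G", "H"}]

avg = average_overlap(pathways, regulomes)
print(avg)
```

A significance estimate such as the slide's P value would need a null model on top of this, for example comparing against randomly drawn gene sets of the same sizes.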
Regulomes as Logical Models
Use dependency relations to determine the “area of influence” for target proteins (receptors, kinases)
1) Both PP1 and expression of dominant negative c-Src inhibited PDGF-induced PI 3 kinase.
2) A pharmacologic inhibitor of c-Src, PP1
Logical Models: “what if?”
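An "area of influence" can be sketched as the set of nodes reachable downstream of a protein over signed dependency relations, with the net sign carried along the path. The edges below are one reading of the two quoted sentences (PP1 inhibits c-Src; c-Src and PDGF each promote PI 3 kinase activity) and should be treated as assumptions, as should the names `EDGES` and `area_of_influence`.

```python
from collections import deque

# Signed dependency edges: +1 = activates, -1 = inhibits (assumed reading).
EDGES = {
    "PP1": [("c-Src", -1)],
    "c-Src": [("PI 3 kinase", +1)],
    "PDGF": [("PI 3 kinase", +1)],
}

def area_of_influence(edges, source):
    """Breadth-first walk returning {downstream node: net sign} using the
    first path found to each node."""
    reached = {}
    queue = deque([(source, +1)])
    while queue:
        node, sign = queue.popleft()
        for target, effect in edges.get(node, []):
            if target not in reached and target != source:
                reached[target] = sign * effect
                queue.append((target, sign * effect))
    return reached

influence = area_of_influence(EDGES, "PP1")
print(influence)
```

A "what if?" query then becomes a lookup: adding PP1 is predicted to lower both c-Src and PI 3 kinase activity, which agrees with the first quoted sentence. Conflicting signs along different paths are not resolved in this sketch.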
Building Models
- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Identify functional modules (pathways)
- Integrate/validate with experimental data
Profiles to Pathways
Find significant regulators
Experimental dataset: melanoma, aggressive vs. non-aggressive cell lines, flat vs. 3D growth conditions. (Folberg and Arbieva, UIC)
p=1e-5
p=0.0004
p=0.24
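The slide does not state which test produced these p-values. A common choice for this kind of question, shown here only as an assumed stand-in, is a hypergeometric enrichment test: given N measured genes of which K changed, how likely is it that at least k of a regulator's n known targets changed by chance? The function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k), where X is the number of changed genes among a regulator's
    n targets, given K changed genes out of N measured genes."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical example: 1000 genes measured, 100 changed,
# a regulator with 20 known targets of which 10 changed.
p_value = hypergeom_tail(k=10, K=100, n=20, N=1000)
print(p_value)
```

Regulators would then be ranked by p-value, ideally after correcting for the number of regulators tested.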
Prediction of Activity Profiles
Activity as a function of expression level and the ability to induce changes in the targets
Markov random field formalism
Combining the Approaches
1. Start with the global network of interactions
2. Add expert knowledge
3. Infer subnetworks (individual pathways)
   - Signaling cascades and regulomes
   - Phenotype or disease association
   - Regulators and downstream targets
   - Advanced models
4. Use available data (microarrays, proteomics) to screen for relevant pathways.
5. Add validated pathway libraries to the software package.
Kinetic Models
Integrated Systems Biology Platform
[Diagram; labels as extracted, duplicates removed]
Client PC
Local DB
PathwayStudio
Tools
Linux Server
Oracle/PostgreSQL
Tomcat, Java
PathwayExpert
Web Client
Tools
Central DB
Tools
Published by Scientists