Copyright © 2003-2006

Ariadne Genomics, Inc.

All Rights Reserved

Molecular Networks in Mammals: Extraction from Literature and

Microarray Analysis

by Ilya Mazo, Ph.D.

It’s All About Pathways

Promise of Systems Biology

Understanding:

- Drug specificity
- Chemotherapy response
- Biomarker panels
- New target mechanisms

Building Models

- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Simplify the system by identifying components (functional modules or pathways)
- Integrate/validate with experimental data

Available Pathway Information

[Figure: PubMed abstract count by year, 1965 to 2004; vertical axis from 0 to 14 mln abstracts.]

MedScan Information Extractor

Reads >1000 abstracts per minute

How does MedScan extract facts from text?

Sentence in PubMed:
“Axin binds beta-catenin and inhibits GSK-3beta.”

Identify proteins in dictionary (in red):
“Axin binds beta-catenin and inhibits GSK-3beta.”

Identify interaction type (in black):
“Axin binds beta-catenin and inhibits GSK-3beta.”

Extracted facts:

- Axin - beta-catenin; relation: Binding
- Axin -> GSK-3beta; relation: Regulation, effect: Negative

Syntactic Layer: Noun Phrase, Verb Phrase, Noun Phrase

Semantic Layer: Protein, Protein Relations, Protein
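The extraction step on this slide can be illustrated with a minimal Python sketch. It is not MedScan's method: the slide describes a grammar-based parse, whereas this stand-in only scans for dictionary terms and known verbs in sentence order. The `PROTEINS` list, the `VERBS` table, and the `extract_facts` function are hypothetical names introduced for illustration.

```python
import re

# Hypothetical mini-dictionary and verb table (assumptions for this sketch;
# MedScan's real dictionary, grammar and lexicon are not shown in the slides).
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]
VERBS = {
    "binds": ("Binding", None),
    "inhibits": ("Regulation", "Negative"),
}

def extract_facts(sentence):
    """Return (subject, object, relation, effect) tuples for 'X verb Y and verb Z' sentences."""
    # Find dictionary proteins and known verbs in sentence order (longest terms first).
    terms = sorted(PROTEINS + list(VERBS), key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in terms)
    tokens = [m.group(0) for m in re.finditer(pattern, sentence)]

    facts = []
    subject = None   # first protein seen acts as the subject of every verb
    pending = None   # most recent verb still waiting for its object
    for tok in tokens:
        if tok in VERBS:
            pending = tok
        elif subject is None:
            subject = tok
        elif pending is not None:
            relation, effect = VERBS[pending]
            facts.append((subject, tok, relation, effect))
            pending = None
    return facts

facts = extract_facts("Axin binds beta-catenin and inhibits GSK-3beta.")
for fact in facts:
    print(fact)
```

A flat scan like this breaks on passive voice, negation, or nested clauses, which is presumably why the slide shows a syntactic layer beneath the semantic one.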

Overview of MedScan Architecture

Processing stages: Preprocessor, Tokenizer, Syntactic Parser, Semantic Interpreter, Ontological interpreter, Pattern Matcher, Converter

Intermediate data: Input Text, Tagged Sentences, Sequence of Words, Sentence Structure, Semantic tree, Extracted facts

Resources: Protein names dictionary, Lexicon, Grammar, Extraction rules, Extraction patterns, Database of relations

- Dictionary-based: identifies proteins and small molecules.
- Context-free grammar: grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
- Rule-based: rules are equivalent to an ontology.

Database of Pathways

>94% precision, >70% recovery

MedScan

[Transcription] [factor] {7157=p53} [activates] [apoptosis] [in] [hepatocytes]

ResNet Database

PubMed – 7 mln abstracts

1,000,000 Facts
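The tagged sentence shown on this slide mixes plain tokens in square brackets with one dictionary entity in braces. A minimal sketch of reading that notation back into tokens follows. The interpretation of `{7157=p53}` as identifier=name is an assumption drawn from the slide's example, and `TOKEN_RE` and `parse_tagged` are hypothetical names.

```python
import re

TAGGED = "[Transcription] [factor] {7157=p53} [activates] [apoptosis] [in] [hepatocytes]"

# [word] is treated as a plain token; {id=name} as a dictionary entity (assumed reading).
TOKEN_RE = re.compile(r"\[(?P<word>[^\]]+)\]|\{(?P<id>[^=}]+)=(?P<name>[^}]+)\}")

def parse_tagged(text):
    """Split a tagged sentence into word tokens and entity tokens, in order."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group("word") is not None:
            tokens.append({"kind": "word", "text": m.group("word")})
        else:
            tokens.append({"kind": "entity", "id": m.group("id"), "text": m.group("name")})
    return tokens

tokens = parse_tagged(TAGGED)
entities = [t for t in tokens if t["kind"] == "entity"]
print(entities)
```

Keeping the identifier next to the surface name lets later stages refer to the entity by a stable key rather than by whichever synonym appeared in the abstract.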

Extracted Information

Relation Type               Count
Expression Control         99,361
Binding                    50,812
Protein Modification       25,368
Mol. Synthesis             99,643
Mol. Transport             48,423
Regulation                675,539
Promoter Binding            3,661
Total: protein relations 1,002,807

1,002,807 relations (3.7 mil. findings extracted from 2005 Medline and 43 FTJ)
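As a quick consistency check, the seven per-type counts on this slide do add up to the stated total of 1,002,807, and Regulation alone accounts for roughly two thirds of it. The short Python sketch below reproduces that arithmetic; only the counts come from the slide, the variable names are illustrative.

```python
# Relation counts copied from the slide.
counts = {
    "Expression Control": 99_361,
    "Binding": 50_812,
    "Protein Modification": 25_368,
    "Mol. Synthesis": 99_643,
    "Mol. Transport": 48_423,
    "Regulation": 675_539,
    "Promoter Binding": 3_661,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Print relation types from most to least frequent with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s}{n:>9,d}{shares[name]:>8.1%}")
print(f"{'Total':22s}{total:>9,d}")
```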

Build Pathway (Find Neighbors)

[Figure: pathway built by finding neighbors; legible node labels include IL2, INS, ADA and EGF; other labels are truncated. Year labels: 2003, 2004, 2005, 2006.]
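A "find neighbors" operation over a database of relations can be sketched as a breadth-limited graph expansion. The Python below is a minimal illustration, not the product's implementation: the `RELATIONS` list reuses proteins named elsewhere in the talk, the relation types attached to them are illustrative, and `build_index` and `find_neighbors` are hypothetical names.

```python
from collections import defaultdict

# Illustrative relations only; the network drawn on the slide is not
# recoverable from the transcript.
RELATIONS = [
    ("Axin", "beta-catenin", "Binding"),
    ("Axin", "GSK-3beta", "Regulation"),
    ("MEK-1", "ERK", "Protein Modification"),
    ("ERK", "ELK-1", "Protein Modification"),
]

def build_index(relations):
    """Map each entity to the set of entities it shares a relation with."""
    index = defaultdict(set)
    for source, target, _kind in relations:
        index[source].add(target)
        index[target].add(source)
    return index

def find_neighbors(index, seeds, depth=1):
    """Return all entities within `depth` relations of any seed, excluding the seeds."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        frontier = {n for node in frontier for n in index.get(node, ())} - seen
        seen |= frontier
    return seen - set(seeds)

index = build_index(RELATIONS)
neighbors = find_neighbors(index, ["ERK"])
print(sorted(neighbors))
```

Raising `depth` widens the pathway one relation at a time, which is why a cap on depth (or a filter on relation type) matters in a network with around a million relations.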

Mechanistic Model of Disease

+ genes harboring DAVs associated with Type 2 Diabetes Mellitus (from Mol Cell Proteomics, Sharma et al., 2005)

ADCYAP1 LEPR

ADRB2 LECAM-1

ADRB3 NOS3

AGT NPY

APM1 NR3C1

CD38 NR3C1

FABP2 PC-1

GCGR PGC 1

GFPT PLA2G4A

GYS1 PON 1

HFE PON 2

HFE PON2

HNF1a PPAR g2

HNF4a PPP1R3

ICAM1 PTPN1

INSR RAGE

IRS 1 SOD2

IRS 2 TGF b

KCNJ11 UCP 1

KCNJ11 UCP2

Building Models

- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Identify functional modules (pathways)
- Integrate/validate with experimental data

Signaling Paths/Cascades

Physical relations

EGFR signaling including activation of Erk2 and the ELK-1 transcription factor

The MAP and ERK kinase (MEK-1) is a dual specificity kinase that phosphorylates ERK1/2 on T-E-Y.

ERK can phosphorylate and activate transcription factors such as TCF/ELK-1

Logical relations

Inferring Cascades

A simple protein classification schema and the membrane-to-nucleus signaling paradigm can be applied:

- Receptor
- Ligand
- Extracellular
- Transcription factor
- Nuclear receptor
- Effector

This allows the network to be partitioned into several hundred “signaling cascades”.
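One way to read the membrane-to-nucleus paradigm is as a path search: start at every protein classed as a receptor and follow directed relations until a transcription factor is reached. The Python sketch below assumes that reading. The class assignments and the `DOWNSTREAM` edges are hypothetical and deliberately simplified (intermediate steps of EGFR signaling are omitted), and `cascades` is an illustrative name.

```python
# Hypothetical protein classes and directed relations, loosely based on the
# EGFR example from the talk; intermediate steps are omitted for brevity.
PROTEIN_CLASS = {
    "EGF": "Ligand",
    "EGFR": "Receptor",
    "MEK-1": "Effector",
    "ERK": "Effector",
    "ELK-1": "Transcription factor",
}
DOWNSTREAM = {
    "EGF": ["EGFR"],
    "EGFR": ["MEK-1"],
    "MEK-1": ["ERK"],
    "ERK": ["ELK-1"],
}

def cascades(start_class="Receptor", end_class="Transcription factor", max_len=6):
    """Enumerate simple directed paths from start-class proteins to end-class proteins."""
    found = []

    def walk(path):
        node = path[-1]
        if PROTEIN_CLASS.get(node) == end_class:
            found.append(path)
            return
        if len(path) >= max_len:
            return  # stop extending paths that are already too long
        for nxt in DOWNSTREAM.get(node, []):
            if nxt not in path:  # keep paths simple (no revisits)
                walk(path + [nxt])

    for protein, cls in PROTEIN_CLASS.items():
        if cls == start_class:
            walk([protein])
    return found

paths = cascades()
for path in paths:
    print(" -> ".join(path))
```

Grouping the resulting paths by their starting receptor or ending transcription factor is one plausible way to arrive at a partition into cascades; the slide does not say which grouping was used.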

Regulomes as Canonical Pathways

700 inferred regulomes

200 textbook pathways

60% average overlap, P < 10^-4
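The slide does not say how the overlap or its P value was computed. As one possible approach, the Python sketch below measures overlap as the fraction of a textbook pathway recovered by an inferred regulome and scores it with a hypergeometric tail probability. The protein sets and the universe size of 1000 are made up, and both function names are hypothetical.

```python
from math import comb

def overlap_fraction(inferred, textbook):
    """Fraction of the textbook pathway's members that also appear in the inferred set."""
    return len(inferred & textbook) / len(textbook)

def hypergeom_tail(universe, inferred_size, textbook_size, shared):
    """P(overlap >= shared) if inferred_size proteins were drawn at random from the universe."""
    upper = min(inferred_size, textbook_size)
    numerator = sum(
        comb(textbook_size, k) * comb(universe - textbook_size, inferred_size - k)
        for k in range(shared, upper + 1)
    )
    return numerator / comb(universe, inferred_size)

# Toy sets; membership and universe size are assumptions for illustration.
inferred = {"EGFR", "MEK-1", "ERK", "ELK-1", "SRC"}
textbook = {"EGFR", "MEK-1", "ERK", "ELK-1", "RAF1"}

frac = overlap_fraction(inferred, textbook)
p = hypergeom_tail(
    universe=1000,
    inferred_size=len(inferred),
    textbook_size=len(textbook),
    shared=len(inferred & textbook),
)
print(f"overlap = {frac:.0%}, p = {p:.2e}")
```

The choice of denominator (textbook size, inferred size, or their union) changes the reported percentage, so any comparison with the slide's 60% figure would need the original definition.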

Regulomes as Logical Models

Use dependency relations to determine the “area of influence” for target proteins (receptors, kinases)

1) Both PP1 and expression of dominant negative c-Src inhibited PDGF-induced PI 3 kinase.

2) A pharmacologic inhibitor of c-Src, PP1

Logical Models: “what if?”
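A "what if?" query over dependency relations can be sketched as sign propagation: each relation is an activation (+1) or an inhibition (-1), and a perturbation is pushed downstream by multiplying signs. The Python below is a minimal illustration under that assumption. The three edges are read off the two example sentences on the slide, and `EFFECT` and `what_if` are hypothetical names.

```python
# Signed dependency relations: +1 activates, -1 inhibits.
# Edges are an interpretation of the slide's two example sentences.
EFFECT = {
    ("PP1", "c-Src"): -1,          # "a pharmacologic inhibitor of c-Src, PP1"
    ("c-Src", "PI 3 kinase"): +1,  # dominant negative c-Src inhibited PI 3 kinase
    ("PDGF", "PI 3 kinase"): +1,   # "PDGF-induced PI 3 kinase"
}

def what_if(perturbed, change, effects=EFFECT):
    """Propagate a +1 (increase) or -1 (decrease) change downstream along signed edges.

    The first predicted sign for a node is kept; conflicting paths are not resolved.
    """
    predicted = {perturbed: change}
    frontier = [perturbed]
    while frontier:
        node = frontier.pop()
        for (source, target), sign in effects.items():
            if source == node and target not in predicted:
                predicted[target] = predicted[node] * sign
                frontier.append(target)
    return predicted

prediction = what_if("PP1", +1)
print(prediction)
```

The set of nodes reached from a perturbed protein corresponds to its "area of influence"; a fuller model would also need a rule for nodes reached by paths with opposite signs.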

Building Models

- Identify the elements of the system
- Describe the interactions/regulations between such elements
- Identify functional modules (pathways)
- Integrate/validate with experimental data

Profiles to Pathways

Find significant regulators

Experimental dataset: melanoma, aggressive vs. non-aggressive cell lines, flat vs. 3D growth conditions. (Folberg and Arbieva, UIC)

p=1e-5

p=0.0004

p=0.24
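The slide reports p-values for candidate regulators but not the test behind them. One generic way to score a regulator, shown here purely as an assumption, is a permutation test: compare the mean absolute expression change of the regulator's known targets against randomly chosen gene sets of the same size. The gene names and scores below are made up, and `regulator_p_value` is a hypothetical name.

```python
import random
from statistics import mean

def regulator_p_value(scores, targets, n_perm=2000, seed=0):
    """Permutation p-value for 'targets change more than random gene sets of equal size'."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = sorted(scores)
    observed = mean(abs(scores[g]) for g in targets)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(targets))
        if mean(abs(scores[g]) for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Made-up log fold changes for 20 hypothetical genes g0..g19.
scores = {f"g{i}": 0.1 * (i % 5) - 0.2 for i in range(20)}
scores.update({"g0": 3.0, "g1": -2.8, "g2": 2.5, "g3": -2.2})

p = regulator_p_value(scores, targets=["g0", "g1", "g2", "g3"])
print(f"p = {p:.4f}")
```

With `n_perm` permutations the smallest reportable value is 1/(n_perm + 1), so reproducing a figure as small as the slide's p=1e-5 would require at least 100,000 permutations or an analytical test.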

Prediction of Activity Profiles

Activity as a function of expression level and the ability to induce changes in the targets

Markov random field formalism

Combining the Approaches

1. Start with the global network of interactions

2. Add expert knowledge

3. Infer subnetworks (individual pathways)

- Signaling cascades and regulomes
- Phenotype or disease association
- Regulators and downstream targets
- Advanced models

4. Use available data (microarrays, proteomics) to screen for relevant pathways.

5. Add validated pathway libraries to the software package.

Kinetic Models

Integrated Systems Biology Platform

[Diagram components: Client PC, Local DB, PathwayStudio, Tools, Linux Server, Oracle/PostgreSQL, Tomcat, Java, PathwayExpert, Web Client, Central DB]

Published by Scientists