MetNetAligner: a web service tool for metabolic network alignments


MBD Research Day 2009

- Random degree-conserved graph generation by reshuffling edges

- Randomized P-Value computation (P-Value cutoff : 0.01 for 100 randomized graphs)
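The two steps above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the randomized graph is obtained by repeatedly swapping the targets of two edges (which keeps every in-degree and out-degree unchanged) and that the P-Value is the fraction of randomized graphs whose alignment cost is no worse than the observed one. The names `shuffle_edges` and `empirical_p_value` are hypothetical.

```python
import random

def shuffle_edges(edges, swaps, rng):
    """Degree-preserving randomization of a directed graph by edge swaps."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Candidate swap: a->b, c->d becomes a->d, c->b.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # skip swaps that would create a self-loop or a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def empirical_p_value(observed_cost, random_costs):
    """Fraction of randomized graphs aligned at a cost <= the observed cost."""
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return hits / len(random_costs)
```

With 100 randomized graphs, a cutoff of 0.01 under this definition means that none of the randomized graphs may align as cheaply as the real one.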

Qiong Cheng*, Robert Harrison, Alexander Zelikovsky

Department of Computer Science, Georgia State University, Atlanta, GA 30303

* Partially supported by GSU Molecular Basis of Disease (MBD) and Brains & Behavior (B&B)


References:

Q. Cheng and A. Zelikovsky, "Network Mapping of Metabolic Pathways", Analysis of Complex Networks: From Biology to Linguistics, Wiley-VCH 2009

Q. Cheng, R. Harrison, and A. Zelikovsky, "MetNetAligner: a web service tool for metabolic network alignments", Bioinformatics 2009 (to appear)

Q. Cheng, P. Berman, R. Harrison, and A. Zelikovsky, "Fast Alignments of Metabolic Networks", Proc. of IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2008), pp 147-152

Q. Cheng, D. Kaur, R. Harrison, and A. Zelikovsky, "Mapping and Filling Metabolic Pathways", RECOMB Satellite Conference on Systems Biology 2007

Q. Cheng, R. Harrison, and A. Zelikovsky, "Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways", Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07)

R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, and M. Ziv-Ukelson, "Alignment of metabolic pathways", Bioinformatics 21(16): 3401-8 (Aug 2005)

Abstract

The accumulation of high-throughput genomic, proteomic, and metabolic data allows for increasingly accurate modeling and reconstruction of metabolic networks. Alignment of the reconstructed networks can help to catch model inconsistencies and infer missing elements. In this note we present the web service tool MetNetAligner, which aligns metabolic networks, taking into account the similarity of network topology and the enzymes' functions. It can be used for predicting unknown pathways, comparing and finding conserved patterns, and resolving ambiguous identification of enzymes. The tool supports several alignment options, including allowing or forbidding enzyme deletion and insertion. It is based on a novel scoring scheme which measures enzyme-to-enzyme functional similarity and a fast algorithm which efficiently finds optimal mappings from a directed graph with restricted cyclic structure to an arbitrary directed graph.

MetNetAligner is available as a web server at: http://alla.cs.gsu.edu:8080/MinePW/pages/gmapping/GMMain.html

Metabolic network alignments

Fig 1. A portion of the pentose phosphate pathway (enzyme vertices shown: 1.1.1.49, 1.1.1.34, 2.7.1.13, 3.1.1.31, 1.1.1.44)

Metabolic pathway model – a directed graph in which vertices correspond to enzymes and there is a directed edge between two enzymes if the product of the reaction catalyzed by the first enzyme is a substrate of the reaction catalyzed by the second.
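As an illustration of this model, the sketch below builds the directed enzyme graph from a list of reactions. The function name `enzyme_graph`, the input format, and the compound names in the example are assumptions made for this sketch; the edge rule is the one stated above.

```python
from collections import defaultdict

def enzyme_graph(reactions):
    """Build directed enzyme-to-enzyme edges.

    reactions: list of (enzyme, substrates, products) tuples.
    An edge (e1, e2) is added when a product of e1 is a substrate of e2.
    """
    consumers = defaultdict(set)  # compound -> enzymes that use it as a substrate
    for enzyme, substrates, _ in reactions:
        for s in substrates:
            consumers[s].add(enzyme)
    edges = set()
    for enzyme, _, products in reactions:
        for p in products:
            for nxt in consumers.get(p, ()):
                edges.add((enzyme, nxt))
    return edges

# Three consecutive steps of the pentose phosphate pathway (compound names illustrative).
reactions = [
    ("1.1.1.49", ["glucose-6-phosphate"], ["6-phosphogluconolactone"]),
    ("3.1.1.31", ["6-phosphogluconolactone"], ["6-phosphogluconate"]),
    ("1.1.1.44", ["6-phosphogluconate"], ["ribulose-5-phosphate"]),
]
edges = enzyme_graph(reactions)
```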

Problem formulation and solution

Experimental results

Computing P-Value of homomorphism mapping

All-against-all mappings among 4 species :

Identifying conserved pathways

24 pathways that are conserved across all 4 species

18 more pathways that are conserved across at least three of these species

Resolving ambiguity (see figure 2)

Discovering pathway holes (see figure 3)

Fig 2. Resolving ambiguity example: Mapping of the glutamate degradation VII pathway from B. subtilis to T. thermophilus (p<0.01). The highlighted node reflects enzyme homology.

Enzyme pairs shown in the figure: 2.6.1.1, 2.6.1.1; 1.2.4.-, 1.2.4.2; 2.3.1.-, 2.3.1.61; 6.2.1.5, 6.2.1.5; 1.3.99.1, 1.3.99.1; 4.2.1.2, 4.2.1.2; 1.1.1.82, 1.1.1.82

Fig 3. Pathway holes example: Mapping of the formaldehyde oxidation V pathway in B. subtilis to the formylTHF biosynthesis pathway in E. coli (p<0.01) (only vertices in the image of the pattern in the text are shown).

Enzyme labels shown in the figure: 1.5.1.5, 1.5.1.5; 3.5.4.9, 3.5.4.9; 3.5.1.10; 6.3.4.3, 6.3.4.3

Mapping metabolic pathways - should capture the similarities of enzymes represented by proteins as well as topological properties.

f : P → T

Tool : MetNetAligner

Enzyme-to-enzyme similarity

1) By the lowest common upper class distribution

2) By tight reaction property
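The first option can be illustrated with EC numbers, whose four levels form a class hierarchy. The sketch below finds the lowest common upper class of two EC numbers and turns the number of known enzymes in that class into a cost. The scoring formula (log2 of the class size) is an assumption chosen for this example so that identical enzymes cost 0; it is not taken from the poster, and `lowest_common_class` and `delta` are hypothetical names.

```python
import math

def lowest_common_class(ec_a, ec_b):
    """Longest shared EC prefix; an unspecified level '-' ends the prefix."""
    prefix = []
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        prefix.append(x)
    return tuple(prefix)

def delta(ec_a, ec_b, known_ecs):
    """Assumed cost: log2 of the number of known enzymes in the common class."""
    cls = lowest_common_class(ec_a, ec_b)
    size = sum(1 for ec in known_ecs if tuple(ec.split(".")[:len(cls)]) == cls)
    return math.log2(max(size, 1))

known = ["1.1.1.49", "1.1.1.44", "1.1.1.34", "2.7.1.13", "3.1.1.31"]
same = delta("1.1.1.49", "1.1.1.49", known)     # identical enzymes
close = delta("1.1.1.49", "1.1.1.44", known)    # share class 1.1.1
far = delta("1.1.1.49", "2.7.1.13", known)      # share no class
```

Under this assumed scoring, the smaller the shared class, the cheaper the mismatch, so `same <= close <= far`.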

Topology similarity

1) Embedding = subgraph isomorphism

2) Gene duplication and function sharing = vertex collapsing

1+2 = graph homomorphism

3) Enzyme insertions = edge subdividing - fine per insertion

1+3 = approximate graph homeomorphism (Pinter et al 2005)

4) Enzyme deletion = bypass deletion: send vertex to b (Kelly et al 2005)

1+3+4 = graph homeomorphism

5) Subpath deletion = strong deletion: send vertex to d (Yang et al 2007) (1+5)

1+2+3+4+5 = graph homo-homeomorphism

Topologies

Linear (Forst & Schulten [1999], Chen & Hofestaedt [2004])

[Figure: example pathways over the vertices A, B, C, D and X]

Tree (Pinter [2005]) (V_G V_T log V_G, V_G V_T log V_T)

Arbitrary topology

Mapping: Linear pattern → Graph (Kelly et al 2004) (V_T V_G)

Exhaustive search (Sharan et al 2005) (V_T V_G)

Yang et al 2007 (V_G V_G)

Given:

1) a metabolic pathway $P = \langle V_P, E_P \rangle$ (Pattern) and

2) a metabolic network $T = \langle V_T, E_T \rangle$ (Text)

Find a minimum cost alignment $f : P \to T$, that is, minimize

$$\mathrm{cost}(f) = \sum_{u \in V_P} \Delta(u, f_v(u)) + \lambda \sum_{l} \left(|f_l(l)| - 1\right)$$

$f_v$: every vertex in $V_P$ is mapped to a vertex in $V_T \cup \{b, d\}$;

$f_l$: every path $l_P$ across vertices in $f_v^{-1}(V_T)$ is mapped to a path $l_T$
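To make the objective concrete, the sketch below evaluates cost(f) for a given alignment. It assumes that |f_l(l)| counts the edges of the text path, so that |f_l(l)| - 1 is the number of text enzymes inserted along it; the function name `alignment_cost`, the input format, and the toy Δ are hypothetical.

```python
def alignment_cost(vertex_map, path_images, delta, lam):
    """Evaluate cost(f) = sum of Delta(u, f_v(u)) + lambda * sum of (|f_l(l)| - 1).

    vertex_map: pattern vertex -> text vertex, or 'b' / 'd' for a deleted vertex.
    path_images: text paths (lists of vertices), one per mapped pattern path.
    """
    mismatch = sum(delta(u, v) for u, v in vertex_map.items())
    # A path with k vertices has k - 1 edges, hence k - 2 inserted vertices.
    insertions = sum(len(path) - 2 for path in path_images)
    return mismatch + lam * insertions

def toy_delta(u, v):
    """Toy cost: 0 for a match, 2 for a deletion, 1 for any other mismatch."""
    if v in ("b", "d"):
        return 2.0
    return 0.0 if u == v else 1.0

# Pattern edge A -> D is mapped to the text path A -> X -> D (one insertion).
cost = alignment_cost({"A": "A", "D": "D"}, [["A", "X", "D"]], toy_delta, lam=0.5)
```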

[Figure: system diagram. Components: Browsers, Query, Visualized Outputs, Pathway Database, Cached Indexing, Graph Extraction, Graph searching, Graph Layout, Graph Visualization, Additional Value Service]

Alignment operations and cost

Matches of enzymes between pattern and text - Cost(match of $u \to f_v(u)$) = 0

Mismatches of enzymes - Cost(mismatch of $u \to f_v(u)$) = $\Delta(u, f_v(u))$

Insertions of text enzyme to pattern - Cost(insertion of $v$ under $f_l$) = $\lambda$

Deletions of pattern enzyme - Cost(deletion of $u$ under $f$) = $\Delta(u, b)$ or $\Delta(u, d)$

1) Bypass deletion 2) Strong deletion 3) Weak deletion

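Computation model for multi-source tree pattern

[Figure: pattern P with vertices $u$, $u_i$, $u_{ik}$ and text T with vertices $v$, $v_j$; labels $A(u, v)$ and $B(u_i, v)$]

$$\mathrm{cost}(u, v) = \Delta(u, v) + \sum_{u_i} \min \left\{ \begin{array}{l} \mathrm{strongD}(u_i) \\ \min_{v_j} \left( \mathrm{cost}(u_i, v_j) + \lambda\, h(v, v_j) \right) \\ \min_{u_{ik}} \left( \mathrm{weakD}(u, u_i, u_{ik}) + \mathrm{cost}(u_{ik}, v_j) + \lambda\, h(v, v_j) \right) \end{array} \right.$$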

Solution

Preprocessing:

1. Transitive closure of text T

2. Pattern graph ordering

3. Calculate the penalties of pattern vertex strong deletion

4. Calculate the penalties of pattern vertex weak deletion

Dynamic Programming + Adaptation of Dijkstra

Runtime for DP solution with Fibonacci heaps: $O(|V_P|(|E_T| + |V_T| \log |V_T|))$.
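The dynamic program can be sketched in a simplified form. The version below handles a rooted tree pattern, allows matches, mismatches, insertions and strong deletions, and leaves out weak deletion. It assumes that h(v, v_j) is the number of text vertices strictly between v and v_j on a shortest path, computes reachability by plain breadth-first search rather than the Dijkstra adaptation, and therefore does not achieve the runtime bound stated above. All function names and parameters are hypothetical.

```python
from collections import deque
from functools import lru_cache

def hop_counts(text, source):
    """BFS: number of edges on a shortest path from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in text.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def align_tree(pattern, root, text, delta, lam, strong_del):
    """pattern: vertex -> list of children; text: vertex -> list of successors.

    Returns (minimum cost, text vertex chosen as image of the pattern root).
    """
    vertices = set(text) | {y for ys in text.values() for y in ys}
    reach = {v: hop_counts(text, v) for v in vertices}  # closure with distances

    @lru_cache(maxsize=None)
    def cost(u, v):
        total = delta(u, v)
        for child in pattern.get(u, ()):
            best = strong_del(child)  # option: delete the child's whole subtree
            for vj, hops in reach[v].items():
                if vj != v:
                    # hops - 1 text vertices are inserted between v and vj
                    best = min(best, cost(child, vj) + lam * (hops - 1))
            total += best
        return total

    return min((cost(root, v), v) for v in vertices)

# Pattern A -> D aligned to text A -> X -> D: one insertion at lambda = 0.5.
result = align_tree(
    {"A": ["D"]}, "A",
    {"A": ["X"], "X": ["D"], "D": []},
    delta=lambda u, v: 0.0 if u == v else 1.0,
    lam=0.5,
    strong_del=lambda u: 2.0,
)
```

Because the recursion descends the pattern tree, it terminates even when the text contains cycles.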

[Figure: example graphs on the vertices a, b, c, d, e]

Handle cycles

DP does not work when the pattern has cycles

"Fix" images for some pattern vertices and reduce to the acyclic case

Find a minimum feedback vertex set F(P):

- the pattern restricted to $V_P - F(P)$ is acyclic

- NP-complete but easy to approximate

Reduce single cycle pattern graph alignment to multi-source tree pattern graph alignment

Reduce biconnected component alignment to single cycle pattern graph alignment

Reduce it to alignments of biconnected components
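The feedback vertex set step can be illustrated with a simple greedy heuristic: find a directed cycle, remove one of its vertices, and repeat until no cycle remains. The result is always a valid feedback vertex set but not necessarily a minimum one, and this heuristic comes with no approximation guarantee; it is a sketch, not the method used by the tool. The function names are hypothetical.

```python
def find_cycle(graph, removed):
    """Return the vertices of one directed cycle avoiding `removed`, or None."""
    color = {}   # 1 = on the current DFS path, 2 = fully explored
    stack = []

    def dfs(u):
        color[u] = 1
        stack.append(u)
        for w in graph.get(u, ()):
            if w in removed:
                continue
            if color.get(w, 0) == 1:
                return stack[stack.index(w):]  # back edge closes a cycle
            if color.get(w, 0) == 0:
                found = dfs(w)
                if found:
                    return found
        color[u] = 2
        stack.pop()
        return None

    for s in graph:
        if s not in removed and color.get(s, 0) == 0:
            found = dfs(s)
            if found:
                return found
    return None

def greedy_feedback_vertex_set(graph):
    """Remove one vertex per detected cycle until the graph is acyclic."""
    removed = set()
    while True:
        cycle = find_cycle(graph, removed)
        if cycle is None:
            return removed
        # Heuristic choice: the cycle vertex with the most outgoing edges.
        removed.add(max(cycle, key=lambda v: len(graph.get(v, ()))))

graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": []}
fvs = greedy_feedback_vertex_set(graph)
```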

Three possibilities of the contribution of the child $u_i$ to the parent $u$'s mapping ($u \to v$):

1. $u_i$ is mapped to $v_j$ ($v_j$ is a descendant of $v$)

2. $u_i$ is strongly deleted: strongD($u_i$)

3. $u_i$ is bypass deleted: weakD($u$, $u_i$, $u_{ik}$)

- Alignment of two pathways

- Alignments from a single pattern pathway to all pathways in the text organism

- Alignments from every pathway in the pattern organism to every pathway in the text organism