Post on 17-Jan-2016

A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function

Stuart Rison1*, Evangelos Simeonidis2, Janet Thornton1,3, David Bogle2, Lazaros Papageorgiou2#

1 Department of Biochemistry and Molecular Biology and 2 Department of Chemical Engineering, University College London, London, WC1E 6BT, UK
3 Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX, UK

* Corresponding author (biology): rison@biochem.ucl.ac.uk
# Corresponding author (algorithm): l.papageorgiou@ucl.ac.uk

Outline

• What is pathway distance?
• Why calculate pathway distance?
• Original method
• Novel method - mathematical programming
• Application:
– Genomic distance
– Enzyme function

The shortest pathway distance between GltA and Mdh is 8 steps (considering directionality) or 2 steps (without directionality)

Each metabolic transition represents a pathway distance unit (step)

Pathway distance considers distance between metabolic enzymes

Should take into account:
• directionality
• circularity

The pathway distance between GapA and GltA is 7 steps

This step is reversible

This step is irreversible

(pathway from EcoCyc: http://ecocyc.pangeasystems.com/)

[Figure: Glycolysis + TCA pathway]

Pathway Distance

• Reverses the “usual” pathway representation (substrates as nodes, enzymes as edges): here enzymes are the nodes and the substrates linking them are the edges

• Pathway distance is inclusive; the source enzyme has a distance of 1 step
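As a minimal sketch of this definition (not the authors' code), the snippet below counts inclusive steps over an enzyme-node graph with a breadth-first search. The ring of eight enzymes and the names E1 to E8 are hypothetical, chosen only to show how one pair of enzymes can be 8 steps apart when directionality is respected and 2 steps apart when it is ignored.

```python
from collections import deque


def pathway_distance(links, source, target, directed=True):
    """Inclusive step count from source to target; None if unreachable.

    links: iterable of (i, j) pairs meaning "enzyme i feeds enzyme j".
    The source enzyme itself counts as step 1.
    """
    neighbours = {}
    for i, j in links:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set())
        if not directed:
            neighbours[j].add(i)  # ignore directionality

    distance = {source: 1}  # inclusive: the source is at distance 1
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


# Hypothetical ring of eight enzymes: E1 -> E2 -> ... -> E8 -> E1
ring = [(f"E{k}", f"E{k % 8 + 1}") for k in range(1, 9)]

print(pathway_distance(ring, "E1", "E8", directed=True))   # 8
print(pathway_distance(ring, "E1", "E8", directed=False))  # 2
```

Breadth-first search is used here only because every step has unit cost; it is a stand-in for illustration, not the method described in these slides.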

Why calculate pathway distance?

• Metabolic pathways are complex networks of interacting enzymes, substrates and co-factors

• Relatively well characterised for certain organisms (e.g. E. coli)

• Much work done on modelling metabolism but now also much interest in pathways as an indicator of “connectivity” between genes

• Pathway distance (Dp) is an extension of this connectivity

Original Method

• Represent pathways as directed acyclic graphs
• Use arbitrary direction for pathways
• “Snip” open any cycle
• Perform DFT (depth-first traversal) of resulting graphs
• Collect set of genes at distances 2,3,…,n along resulting traversals
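The traversal step can be sketched roughly as follows. This is an illustrative reading of the bullets above, not the original implementation: the function name, the dictionary representation and the toy pathway A, B, C, D are all assumptions, and the input is assumed to have been made acyclic already (directions fixed, cycles snipped).

```python
def pairs_by_distance(dag, roots):
    """Depth-first traversal of a snipped (acyclic) pathway.

    dag: dict mapping each enzyme to the list of enzymes that follow it.
    Returns {distance: set of (upstream, downstream) pairs}, using the
    inclusive convention in which adjacent enzymes are at distance 2.
    """
    found = {}

    def visit(node, path):
        # Every enzyme on the current path is upstream of `node`.
        for depth, ancestor in enumerate(reversed(path), start=2):
            found.setdefault(depth, set()).add((ancestor, node))
        for nxt in dag.get(node, []):
            visit(nxt, path + [node])

    for root in roots:
        visit(root, [])
    return found


# Hypothetical pathway: A -> B, then B branches to C and D
toy = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
result = pairs_by_distance(toy, roots=["A"])
print(result[2])  # adjacent pairs
print(result[3])  # pairs two transitions apart
```

In this sketch a pair that is reachable by two different routes is recorded at both distances; keeping only the smallest would need an extra pass.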

[Figure: Glycolysis + TCA pathway (pathway from EcoCyc: http://ecocyc.pangeasystems.com/)]

Original Method

Original EcoCyc pathways include:
• Directionality
• Cycles

Dictate directionality:
• Arbitrarily set direction (top to bottom, clockwise)

“Snip” cycles

[Figure labels: mdh, gltA]

Pathway Distance Algorithm

• For each metabolic pathway
– For each enzyme in the pathway
   • Find the minimal distances from the source enzyme to all other enzymes by solving linear programming problems of the type:

Maximise Summation_of_Enzyme_Distances
subject to Enzyme_Connectivity_Constraints

• Post processing “calculations” are integrated in the algorithm (e.g. genome distance or enzyme function conservation)

Algorithm - objective function and constraints

For each node i* (source):

$$\text{Maximise} \quad \sum_i D_i$$

subject to:

$$D_j \le D_i + 1, \quad \forall (i,j): L_{ij} = 1$$

$$0 \le D_i \le T, \quad \forall i$$

$$D_{i^*} = 1$$

SETS
– i, j: nodes

PARAMETERS
– L_ij: 1 if there is a link from i to j, 0 otherwise
– T: large number

CONTINUOUS VARIABLES
– D_i: distance of node i from source node
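What this linear programme computes can be sketched with the Python standard library alone, without GAMS or an LP solver. Because every constraint has the form D_j ≤ D_i + 1, the feasible points have a componentwise-largest member, and that point also maximises the sum; it can be reached by starting every distance at T and repeatedly tightening any violated constraint. The function name, the default value of T and the small four-node network are assumptions made for illustration.

```python
def lp_optimum_distances(nodes, links, source, big_t=1000):
    """Largest D with D[j] <= D[i] + 1 for every link (i, j),
    0 <= D[i] <= big_t and D[source] == 1.

    Not an LP solver: a fixed-point tightening that reaches the same
    optimum for this particular constraint structure.
    """
    dist = {n: big_t for n in nodes}  # start at the upper bound T
    dist[source] = 1                  # D_{i*} = 1
    changed = True
    while changed:
        changed = False
        for i, j in links:
            if dist[j] > dist[i] + 1:  # constraint violated: tighten D_j
                dist[j] = dist[i] + 1
                changed = True
    return dist


# Illustrative network: A <-> B, B -> C, B -> D, C <-> D
links = [("B", "A"), ("A", "B"), ("B", "C"),
         ("D", "C"), ("C", "D"), ("B", "D")]
print(lp_optimum_distances(["A", "B", "C", "D"], links, "A"))
# {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```

Nodes that cannot be reached from the source stay at T, which is how the upper bound flags "no pathway" in this sketch.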

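Algorithm - Inequalities

Source node: i* = A

$$\text{Max} \quad D_A + D_B + D_C + D_D$$

s.t.

$$D_A = 1$$
$$D_A \le D_B + 1$$
$$D_B \le D_A + 1$$
$$D_C \le D_B + 1$$
$$D_C \le D_D + 1$$
$$D_D \le D_C + 1$$
$$D_D \le D_B + 1$$

[Figure: network of nodes A, B, C and D, labelled with the resulting distances A = 1, B = 2, C = 3, D = 3]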
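One way to see why the optimum for this four-node example is (1, 2, 3, 3) is to chain the bounds outwards from the source. This is a reading of the inequalities above, not a step shown on the slide:

```latex
\begin{aligned}
D_A &= 1 \\
D_B &\le D_A + 1 = 2 \\
D_C &\le D_B + 1 \le 3 \\
D_D &\le D_B + 1 \le 3
\end{aligned}
```

Setting every variable to its bound gives D = (1, 2, 3, 3), which also satisfies the remaining constraints (D_A ≤ D_B + 1, D_C ≤ D_D + 1 and D_D ≤ D_C + 1). No variable can be raised further, so the maximal sum is 9 and each D_i equals the minimal inclusive pathway distance from A.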

Key Features of Algorithm

• Hierarchical solution procedure
• Based on linear programming techniques
• Using an enzyme-node network representation

Advantages of Algorithm

• Efficiency in tackling
– pathway circularity
– reaction directionality
• Modest computational times
• Implementation within GAMS software system

Metabolic pathways

• We encoded 68 E. coli small molecule metabolism (SMM) pathways derived from EcoCyc

• This represents a set of 594 enzymes

• Pathway distances ranged from 2 to 15

Pathway Distance and Genome Distance

• Calculate minimal pathway distances for all gene pairs in each pathway

• For the same pairs, calculate the base pair separation of the genes encoding the enzymes in the E. coli genome (Dg)

• Plot percentage of gene pairs within a certain genome distance against pathway distance
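The tabulation behind such a plot can be sketched as follows. This is an illustration rather than the authors' post-processing: the function names, the use of a single coordinate per gene, and the choice to measure separation the shorter way round the circular E. coli chromosome are assumptions, and the (Dp, Dg) pairs in the usage example are made up.

```python
def genome_distance(pos_a, pos_b, genome_length):
    """Base-pair separation on a circular chromosome
    (assumption: measured the shorter way round)."""
    direct = abs(pos_a - pos_b)
    return min(direct, genome_length - direct)


def percent_within(pairs, threshold):
    """pairs: iterable of (pathway_distance, genome_distance).

    Returns {pathway_distance: percentage of pairs at that pathway
    distance whose genome distance is below threshold}.
    """
    total, close = {}, {}
    for dp, dg in pairs:
        total[dp] = total.get(dp, 0) + 1
        if dg < threshold:
            close[dp] = close.get(dp, 0) + 1
    return {dp: 100.0 * close.get(dp, 0) / total[dp] for dp in sorted(total)}


# Made-up (Dp, Dg) pairs, purely to exercise the function
pairs = [(2, 50), (2, 5000), (3, 900), (3, 200000)]
print(percent_within(pairs, 1000))  # {2: 50.0, 3: 50.0}
```

Calling `percent_within` once per threshold (100 bp, 1000 bp and so on) would give one series per threshold.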

Shorter genomic distances are more likely at smaller pathway distances

[Figure: cumulative percentages of gene pairs (y-axis, 0% to 20%) against pathway distance (x-axis, 2 to 15); series: <100bp, <1000bp, <10000bp, <100000bp]

Genome Distance - Conclusions

• Strong correlation between Dp and Dg

• Genes with small Dp tend to have shorter Dg

• Genes involved in nearby metabolic reactions are genomically clustered

Pathway Distance and Function

• Calculate minimal pathway distances for all gene pairs in each pathway

• Compare the EC numbers assigned to the genes in each pair

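EC number 1.2.1.12 (e.g. G-3-P dehydrogenase), level by level:
– 1. oxidoreductase
– 2. acts on aldehyde or oxo group
– 1. NAD/NADP as acceptor
– 12. enzyme specific

Comparing EC numbers:
– 1.2.1.12 and 1.2.1.20: conserved to level 3 (L3 cons)
– 1.2.1.12 and 2.2.1.20: no conservation (No cons)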
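The comparison can be sketched as counting how many leading levels two EC numbers share. The function name is an assumption, and partial EC numbers (for example with "-" in a position) are not given any special treatment in this sketch.

```python
def ec_conservation_level(ec_a, ec_b):
    """Number of leading EC levels (0 to 4) shared by two EC numbers."""
    level = 0
    for part_a, part_b in zip(ec_a.split("."), ec_b.split(".")):
        if part_a != part_b:
            break  # the first mismatch ends the conserved prefix
        level += 1
    return level


print(ec_conservation_level("1.2.1.12", "1.2.1.20"))  # 3
print(ec_conservation_level("1.2.1.12", "2.2.1.20"))  # 0
print(ec_conservation_level("1.2.1.12", "1.2.1.12"))  # 4
```

Levels are compared left to right because the EC hierarchy is nested: agreement at level 3 only counts if levels 1 and 2 also agree.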

Pathway distance and EC number conservation

[Figure: percentage of pairs at pathway distance (y-axis, 0 to 100) against pathway distance (x-axis, 0 to 16); series: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]

Function - Conclusions

• No observable correlation between pathway distance and function (as represented by EC number)

• Enzymatic chemistries are varied along the conversion from one substrate to the next and aren’t performed in ‘blocks’ of similar catalysis

Conclusions - Algorithm

• We have an effective, correct and rapid algorithm to calculate metabolic distance

• The Dp metric can usefully serve as a measure of functional relation between proteins

Conclusions - Biology

• As expected, pathway distance correlates with genome distance

• Pathway distance does not correlate with function as determined by EC number

Acknowledgements

• Sarah Teichmann, University College London

• Peter Karp, SRI International, Menlo Park, CA

• Monica Riley, Alida Pellegrini-Toole, Marine Biological Laboratory, Woods Hole, MA


All distances

[Figure: cumulative percentages (y-axis, 0% to 100%) against pathway distance (x-axis, 2 to 15); series labelled 100, 1000, 10000, 100000, 1000000 and 10000000]

Pathway distance and EC number conservation

[Figure: cumulative percentage of pairs (y-axis, 0% to 100%) against pathway distance (x-axis, 2 to 15); series: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]

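Source node: i* = A

$$D_A = 1$$
$$D_A \le D_B + 1$$
$$D_B \le D_A + 1$$
$$D_C \le D_B + 1$$
$$D_C \le D_D + 1$$
$$D_D \le D_C + 1$$
$$D_D \le D_B + 1$$
$$D_E \le D_D + 1$$
$$D_E \le D_F + 1$$
$$D_F \le D_C + 1$$
$$D_F \le D_E + 1$$

[Figure: network of nodes A to F, labelled with the resulting distances A = 1, B = 2, C = 3, D = 3, E = 4, F = 4]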
