a rapid algorithm for generating minimal pathway distances: pathway distance correlates with genome...


Upload: suzanna-warner

Post on 17-Jan-2016


A rapid algorithm for generating minimal pathway distances: Pathway distance correlates with genome distance but not enzyme function

Stuart Rison1*, Evangelos Simeonidis2, Janet Thornton1,3, David Bogle2, Lazaros Papageorgiou2#

1 Department of Biochemistry and Molecular Biology and 2 Department of Chemical Engineering, University College London, London, WC1E 6BT, UK
3 Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX, UK

* Corresponding author (biology): [email protected]
# Corresponding author (algorithm): [email protected]


Outline

• What is pathway distance?
• Why calculate pathway distance?
• Original method
• Novel method - mathematical programming
• Application:
  – Genomic distance
  – Enzyme function


Pathway Distance

• Pathway distance considers distance between metabolic enzymes
• Each metabolic transition represents a pathway distance unit (step)
• Should take into account:
  – directionality
  – circularity

[Figure: Glycolysis + TCA pathway from EcoCyc (http://ecocyc.pangeasystems.com/), with one step marked as reversible and another marked as irreversible]

• The pathway distance between GapA and GltA is 7 steps
• The shortest pathway distance between GltA and Mdh is 8 steps (considering directionality) or 2 steps (without directionality)


Pathway Distance

• Reverses the “usual” pathway representation (substrates as nodes, enzymes as edges): here enzymes are the nodes and the substrates linking them are the edges

• Pathway distance is inclusive; the source enzyme has a distance of 1 step
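
The reversal of representation can be sketched as follows. This is a minimal illustration rather than the authors' code: the reaction tuples, the enzyme names `E1` to `E3`, the metabolite names and the function `enzyme_links` are all hypothetical, and a reversible reaction is assumed to be usable in both directions.

```python
from collections import defaultdict

def enzyme_links(reactions):
    """Build enzyme-to-enzyme links from (enzyme, substrate, product, reversible) tuples.

    A link E1 -> E2 exists when a metabolite produced by E1 is consumed by E2.
    """
    produces = defaultdict(set)   # metabolite -> enzymes that can produce it
    consumes = defaultdict(set)   # metabolite -> enzymes that can consume it
    for enzyme, substrate, product, reversible in reactions:
        consumes[substrate].add(enzyme)
        produces[product].add(enzyme)
        if reversible:
            # A reversible step can also run from product back to substrate.
            consumes[product].add(enzyme)
            produces[substrate].add(enzyme)
    links = set()
    for metabolite, makers in produces.items():
        for maker in makers:
            for user in consumes.get(metabolite, ()):
                if maker != user:
                    links.add((maker, user))
    return links

# Hypothetical three-step pathway; only the middle step is reversible.
reactions = [
    ("E1", "m1", "m2", False),
    ("E2", "m2", "m3", True),
    ("E3", "m3", "m4", False),
]
links = enzyme_links(reactions)
print(sorted(links))
```

With enzymes as nodes, the inclusive convention gives the source enzyme a distance of 1 and its immediate neighbour a distance of 2.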


Why calculate pathway distance?

• Metabolic pathways are complex networks of interacting enzymes, substrates and co-factors

• Relatively well characterised for certain organisms (e.g. E. coli)

• Much work done on modelling metabolism but now also much interest in pathways as an indicator of “connectivity” between genes

• Pathway distance (Dp) is an extension of this connectivity


Original Method

• Represent pathways as directed acyclic graphs
• Use arbitrary direction for pathways
• “Snip” open any cycle
• Perform depth-first traversal (DFT) of resulting graphs
• Collect set of genes at distances 2,3,…,n along resulting traversals
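
A rough sketch of the traversal step, assuming the pathway has already been given an arbitrary direction and had its cycles snipped, so that it is a directed acyclic graph. The original implementation is not shown in the slides; the adjacency dictionary, the gene names and the function `distances_by_dft` are hypothetical.

```python
def distances_by_dft(dag, source):
    """Depth-first traversal of a DAG, recording the smallest inclusive
    distance (source = 1) at which each gene is met along any traversal."""
    best = {source: 1}
    stack = [(source, 1)]
    while stack:
        node, dist = stack.pop()
        for nxt in dag.get(node, ()):
            # Revisit a gene only if this traversal reaches it sooner.
            if nxt not in best or dist + 1 < best[nxt]:
                best[nxt] = dist + 1
                stack.append((nxt, dist + 1))
    return best

# Hypothetical snipped pathway: a -> b -> c -> d, plus a shortcut a -> c.
dag = {"a": ["b", "c"], "b": ["c"], "c": ["d"]}
found = distances_by_dft(dag, "a")

# Group genes by distance 2, 3, ..., n from the source.
by_distance = {}
for gene, dist in found.items():
    if dist > 1:
        by_distance.setdefault(dist, set()).add(gene)
print(by_distance)
```

Because the direction is imposed and cycles are cut, a traversal of this kind cannot see routes that run against the chosen direction or across the snipped link.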


Original Method

[Figure: Glycolysis + TCA pathway from EcoCyc (http://ecocyc.pangeasystems.com/), with mdh and gltA labelled]

• Original EcoCyc pathways include:
  – Directionality
  – Cycles
• Dictate directionality:
  – Arbitrarily set direction (top to bottom, clockwise)
• “Snip” cycles


Pathway Distance Algorithm

• For each metabolic pathway
  – For each enzyme in the pathway
    • Find the minimal distances from the source enzyme to all other enzymes by solving linear programming problems of the type:

      Maximise Summation_of_Enzyme_Distances
      subject to Enzyme_Connectivity_Constraints

• Post-processing “calculations” are integrated in the algorithm (e.g. genome distance or enzyme function conservation)


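Algorithm - objective function and constraints

For each node $i^*$ (source):

\[
\begin{aligned}
\text{Maximise} \quad & \sum_{i} D_i \\
\text{subject to:} \quad & D_j \le D_i + 1, && \forall (i,j): L_{ij} = 1 \\
& 0 \le D_i \le T, && \forall i \\
& D_{i^*} = 1
\end{aligned}
\]

SETS
– $i, j$: nodes

PARAMETERS
– $L_{ij}$: 1 if there is a link from $i$ to $j$, 0 otherwise
– $T$: large number

CONTINUOUS VARIABLES
– $D_i$: Distance of node $i$ from source node

[Figure: a link from node i to node j]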
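
Why maximising the sum yields minimal distances may not be obvious at first sight. A short argument follows, using the symbols defined on this slide plus $d(i^*, j)$, a notation introduced here (not in the slides) for the number of links on a shortest directed path from $i^*$ to $j$. Chaining the link constraints along any path $i^* = v_0 \to v_1 \to \dots \to v_k = j$ gives an upper bound on $D_j$, and the shortest path gives the tightest one:

\[
\begin{aligned}
D_j &\le D_{v_{k-1}} + 1 \le \dots \le D_{i^*} + k = 1 + k \\
\Rightarrow \quad D_j &\le 1 + d(i^*, j)
\end{aligned}
\]

Setting $D_j = 1 + d(i^*, j)$ for every reachable node satisfies all the link constraints at once, so the maximiser attains this bound for each node simultaneously. Nodes that cannot be reached from $i^*$ are limited only by the upper bound and end at $D_j = T$, provided $T$ is larger than any true distance.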


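Algorithm - Inequalities

Source node $i^* = A$:

\[
\begin{aligned}
\text{Max} \quad & D_A + D_B + D_C + D_D \\
\text{s.t.} \quad & D_A = 1 \\
& D_A \le D_B + 1 \\
& D_B \le D_A + 1 \\
& D_C \le D_B + 1 \\
& D_C \le D_D + 1 \\
& D_D \le D_C + 1 \\
& D_D \le D_B + 1
\end{aligned}
\]

[Figure: network of nodes A, B, C and D, shown unlabelled and then labelled with the resulting distances: A = 1, B = 2, C = 3, D = 3]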
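
The four-node example can be checked without an LP solver. The sketch below is not the GAMS implementation: it swaps in simple iterative tightening, which starts every distance at the upper bound T and lowers D_j to D_i + 1 wherever a constraint is violated. It stops at the largest vector that satisfies every constraint, which is also the optimum of the LP. The link list is read off the inequalities of this slide; the function name `max_feasible_distances` and the value of `T` are assumptions.

```python
def max_feasible_distances(nodes, links, source, T=1000):
    """Largest D with D[source] = 1, D[j] <= D[i] + 1 for each link (i, j),
    and 0 <= D[i] <= T. This coincides with the optimum of the LP."""
    D = {n: T for n in nodes}
    D[source] = 1
    changed = True
    while changed:
        changed = False
        for i, j in links:
            if D[j] > D[i] + 1:
                D[j] = D[i] + 1   # tighten a violated constraint
                changed = True
    return D

# Each constraint D_j <= D_i + 1 corresponds to a link i -> j.
links = [("B", "A"), ("A", "B"), ("B", "C"), ("D", "C"),
         ("C", "D"), ("B", "D")]
D = max_feasible_distances("ABCD", links, source="A")
print(D)  # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```

The result agrees with the distances 1, 2, 3 and 3 shown for nodes A to D.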


Key Features of Algorithm

• Hierarchical solution procedure
• Based on linear programming techniques
• Using an enzyme-node network representation


Advantages of Algorithm

• Efficiency in tackling
  – pathway circularity
  – reaction directionality
• Modest computational times
• Implementation within GAMS software system


Metabolic pathways

• We encoded 68 E. coli small molecule metabolism (SMM) pathways derived from EcoCyc

• This represents a set of 594 enzymes

• Pathway distances ranged from 2 to 15


Pathway Distance and Genome Distance

• Calculate minimal pathway distances for all gene pairs in each pathway

• For the same pairs, calculate the base pair separation of the genes encoding the enzymes in the E. coli genome (Dg)

• Plot percentage of gene pairs within a certain genome distance against pathway distance
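
The tabulation behind such a plot might look like the sketch below. The input format (a list of `(Dp, Dg)` pairs), the sample values, the thresholds and the function name `percent_within` are hypothetical and chosen for illustration only; they are not the authors' data or code.

```python
from collections import defaultdict

def percent_within(pairs, thresholds):
    """For each pathway distance Dp, return the percentage of gene pairs whose
    genome distance Dg (in base pairs) is below each threshold."""
    by_dp = defaultdict(list)
    for dp, dg in pairs:
        by_dp[dp].append(dg)
    table = {}
    for dp, dgs in sorted(by_dp.items()):
        table[dp] = {
            t: 100.0 * sum(dg < t for dg in dgs) / len(dgs)
            for t in thresholds
        }
    return table

# Made-up (Dp, Dg) pairs, for illustration only.
pairs = [(2, 50), (2, 800), (2, 40000), (2, 2000000),
         (3, 900), (3, 500000)]
table = percent_within(pairs, thresholds=(100, 1000, 10000, 100000))
for dp, row in table.items():
    print(dp, row)
```

Each row of `table` then supplies one x-position of the plot, with one value per genome-distance threshold.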


Shorter genomic distances are more likely at smaller pathway distances

[Figure: cumulative percentages (0% to 20%) against pathway distance (2 to 15); series: <100bp, <1000bp, <10000bp, <100000bp]


Genome Distance - Conclusions

• Strong correlation between Dp and Dg

• Genes with small Dp tend to have shorter Dg

• Genes involved in nearby metabolic reactions are genomically clustered


Pathway Distance and Function

• Calculate minimal pathway distances for all gene pairs in each pathway
• Compare the EC numbers assigned to the genes in each pair

EC number 1.2.1.12 (e.g. G-3-P dehydrogenase):
– 1. oxidoreductase
– 2. acts on aldehyde or oxo group
– 1. NAD/NADP as acceptor
– 12. enzyme specific

Examples:
– 1.2.1.12 vs 1.2.1.20: L3 cons
– 1.2.1.12 vs 2.2.1.20: No cons
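
The level-by-level comparison can be sketched as below. The function name `ec_conservation_level` is hypothetical, and the sketch assumes complete four-level EC numbers; how partial entries are treated is not stated in the slides.

```python
def ec_conservation_level(ec_a, ec_b):
    """Number of leading EC levels shared by two EC numbers (0 to 4)."""
    level = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        level += 1
    return level

print(ec_conservation_level("1.2.1.12", "1.2.1.20"))  # 3: conserved to level 3
print(ec_conservation_level("1.2.1.12", "2.2.1.20"))  # 0: no conservation
```

Comparison stops at the first differing level, so the second pair counts as unconserved even though its lower levels coincide.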


Pathway distance and EC number conservation

[Figure: percentage of pairs at pathway distance (0 to 100) against pathway distance (0 to 16); series: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]


Function - Conclusions

• No observable correlation between pathway distance and function (as represented by EC number)

• Enzymatic chemistries are varied along the conversion from one substrate to the next and aren’t performed in ‘blocks’ of similar catalysis


Conclusions - Algorithm

• We have an effective, correct and rapid algorithm to calculate metabolic distance

• The Dp metric is a useful measure of functional relation between proteins


Conclusions - Biology

• As expected, pathway distance correlates with genome distance

• Pathway distance does not correlate with function as determined by EC number


Acknowledgements

• Sarah Teichmann, University College London

• Peter Karp, SRI International, Menlo Park, CA

• Monica Riley, Alida Pellegrini-Toole, Marine Biological Laboratory, Woods Hole, MA



All distances

[Figure: cumulative percentages (0% to 100%) against pathway distance (2 to 15); series: 100, 1000, 10000, 100000, 1000000, 10000000]


Pathway distance and EC number conservation

[Figure: cumulative percentage of pairs (0% to 100%) against pathway distance (2 to 15); series: None, Level 1, Levels 1+2, Levels 1+2+3, All levels]


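• Source node $i^* = A$:

\[
\begin{aligned}
& D_A = 1 \\
& D_A \le D_B + 1 \\
& D_B \le D_A + 1 \\
& D_C \le D_B + 1 \\
& D_C \le D_D + 1 \\
& D_D \le D_C + 1 \\
& D_D \le D_B + 1 \\
& D_E \le D_D + 1 \\
& D_E \le D_F + 1 \\
& D_F \le D_C + 1 \\
& D_F \le D_E + 1
\end{aligned}
\]

[Figure: network of nodes A to F, shown unlabelled and then labelled with the resulting distances: A = 1, B = 2, C = 3, D = 3, E = 4, F = 4]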