inferring chemical reaction patterns using rule ...inferring chemical reaction patterns using rule...

15
Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1 , Christoph Flamm 2 , Daniel Merkle 1 , Peter F. Stadler 2-7 1 Department for Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark 2 Institute for Theoretical Chemistry, University of Vienna, W¨ ahringerstraße 17, A-1090 Wien, Austria. 3 Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, H¨ artelstraße 16-18, D-04107, Leipzig, Germany. 4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22 D-04103 Leipzig, Germany. 5 Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany. 6 Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønneg˚ ardsvej 3, DK-1870 Frederiksberg C, Denmark. 7 Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA Email: [email protected]; CF * :[email protected]; DM * :[email protected]; PFS:[email protected]; * Corresponding author Abstract Background: Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations is a natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employed to model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize several subsequent reactions into a single composite chemical reaction. Results: We introduce a generic approach for composing graph grammar rules to define a chemically useful rule compositions. We iteratively apply these rule compositions to elementary transformations in order to automatically infer complex transformation patterns. As an application we automatically derive the overall reaction pattern of the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to a second glycolaldehyde. Rule composition also can be used to study polymerization reactions as well as more complicated iterative reaction schemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmost pharmaceutical interest that can be understood as “generalized polymers” consisting of five-carbon (isoprene) and two-carbon units, respectively. Conclusion: The framework of graph transformations provides a valuable set of tools to generate and investigate large networks of chemical networks. Within this formalism, rule composition is a canonical technique to obtain coarse-grained representations that reflect, in a natural way, “effective” reactions that are obtained by lumping together specific combinations of elementary reactions. 1

Upload: others

Post on 16-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Inferring Chemical Reaction Patterns Using RuleComposition in Graph Grammars

Jakob Lykke Andersen1, Christoph Flamm2, Daniel Merkle1, Peter F. Stadler2−7

1 Department for Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M,

Denmark 2 Institute for Theoretical Chemistry, University of Vienna, Wahringerstraße 17, A-1090 Wien, Austria. 3 Bioinformatics

Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Hartelstraße 16-18, D-04107, Leipzig,

Germany. 4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22 D-04103 Leipzig, Germany. 5 Fraunhofer Institute

for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany. 6 Center for non-coding RNA in Technology and

Health, University of Copenhagen, Grønnegardsvej 3, DK-1870 Frederiksberg C, Denmark. 7 Santa Fe Institute, 1399 Hyde Park

Rd, Santa Fe, NM 87501, USA

Email: [email protected]; CF∗:[email protected]; DM∗:[email protected]; PFS:[email protected];

∗Corresponding author

Abstract

Background: Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations isa natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employedto model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, inparticular in the analysis of larger systems, to summarize several subsequent reactions into a single compositechemical reaction.Results: We introduce a generic approach for composing graph grammar rules to define a chemically useful rulecompositions. We iteratively apply these rule compositions to elementary transformations in order to automaticallyinfer complex transformation patterns. As an application we automatically derive the overall reaction pattern of theFormose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to a second glycolaldehyde.Rule composition also can be used to study polymerization reactions as well as more complicated iterative reactionschemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmostpharmaceutical interest that can be understood as “generalized polymers” consisting of five-carbon (isoprene) andtwo-carbon units, respectively.Conclusion: The framework of graph transformations provides a valuable set of tools to generate and investigatelarge networks of chemical networks. Within this formalism, rule composition is a canonical technique to obtaincoarse-grained representations that reflect, in a natural way, “effective” reactions that are obtained by lumpingtogether specific combinations of elementary reactions.

1

Page 2: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

IntroductionDirected hypergraphs [1] are a suitable topologi-cal representation of (bio)chemical reaction networkswhere (catalytic) reactions are hyperedges connect-ing substrate nodes to product nodes. Such net-works require an underlying Artificial Chemistry [2]that describes how molecules and reactions are mod-eled. If molecules are treated as edge and vertex la-beled graphs, where the vertex labels correspond toatom types and the edge labels denote bond types,then structural change of molecules during chemicalreactions can be modeled as graph rewrite [3]. Incontrast to many other Artificial Chemistries thisapproach allows for respecting fundamental rulesof chemical transformations like mass conservation,atomic types, and cyclic shifts of electron pairs inreactions. In general, a graph rewrite (rule) trans-forms a set of substrate graphs into a set of productgraphs. Hence the graph rewrite formalism allowsnot only to delimit an entire chemical universe inan abstract but compact form but also provides amethodology for its explicit construction.

Most methods for the analysis of this networkstructure are directed towards this graph (or hy-pergraph) structure [1, 4], which is described bythe stoichiometric matrix S of the chemical system.Since S is essentially the incidence matrix of thedirected hypergraph, algebraic approaches such asMetabolic Flux Analysis and Flux Balance Analy-sis [5] have a natural interpretation in terms of thehypergraph. Indeed typical results are sets of pos-sibly weighted reactions (i.e., hyperedges) such aselementary flux modes [6], extreme pathways [7],minimal metabolic behaviors [8] or a collection ofreactions that maximize the production of a desiredproduct in metabolic engineering. The net reactionof a given pathway is simply the linear combinationof the participating hyperedges.

In the setting of generative models of chemistry,each concrete reaction is not only associated withits stoichiometry but also with the transformationrule operating on the molecules that are involvedin a particular reaction. Importantly, these rulesare formulated in terms of reaction mechanisms thatreadily generalize to large sets of structurally relatedmolecules. It is thus of interest to derive not only thestoichiometric net reaction of a pathway but also thecorresponding “effective transformation rule”. In-stead of attempting to address this issue a posteri-ori, we focus here on the possibility of composingthe elementary rules of chemical transformations to

new effective rules that encapsulate entire pathways.The motivation comes from the observation

that string grammars are meaningfully characterizedand understood by investigating the transformationrules. Consider, as a trivial example, the context-free grammar G with the starting symbol S and therules S → aS′a, S′ → aS′a | B and B → ε | bB. In-specting this grammar we see that we can summarizethe effect of the productions as B → bk, k ≥ 0, andS → anBan, n ≥ 1. The language generated fromG is thus {anb∗an|n ≥ 1}. Here we explore whethera similar reasoning, namely the systematic combina-tion of transformation rules, can help to character-ize the language of molecules that is generated bya particular graph rewriting chemistry. Similar tothe example from term rewriting above, we shouldat the very least be able to recognize the regulari-ties in polymerization reactions. We shall see below,however, that the rule based approach holds muchhigher promises.

Chemical reactions can be readily composed to“overall reactions” such as the net transformationof metabolic pathways. This observation is usedimplicitly in flux balance analysis at the level ofthe stoichiometric matrix. Recently, [9] consideredthe composition of concerte chemical reactions, i.e.,transformations of complete molecules, as a meansof reconstructing metabolic pathways. In this con-tribution we take a different point of view: insteadof asking for concrete overall reactions, we are con-cerned with the composition of the underlying reac-tion mechanisms themselves. As we will see, thesecan be applied to arbitary molecule contexts. Wetherefore address two issues: First we establish theformal conditions under which chemical transforma-tion rules can be meaningfully composed. To thisend, we discuss rule composition within the frame-work of concurrency theory in the following section.We then investigate the specific restrictions that ap-ply to chemical systems, leading to a constructiveapproach for inferring composite rules.

The basic computational task we envision startsfrom an unordered set R of reactions such as thoseforming a particular metabolic reaction pathway. Toderive the effective transformation rule describingthe pathway we need to find the correct ordering π inwhich the transformation rules pi, underlying the in-dividual chemical reactions ρi, have to be composed.We illustrate this approach in some detail using theFormose reaction as an example in the Results sec-tion.

2

Page 3: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Graph Grammars and Rule CompositionGraph grammars, or graph rewriting systems, areproper generalizations of term rewrite systems. Awide variety of formal frameworks have been ex-plored, including several different algebraic onesrooted in category theory. We base our conceptualdevelopments on the double pushout (DPO) formu-lation of graph transformations. For the compre-hensive treatise of this framework we refer to [10].In the following sections we first outline the basicsetup and then introduce full and partial rule com-position. Alternative approaches to graph rewrit-ing in the context of (artificial) chemistry have beenbased on the single pushout (SPO) model of graphtransformations, see e.g. [11, 12]. We briefly discussthe rather technical difference between the DPO andSPO framework in Appendix A, where we also brieflyoutline our reasons for choosing DPO.

Double Pushout and ConcurrencyThe DPO formulation of graph transformations con-siders transformation rules of form p = (L l←− K

r−→R) where L, R, and K are called the left graph, rightgraph, and context graph, respectively. The maps land r are graph morphisms. The rule p transformsG to H, in symbols G

p,m==⇒ H if there is a pushout

graph D and a “matching morphism” m : L → Gsuch that following diagram is valid:

L K R

G D H

l r

ρ λ

m k n (1)

The existence of D is equivalent to the so-called glu-ing condition, which determines whether the rule p isapplicable to a match in G. In the following we willalso write G

p=⇒ H and G⇒ H for derivations, if the

specific match or transformation rule is unimportantor clear from the context.

Concurrency theory provides a canonical frame-work for the composition of two graph transforma-tions. Given two rules pi = (Li

li←− Kiri−→ Ri),

i = 1, 2, a composition (Lql←− K

qr−→ R) = p1 ∗E p2

can be defined whenever a dependency graph E ex-ists so that in the following diagram:

L1 K1 R1 L2 K2 R2

(1) (2)

L C1 E C2 R

(3)

K

l1 r1

s1 t1

ul v1 e1

l2 r2

s2 t2

e2 v2 u2

w1 w2

(2)

the cycles (1) and (2) are pushouts, and (3) is apullback, see e.g., [13]. We then have ql = s1 ◦ w1

and qr = t2 ◦ w2. The concurrency theorem [14]ensures that for any sequence of consecutive directtransformations G

p1,m1===⇒ Hp2,m2===⇒ G′ a graph E,

a corresponding E-concurrent rule p1 ∗E p2, and amorphism m can be found such that G

p1∗Ep2,m======⇒ G′.In order to use graph transformation as a model

for chemical reactions additional conditions must beenforced. Most importantly, atoms are neither cre-ated, nor destroyed, nor transformed to other types.Thus only graph morphisms whose restriction to thevertex sets are bijective are valid in our context. Inparticular, the matching morphism m always corre-sponds to a subgraph isomorphism in our context.The context graph K thus is (isomorphic to) a sub-graph of both L and R, describing the part of Lthat remains unchanged inR. Conservation of atomsmeans that the vertex sets of L, K, and R are linkedby bijections known as the atom-mapping. Whenthe atom mapping is clear, thus, we do not need torepresent the context explicitly.

It is important to note that the existence of thematching morphism m : L → G alone is not suffi-cient to guarantee the applicability of the transfor-mation. In our context, we require in addition thatthe transformation rule does not attempt to intro-duce an edge in R that has been present alreadybefore the transformation is applied. Formally, thegluing condition requires that (l(x), l(y)) /∈ L and(r(x), r(y)) ∈ R implies (m(l(x)),m(r(y))) /∈ G.

Full Rule Composition

In the following we will be concerned only with spe-cial, chemically motivated, types of rule composi-tions. In the simplest case the dependency graphE is isomorphic to R1, later we will also considera more general setting in which E is isomorphic tothe disjoint union of R1 and some connected com-ponents of L2. For the ease of notation from nowon we only refer to a rule composition, and not to acomposition of morphisms as in the Graph Grammarsection, i.e., p1 ∗E p2 will be denoted as p2 ◦ p1 (notethe order of the arguments changes). If E ∼= R1,then L2

∼= e2(L2) is a subgraph of R1. Omitting theexplicit references to the subgraph matching mor-phism e2 we can simply view L2 as subgraph of R1

as illustrated in Figure 1.The rule composition thus amounts to a rewrit-

3

Page 4: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

L1 R1 L2 R2

p1 p2

=⇒

L1 R3 R2

p3

Fig. 1: Full composition of two rules requires that L2 is (iso-morphic to) a subgraph of R1.

ing R1p2,e2===⇒ R, while the left side L1 is preserved.

We will use the notation p2 ◦ p1 and G′p2◦p1===⇒ G′

for this restricted type of rule composition, and callit full composition as the complete left side of p2

is a subgraph of R1. Note that L2 may fit into R1

in more than one way so that there may be morethan one composite rule. Formally, the alternativecompositions are distinguished by different match-ing morphisms e2 in the diagram (2); we will returnto this point below.

Partial Rule CompositionAn important issue for the application to chemicalreactions is that the graphs involved in the rules arein general not connected. Typical chemical reactionscombine molecules, split molecules or transfer groupsof atoms from one molecule to another. The trans-formation rules for all these reactions therefore re-quire multiple connected components. For the pur-pose of dealing with these rules, we introduce thefollowing notation for graphs and derivations.

Let Q be a graph with #Q connected compo-nents Qi, i = 1, . . . ,#Q. It will be convenient totreat Q as the multiset of its components. A typ-ical chemical graph derivation, corresponding to abi-molecular reaction can be written in the form{G1, G2} p,m

==⇒ {H1, H2, H3}, where we take thenotation to imply that all graphs Gi and Hj areconnected. We will furthermore insist that rep-resentations of chemical reactions are minimal inthe following sense: If the left graph of the rulep = (L ← K → R) matches entirely within G1,i.e., m(L) ∩ G2 = ∅, then G2 can be omitted. (Ina chemical rewriting grammar, then, one of theHi must be isomorphic to G2, becoming redun-dant as well.) More formally, we say that a deriva-tion {G1, G2, . . . , G#G} p,m

==⇒ {H1, H2, . . . ,H#H} isproper if

∀i, j : Gi ∼= Hj ⇒ Gi ∩m(L) 6= ∅

That is, a proper derivation cannot be simplified. Ifthe derivation G

p=⇒ H is proper then #G ≤ #L.

The inequality comes from that fact that multiplecomponents of L may easily be matched to a singlecomponent of G while each component of L mustmatch within a component of G.

The conditions for the ◦ composition of rules area bit too strict for our applications. We thus relaxthem respect the component structure of left andright graphs. More precisely, we require that E isisomorphic to a disjoint union of a copy of R1 andsome connected components of L2 so that for ev-ery connected component Li2 of L2 holds that eithere2(Li2) ⊆ e1(R1) or e2(Li2) is a connected compo-nent of E isomorphic to Li2. For a rule compositionof this type to be well defined we need that ∃i suchthat e2(Li2) ⊆ e1(R1) holds. We remark that thelatter condition could be relaxed further to lead toadditional compositions for which left and right sidesare disjoint unions.

The composition of p1 = (L1,K1, R1) and p2 =(L2,K2, R2) now yields p2 ◦ p1 = ({L1, L

22},K3, R3)

(cmp. Figure 2). Note that right graph R3 cannotno longer be regarded simply as a rewritten versionof R1 because rule p2 now adds additional verticesto both the left and the right graph. The compositecontext K3 contains only subsets of K1 and K2, butit is expanded by the vertices of L2

2 and the edges ofL2

2 that remain unchanged under rule p2.An example of a full rule composition is shown in

Fig. 3. The two rules in the example, which in thiscase are also chemical reactions, are part of the For-mose grammar. The Formose grammar consists oftwo pairs of rules. The first pair of rules, (from nowon denoted as p0 and p1), implements both direc-tions of the keto-enol tautomerism. One direction,p1, is visualized in Fig. 3. The second pair, (fromnow on denoted as p2, p3) is the aldol-addition andits reverse respectively. The reverse (p3) is also vi-sualized in Fig. 3. We see that the left side of p1 isisomorphic to a subgraph of one of the componentsof the right side of p3. Composing the two rules bysubgraph matching yields a third rule, p1 ◦ p3.

In general, we require here that the connectedcomponents of R1 and L2 satisfy either e2(Li2) ⊆

4

Page 5: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

L1 R1 L12

p2

L22

R2

p1

=⇒

L1

p3

L22

R3 R2

Fig. 2: Partial rule composition requires that at least one con-nected component (here L1

2) is isomorphic to R1. Additionalcomponents of of the second rule may remain unmatched.

p3

H

O

O

CC

H

C

H

HO

OC

H

C

C

H

p1

HO

C

C H

C

OC

p1 ◦ p3

H

O

O

CC

H

C

H

H

C

O

OC

H

H

C

Fig. 3: Composition of two rules from the Formose reaction (p1 and p3); the following rule names will be used: p0: forwardketo-enol tautomerism (which corresponds to the reverse of p1), p1: backward keto-enol tautomerism, p2: forward aldol addition(which corresponds to the reverse of p3), and p3: backward aldol addition. The atom mapping and matching morphism areimplicitly given in these drawings by corresponding positions of the atoms. The context K thus consists of all the atoms aswell as the chemical bonds (edges) shown in black in both the left and the right graph of each rule.

e1(Rj1) or e1(Rj1) ∩ e2(Li2) = ∅. We furthermore ex-clude the trivial case of parallel rules in which onlythe second alternative is realized. In other extreme,if all components Li2 satisfy e1(Li2) ⊆ e2(R1), thepartial composition becomes a full composition. For-mally, these alternatives are described by differentdependency graphs E and/or different morphismse1 and e2. Pragmatically we can understand this asa matching µ of L2 and R1 as in Fig. 4. Specifying µof course removes the ambiguity from the definitionof the rule composition; hence we write p2 ◦µ p1 toemphasize the matching µ.

Constructing Rule CompositionsGiven two rules, p1 and p2, it is not only interestingto know if a partial composition is defined, but alsoto create the set of all possible compositions

{p2 ◦µ1 p1, p2 ◦µ2 p1, . . . , p2 ◦µkp1}

explicitly. This set in particular contains also all fullcompositions. The following describes an algorithmfor enumerating all partial compositions.

Enumerating the Matchings µ

The key to finding all compositions is the enumer-ation of all matchings µ that respect out restric-tions on overlaps between connected components.We thus start from the sets

{R1

1, R21, . . . , R

#R11

}and{

L12, L

22, . . . , L

#L22

}of connected components of R1

and L2, resp. In the first set we find all subgraphmatches Li2 ⊆ R

j1 (represented as the corresponding

matchings µij) and arrange the result in a matrix oflists of subgraph matches, Fig. 5a.

The matching matrix is extended by a virtualcolumn to account for the possibility that Li2 is notmatched with any component of R1. Every partial(and full) composition is now defined by a selectionof one submatch from each row of the matrix, seeSupplemental Material for an example. The con-verse is not true, however: Not every selection ofmatches correspond to a partial composition. In par-ticular, we exclude the case that only entries fromthe virtual column are selected. In addition, thesub-matches must be disjoint to ensure that the com-bined match is injective. The latter conditions needsto be checked only when more than one submatch is

5

Page 6: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

L2

e2e1

R1 E

µFig. 4: The (partial) composition of two rules is mediated by the dependency graph E andthe two matching morphisms e1 and e2. Since these are subgraph isomorphisms in our case,E is simply the union e1(R1) ∪ e2(L2). The (partial) match e1(R1) ∩ e2(L2) can be under-stood as a matching µ between R1 and L2, i.e., as a 1-1 relation of the matching nodes andedges. Whenever an edge is matched, then so are its incident vertices.

R11 R2

1 R31

L12 1 2

L22 1 1

(a) Match matrix

R11 R2

1 R31 R∅1

L12 1 2 1

L22 1 1 1(b) Extended match matrix

Fig. 5: Example of a match matrix and the samematrix with its virtual extension. The top rowspecifies 1 possibility for L1

2 ⊆ R11 and 2 for

L12 ⊆ R3

1. The extended matrix further specifiesthat L1

2 can be unmatched. The bottom rows canbe interpreted similarly. We display the numberof matchings instead of a representation of thematchings themselves.

selected from the same column.

Composing the Rules

The construction of the composition p2 ◦µ p1 of tworules p1 and p2 does not explicitly depend on thecomponent structure of R2 and L1 because it isuniquely defined by the matching µ and the bijec-tions of the nodes of Li, Ki, and Ri for each of thetwo rules. We obtain L by extending L1 with un-matched components of L2 and R by extending R2

by the unmatched components of R1. The corre-sponding extension of µ to a bijection µ of the ver-tex sets of L and R is uniquely defined. The contextK of the composite rule simply consists the com-mon vertex set of L and R and all edges (x, y) ofL for which (µ(x), µ(y)) is an edge in R. We notein passing that µ defines the atom mapping of thecomposite transformation. The explicit constructionof (R,K,L) is summarized as Algorithm 0.1.

The implementation of the algorithm naturallydepends heavily on the representation of transfor-mation rules, which in our implementation is therepresentation from the Graph Grammar Library(GGL) [15]. The representation is a single graph,with attached vertex and edge properties definingmembership of L, K and R, as well as the neededlabels.

Not all matchings define valid rule composition.For instance, consider an edge (u, v) that is presentin R1 and R2 but not in L2 and both u and v are

in L2. This would amount to creating the edge bymeans of rule p2 which was already introduced by p1.Since we do not allow parallel edges and thus regardsuch inconsistencies as undefined cases and reject thematching. Note that a parallel edge does not corre-spond to a “double bond” (which essentially is onlyan edge with a specific type).

Graph Binding

The composition of transformation rules, andthereby chemical reactions, makes it possible to cre-ate abstract meta-rules in a way that is similar tothe combination of multiple functions into more ab-stract functions in functional programming. A re-lated concept from (functional) programming thatseems useful in the context of graph grammars ispartial function application. Consider, for example,the binding of the number 2 to the exponentiationoperator, yielding either the function f(x) = 2x orf(x) = x2. In the framework of rule composition,we define graph binding as a special case.

Let G be a graph and p2 = (L2,K2, R2) be atransformation rule. The binding of G to p2 resultsin the transformation rule p = (L,K,R) which im-plements the partial application of p2 on G. This isaccomplished simply by regarding G as a rule p1 =(∅, ∅, G), and using partial composition; p = p2 ◦ p1.Note that if p2 ◦ p1 is a full composition, then p canbe regarded as a graph H and G

p2=⇒ H holds.Graph binding allows a simplified representation

6

Page 7: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Algorithm 0.1: Composing p1 and p2 to p, by a given partial mappingInput: p1 = (L1,K1, R1)Input: p2 = (L2,K2, R2)Input: µ, a partial matching between L2 and R1

Output: p = (L,K,R)1 p← empty rule2 Copy vertices of p1 to p3 foreach vertex v ∈ p2 do4 if v is not mapped by µ then5 Copy v to p6 else7 Change membership in L, K and R for vertex µ(v)8 Copy edges of p1 to p9 foreach Edge e ∈ p2 do

10 if e is not mapped by µ then11 Copy e to p12 else13 Change membership in L, K and R for edge µ(e)14 Delete edges and vertices created by p1, but deleted by p2

15 if matching condition not satisfied then abort16 return p

of reactions. For instance, we can use this for-mal construction to omit uninteresting ubiquitouslypresent molecules such as water by binding the graphof the water molecule to the transformation rule ofa reaction that requires water. Similarly, graph un-binding can be defined as a transformation rule thatdestroys graphs. In a chemical application it can beused to avoid the explicit representation of uninter-esting ubiquitous molecules such as the solvent.

Ordering RulesA wide variety of methods, including flux balanceanalysis, can be used to identify pathways or othersubsets of reactions that are of interest. Adjacencyof reactions in the original networks as well as theirdirectionality can be used efficiently to prune thepossible orders of rule compositions. The fact thatmultiple reactions are instantiations of the sametransformation rule, as in the example discussed indetail in the next section, further reduces the searchspaces.

Results and DiscussionWe illustrate the use of transformation rule com-position by deriving of meta-rules from the graph

grammar consisting of the four rules necessary torepresent the complete Formose reaction, see Fig. 3.The overall reaction pattern of the Formose cycleis 2g0 + g1 → 2g1 with g0 being formaldehyde andg1 being glycolaldehyde. It amounts to the linearcombination

∑9i=1 ρi of the eight reactions and the

influx ρ1 of g0 listed in Fig. 3. It is important to no-tice that several of these reactions are instantiationsof the same, well-known chemical transformations.We have forward keto-enol tautomerism (p0: ρ2, ρ4,ρ6), backward keto-enol tautomerism (p1: ρ7, ρ9),forward aldol addition (p2: ρ3, ρ5), and backwardaldol addition (p3: ρ8). The composite rule modelsthe complete autocatalytic cycle shown in Fig. 6 asa single meta-rule.

Throughout this section we will not explicitly dis-tinguish between partial composition and full com-position, and we interpret the composition operator◦ as right-associative to simplify the notation. Thuspi ◦ pj ◦ pk means pi ◦ (pj ◦ pk).

The rules are used in the autocatalytic cycle inthe following order (starting with an keto-enol tau-tomerisation p0):

p0, p2, p0, p2, p0, p1, p3, p1

As it is not possible to compose this sequence of rulesdirectly, we start by binding glycolaldehyde g1 to

7

Page 8: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

CH2O

3, p2

5, p2

OHO

OHHO

2, p0

1

9, p1

OH

OH

O

OH

OH

HO

4, p0

OH

OH

OH

O

8, p3

OH

OH

OH

HO

7, p1

OH

O

OH

HO

6, p0

OH

C

H

HC

H

O

OH

C

H

C

H

O

H

O

H

C

H

C

H

O

H O C

H

OH

C

CH

O

H

H

O

CH

O C

C

H

O

H

H

O

H

O

C

H

CH

1

2

O C

H

3

O

CH4

O

C

H

O

H

CH

5

Li Ri

O

H

C

CH

O

H

H

O

HO

C

C H

O

H

CH

CH

O

H

H

O

O

C

C H

O

H

C

H

C

H O

H

O

H

O

C CH

O

CH

C

H O

H

O

H

O

CHC

H

H

OC

O

C H

6

O

O

C H

H

O

O

C

7

H

O

C

O

CH

8

O

CH

O

CH

9

Li Ri

Fig. 6: Above: Chemical reaction network for the Formose reaction; hyperedges are labeled with (i, pj) where i is the i-th re-action ρi in the rule composition. pj, 0 ≤ j ≤ 3, refers to a specific rule from the Formose reaction; Right: Resulting composedrule after the composition of the first i rules along the Formose cycle, context shown in black.

reaction p0, as the before-mentioned keto-enol tau-tomerisation is applied to molecule g1. The resultingrule is denoted as g1. The hyperedges in the chemi-cal reaction network depicted in Fig. 6 are numberedaccording to the sequence that reflects in which orderthe Formose reaction takes place and consequentlythe order in which the rule composition subsequentlyis done. The first composition refers to the bindingoperation. This binding of glycolaldehyde results ina graph grammar rule, which is depicted in row 1 inthe table depicted in Fig. 6, i.e., the rule (∅, ∅, g1)(see “Graph Binding”). The numbers at the hyper-edges (2, 3, . . . 9) refer to the second, third, . . ., ninth

reaction in the sequence of reactions given above.The graph grammar rule pi, 0 ≤ i ≤ 3, used for thecorresponding hyper-edge is given next to the se-quence number. The rules inferred by a subsequentrule composition are given in rows 2 to 9 of the table.

The application of the final rule results in thecomposed meta-rule p1 ◦ p3 ◦ . . . ◦ p0 ◦ g1. This ruleprecisely covers the reaction pattern of the Formosereaction, namely how two formaldehyde moleculesand one (bound) glycolaldehyde are transformed totwo glycolaldehyde molecules. However note, thatthe rule is general enough such that any pair ofmolecules with aldehyde groups can be used, i.e., the

8

Page 9: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

inferred reaction pattern refers to a class of overallreactions and the product does not necessarily needto be glycolaldehyde.

The practical computation of these compositionstakes less than a second in the current implemen-tation. Even for substantially more general com-position sequences the running time remains man-ageable. For instance, it takes less than 1 minuteto compute all composition sequences with a lengthk ≤ 10 of the form pi1 ◦ pi2 ◦ · · · ◦ pik ◦ gq withij ∈ {0, 1, 2, 3}, based on the binding of one of theinflux molecules g0 or g1. This results in 1875 differ-ent inferred composite rules.

Polymerization can also be viewed as a pathwayin a chemical reaction network, albeit one of poten-tially infinite size. The same methods applied to theautomatic inference of the overall reaction pattern ofthe Formose cycle can be directly applied to detect-ing composition rules for polymerization reactions.Importantly, even if a chemical reaction network isnot given, the approaches presented in this paper canbe used to automatically find sequences of reactionsthat will lead to polymerization. This can be real-ized by a straight-forward post-processing step: allthat needs to be done is to check whether an inferredcomposite rule exhibits a replicated functional unit.Such polymerization meta-rules also enable the anal-ysis of chemical systems with highly complex carbonskeletons such as the natural compound classes ofthe terpenes or the polyketides.

ConclusionsGraph grammars provide a convenient frameworkfor modeling chemistries on different levels of ab-straction. A chemically valid approach is to see anychemical reaction as a bi-molecular reaction. Thisrequires graph grammar rules that cover changes ofmolecules in an rather explicit and detailed way.Understanding chemical reaction patterns usuallyrequires spanning the chemical reaction networksbased on such rules. Obviously, this approach suf-fers the inherent potential of an immense combina-torial explosion. In this paper we introduced theautomatic inference of such higher-level chemical re-action pattern based on a formal approach for graphgrammar rule combination. We analyzed the auto-catalytic cycle of the Formose reaction and inferredits overall reaction pattern as a rule composition ofnine rules. Rule composition is also naturally ap-

plicable to inferring patterns of polymerization re-actions. Future work will include e.g. the analysisof terpene-based and hydrogen cyanide-based poly-merization chemistry. Many of the enzyme reactionscollected in metabolism databases such as KEGG [16]or MetaCyc [17] are in fact overall reactions of multi-step mechanisms. The enzyme D-alanine transami-nase (EC 2.6.1.21), for instance, achieves its chemi-cal transformation in 12 elementary steps. Generat-ing chemically correct atom-mappings of such overallreactions, a very important step e.g. in the inter-pretation of isotope tracer experiments, is infeasiblewith the currently available methods. In contrast acomposite rule constructed from the individual en-zymatic steps, as found for instance in the MACiEdatabase [18], is guaranteed to yield the chemicallycorrect atom mapping for the overal enzyme reac-tion.

Authors contributionsJ.L.A. implemented the rule composition system.All authors contributed to the theory, the writ-ing of the manuscript and approved the submittedmanuscript.

Acknowledgments

This work was supported in part by the Volkswa-gen Stiftung proj. no. I/82719, the COST-ActionCM0703 “Systems Chemistry”, and the DanishCouncil for Independent Research, Natural Sciences.

References1. Klamt S, Haus UU, Theis F: Hypergraphs and Cellu-

lar Networks. PLoS Comput Biol 2009, 5(5):e1000385.

2. Dittrich P, Ziegler J, Banzhaf W: Artificial chemistries- a review. Artificial life 2001, 7(3):225–275.

3. Benko G, Flamm C, Stadler PF: A graph-based toymodel of chemistry. J Chem Inf Comput Sci 2003,43(4):1085–1093.

4. Aittokallio T, Schwikowski B: Graph-based methodsfor analysing networks in cell biology. Brief Bioin-form 2006, 7(3):243–255.

5. Orth JD, Thiele I, Palsson BØ: What is flux balanceanalysis? Nature Biotech. 2010, 28:245–248.

6. Schuster S, Fell DA, Dandekar T: A general defini-tion of metabolic pathways useful for systematicorganization and analysis of complex metabolicnetworks. Nature Biotech. 2000, 18:326–332.

9

Page 10: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

7. Price ND, Reed JL, Papin JA, Wiback SJ, Palsson BØ:Network-based analysis of metabolic regulationin the human red blood cell. J. Theor. Biol. 2003,225:185–194.

8. Larhlimi A, Bockmayr A: A new constraint-based description of the steady-state flux coneof metabolic networks. Discr. Appl. Math. 2009,157:2257–2266.

9. Felix L, Rossello F, Valiente G: Efficient Recon-struction of Metabolic Pathways by BidirectionalChemical Search. Bull. Math. Biol. 2009, 71:750–769.

10. Ehrig H, Ehrig K, Prange U, Taenthzer G: Fundamentalsof Algebraic Graph Transformation. Berlin, D: Springer-Verlag 2006.

11. Rossello F, Valiente G: Graph grammars and molec-ular biology. Lect. Notes Comp. Sci. 2005, 3393:116–133.

12. Danos V, Feret J, Fontana W, Harmer R, Hayman J,Krivine J, Thompson-Walsh C, Winskel G: Graphs,Rewriting and Pathway Reconstruction for Rule-Based Models. Leibniz International Proceedings in In-formatics 2012. in press.

13. Golas U: Analysis and Correctness of AlgebraicGraph and Model Transformations. Wiesbaden, D:Vieweg+Teubner 2010.

14. Ehrig H, Habel A, Kreowski HJ, Parisi-Presicce F: Par-allelism and Concurrency in High-Level Replace-ment Systems. Math. Struct. Comp. Science 1991,1:361–404.

15. Flamm C, Ullrich A, Ekker H, Mann M, Hogerl D,Rohrschneider M, Sauer S, Scheuermann G, Klemm K,Hofacker I, Stadler PF: Evolution of metabolic net-works: A computational frame-work. Journal ofSystems Chemistry 2010, 1(4).

16. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M:KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012,40:D109–D114.

17. Caspi R, Altman T, Dreher K, Fulcher CA, SubhravetiP, Keseler IM, Kothari A, Krummenacker M, Laten-dresse M, Mueller LA, Ong Q, Paley S, Pujar A,Shearer AG, Travers M, Weerasinghe D, Zhang P, KarpPD: The MetaCyc database of metabolic pathwaysand enzymes and the BioCyc collection of path-way/genome databases. Nucleic Acids Res. 2012,40:D742–D753.

18. Holliday GL, Andreini C, Fischer JD, Rahman SA, Al-monacid DE, Williams ST, Pearson WR: MACiE: ex-ploring the diversity of biochemical reactions. Nu-cleic Acids Research 2012, 40:D783–D789,.

19. Ehrig H, Heckel R, Korff M, Lowe M, Ribeiro L, Wag-ner A, Corradini A: Algebraic Approaches to GraphTransformation Part II: Single Pushout Approachand Comparison with Double Pushout Approach.In Handbook of Graph Grammars and Computing byGraph Transformations. Edited by Rozenberg G, Singa-pore: World Scientific 1997:247–312.

20. Lowe M: Algebraic approach to single-pushoutgraph transformation. Theor. Comp. Sci. 1993,109:181–224.

21. H E: Introduction to the Algebraic Theory ofGraph Grammars. Lect. Notes Comp. Sci. 1979,13:169.

Appendix A: SPO and DPO in ArtificialChemistry ModelsIn the double pushout (DPO) framework a produc-tion L

l←− Kr−→ R is defined by three graphs (the

left graph L, the right graph R, and the contextgraph K) and two graph morphisms l : K → L andr : K → R. In the single pushout (SPO) formal-ism, L

p� R is specified by only two graphs L and

R and a partial graph morphism p. In other words,the production is specified by a total graph mor-phism dom(p) → R, where dom(p) is a subgraph ofL. While DPO lives on the category of graphs andgraph morphisms, SPO is built upon the category ofgraphs and partial graph morphisms, see [19] for adetailed comparison of the two approaches.

There is no real difference between the rulesthemselves since DPO and SPO productions can betranslated into each other: L

p� R translates to

Ll←− dom(p) r−→ R where l is the inclusion of dom(p)

in L and r is the domain restriction of p to dom(p).Conversely, L l←− K r−→ R translates to L

p� R where

the partial morphism p = r ◦ l−1 is well-defined pro-vided l is injective and dom(p) = l(K) [20].

The main difference between the two frameworkslies in the application of the productions, i.e., in theresulting (direct) graph derivations. SPO deriva-tions are complete, that is, for each productionL

p� R and each match m : L m−→ G there is a

derivation Gp,m==⇒ H. In contrast, in DPO the cor-

responding derivation exists if and only if the gluingcondition is satisfied:

(I) There are not distinct elements x, y of L withm(x) = m(y) and y /∈ l(K).

(D) No edge e of G \m(L) is incident to a node inm(L \ l(K)).

SPO is therefore more powerful than DPO in thesense that more general transformations can be im-plemented. On the other hand, the gluing condi-tion ensures that there are no “side effects” suchas dangling edges (which have to be eliminated byconstruction in SPO). These side effects make SPOtransformation more difficult to understand and con-trol in practical applications.

10

Page 11: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

A useful feature of DPO derivations is that pro-ductions are invertible [21]. As in chemical reactions,it suffices to exchange the roles of the left and theright graph, i.e., of products and educts. In contrast,the more general SPO derivations are not invertiblein general.

In applications of graph transformation systemsto modelling chemical reactions further restrictionsare needed to account for the peculiarities of chemi-cal transformation systems:

(i) Reaction rules specify subgraphs. Therefore,the matching morphisms m and n are injec-tive.

(ii) Since atoms are conserved in chemical reac-tions, the restrictions lV and rV to the vertexset are bijections and determine the atom map-ping. Edges model chemical bonds specifiedby electron pairs. These can only be movedaround in the molecules but not collapsed ontothe same bond with the same type. The mor-phisms l and r therefore must be injective.

Lemma 1. Conditions (i) and (ii) imply the gluingcondition.

Proof. Since m is injective, i.e., m(x) = m(y) im-plies x = y, condition (I) is satisfied. Condition (ii)implies that lV (K) is the vertex set of L, i.e., L\l(K)contains no vertices. Thus m(L \ l(K)) is empty sothat condition (D) is trivially satisfied.

As far as models of chemistry are concerned, there-fore, SPO and DPO graph transformations areequivalent.

We prefer to work with the DPO framework forseveral reasons. First, the explicit exposure of thecontext graphK provides a convenient starting pointfor considering transition states e.g. in terms of thepair of subgraphs (L \ l(K), R \ r(K)). In addition,in our experience it is helpful to explicitly constructK in the process of designing rule sets of particulartypes of chemistry such as Diels Alder reactions oraldol condensations appearing in this contribution.Maybe more importantly, the DPO framework ap-pears more convenient when building analysis toolssuch as coarse graining operations into the rewritingsystem. Finally, the framework of (injective) graphmorphisms, in our view, is a more convenient ba-sis for mathematical investigations than the partialgraph morphisms on which SPO is built.

11

Page 12: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Appendix B: Example of Enumeration of CompositionsIn this Appendix we show the complete result of the composition of two (artificial) rules, p1 and p2, includingthe selection of submatches from the match matrix. The two rules are depicted in Fig. 7 with the extendedmatch matrix of the composition p2 ◦ p1, that corresponds to the example of an extended match matrix asgiven in the paper. The rules in this section are all depicted with vertices that have an additional index. Thenumbering of the components is in increasing order wrt. to these indices, e.g., L1

2 denotes the componentconnecting nodes A, 0 and B, 1 and L2

2 denotes the component connecting nodes B, 2 and C, 3.

Q, 0A, 1

P, 3

B, 2

C, 10

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

Q, 0A, 1

P, 3

B, 2

C, 10

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

(a) p1

A, 0

B, 1

B, 2

C, 3

A, 0

B, 1

B, 2

C, 3

(b) p2

R11 R2

1 R31 R∅1

L12 1 2 1

L22 1 1 1(c) Extended match matrix

Fig. 7: The two rules p1 and p2, and the extended match matrix of the composition p2 ◦ p1. The components of both R1 andL2 are numbered in the same order as the vertex indices.

In the following we will enumerate all valid selections of submatches based on the extended match matrixand give the corresponding resulting rule composition. The chosen matches are depicted as • in the extendedmatch matrix. If several matches can be found (in our example this is true for the component L1

2, that canbe matched twice in R3

1), the • has an index.

Composition 1

R11 R2

1 R31 R∅1

L12 •

L22 •

Q, 0

A, 1 P, 3

B, 2C, 4

R, 6

B, 5

B, 7

C, 10A, 8

B, 9

Q, 0

A, 1 P, 3

B, 2C, 4

R, 6

B, 5

B, 7

C, 10A, 8

B, 9

Result of composition 1

Composition 2

R11 R2

1 R31 R∅1

L12 •1

L22 •

12

Page 13: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Q, 0

A, 1

P, 3

B, 2

C, 4

R, 6

B, 5

B, 7

C, 10

A, 8

B, 9

Q, 0

A, 1

P, 3

B, 2

C, 4

R, 6

B, 5

B, 7

C, 10

A, 8

B, 9

Result of composition 2

Composition 3

R11 R2

1 R31 R∅1

L12 •2

L22 •

Q, 0A, 1

P, 3

B, 2

C, 4

R, 6

B, 5 B, 7

B, 9

C, 10

A, 8 Q, 0A, 1

P, 3

B, 2

C, 4

R, 6

B, 5 B, 7

B, 9

C, 10

A, 8

Result of composition 3

Composition 4

R11 R2

1 R31 R∅1

L12 •

L22 •

Q, 0A, 1 P, 3

B, 2

C, 10

A, 11

C, 4

R, 6

B, 5

B, 12

B, 7

A, 8B, 9

Q, 0A, 1 P, 3

B, 2

C, 10

A, 11

C, 4

R, 6

B, 5

B, 12

B, 7

A, 8B, 9

Result of composition 4

Composition 5

R11 R2

1 R31 R∅1

L12 •

L22 •

13

Page 14: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Q, 0A, 1

P, 3

C, 10B, 2

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

Q, 0A, 1

P, 3

C, 10B, 2

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

Result of composition 5

Composition 6

R11 R2

1 R31 R∅1

L12 •1

L22 •

Q, 0A, 1

P, 3

B, 2

C, 10

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

Q, 0A, 1

P, 3

B, 2

C, 10

C, 4

R, 6

B, 5

B, 7

A, 8B, 9

Result of composition 6

Invalid SelectionR1

1 R21 R3

1 R∅1L1

2 •2L2

2 •

This selection of submatches is invalid, as they are not disjoint (node B, 9 would be matched twice).

Composition 7

R11 R2

1 R31 R∅1

L12 •

L22 •

Q, 0

A, 1

P, 3

B, 2

C, 4

R, 6

B, 5

B, 7

C, 10

A, 8

B, 9

B, 12

A, 11

Q, 0

A, 1

P, 3

B, 2

C, 4

R, 6

B, 5

B, 7

C, 10

A, 8

B, 9

B, 12

A, 11

Result of composition 7

14

Page 15: Inferring Chemical Reaction Patterns Using Rule ...Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars Jakob Lykke Andersen 1, Christoph Flamm 2, Daniel Merkle

Composition 8

R11 R2

1 R31 R∅1

L12 •

L22 •

Q, 0

A, 1

P, 3

B, 2

C, 10

C, 4

R, 6 B, 5

B, 7A, 8

B, 9

B, 11

C, 12

Q, 0

A, 1

P, 3

B, 2

C, 10

C, 4

R, 6 B, 5

B, 7A, 8

B, 9

B, 11

C, 12

Result of composition 8

Composition 9

R11 R2

1 R31 R∅1

L12 •1

L22 •

Q, 0

A, 1

P, 3B, 2

B, 5C, 4

R, 6

B, 7

C, 10

A, 8

B, 9

C, 12

B, 11

Q, 0

A, 1

P, 3B, 2

B, 5C, 4

R, 6

B, 7

C, 10

A, 8

B, 9

C, 12

B, 11

Result of composition 9

Composition 10

R11 R2

1 R31 R∅1

L12 •2

L22 •

Q, 0A, 1

P, 3B, 2

B, 5C, 4

R, 6

B, 7

C, 10

A, 8

B, 9

C, 12

B, 11Q, 0

A, 1

P, 3B, 2

B, 5C, 4

R, 6

B, 7

C, 10

A, 8

B, 9

C, 12

B, 11

Result of composition 10

15