part of two input structures are provided, as cifs or a ... · matchbox is a software...
TRANSCRIPT
Our algorithm builds on an expansion *1 of the classic Ullmannalgorithm *2. By the nature of Ullmann’s method, connectivity ismaintained during structure matching, i.e. there must be thesame bonding configuration between the matched atoms inone structure and their counterpart mapped atoms in the otherstructure. Additionally, we allow tailoring of this graph comparisonto successfully match only ‘graph identical’ environments(i.e. same number of bonds), or ‘graph similar’ environments wherethe definition of similarity is arbitrary to the implementation.
C
O
O C
Ovs
Graph Similarity
Chemical Similarity
C
H
C
F
vs
We can tailor which elements will match with which in a similarmanner to tailoring our graph matching behaviour. By specifyingthat only ‘chemical identical’ candidates and environments shouldbe considered, we limit mapping candidates to those atom pairswhere both atoms - one from each input structure - have the sameelement type. Alternatively, specifying ‘chemical similar’ for candidatesand environments, along with text-described sets of ‘similar’ environments, allows us to consider equivalent those environmentswith similar pharmacological e�ects, for example.
Arbitrary Mappings
C
H
H
H
C
H
H
Hvs
Our algorithm purposefully does not consider any spatial information during itssearch, and instead relies solely on connectivity to match components of our twostructures. One consequence of this is that entirely equivalent matches will begenerated between the environments of two matched atoms, if one environmentcontains more than one atom which is chemically identical or similar to the atoms inthe first environment. We handle these so-called ‘arbitrary mappings’ by clusteringthe arbitrary atoms in each structure, and associating a cluster in one structure withthe relevant cluster in the other; we then know that all mappings where each atomin one cluster maps to one of the atoms in the other cluster are valid.
Feature Detection
CC
C
C
C
C
CC
C
CC
C
vs
Certain moieties which exhibit high local symmetry may prolong chemical structurematching, since many mapping permutations may be possible between two suchequivalent moieties in the input structures. One related problem is that of ‘ringshu�ing’ - a linear carbon chain in one structure will repeatedly match with a carbonring (e.g. benzene) in the other structure, shu�ing round the ring by one atom togenerate each new match. Our implementation is currently able to rapidly detect allrings present in the input structures., as well as any other features speci�ed by theuser. This allows more e�cient structure matching by avoiding complications such asring shu�ing, and guarantees that any features found in one structure will only matchwith an equivalent feature in the other structure. It also guarantees that a feature iseither matched in its entirety, or else not matched at all.
Part of Age Concern, a joint project between
N David Brown
James Haestier
Mustapha Sadki
David J Watkin
Amber L Thompson
..the Oxford team!
Oxford and Durham Universities
matchbOx is a software implementation of anew chemical structure matching application.Two input structures are provided, as CIFs orSMILES strings, and after matching has occurred,they are rendered inside a 3D model viewer.Fragments of equivalent connectivity and elementalcomposition within the two structures are colourfullyidenti�ed, allowing the chemist to rapidly recognisesimilarities between the two structures.
When viewing matches, hovering over a colouredatom in one structure with the mouse will highlightit, along with its matched counterpart in the otherstructure. Match descriptive text tables can haveentries selected with a click, and the relevantmatched atom pair will be highlighted in the displays.
Since SMILES strings contain no positional data,a force directed layout strategy is utilised toprovide a realistic 3D con�guration for therepresented structures.
Through an intuitive graphical user interface, users caneasily tailor the algorithm’s behaviour according to their needs.Elements which should be considered similar may be speci�ed,and the relationship between matched atom bond countscan be de�ned.
Results are clearly speci�ed in concise text reports.
Match records are displayed in aseparate viewing window, andallow the same mouse interactionas matchbOx, both with the3D structure renders and with thematch descriptive tables.
Match results are communicated to a server,where a continuously running match servicecalled databOx stores records in a database.The service provides a web interface whichallows navigation of database entries.
References:
1. L.P. Cordella, P. Foggia, C. Sansone, M. Vento (2004). IEEE Trans. On Patt. Anal. & Mach. Intell., 26, 10, 1367-1372
2. J.R. Ullmann (1976). Journal of the Association for Computing Machinery, 23, 31-42 .