motion planning algorithms for molecular simulations: a...

Motion Planning Algorithms for Molecular Simulations:

A Survey

Ibrahim Al-Bluwia,b, Thierry Simeona,b, Juan Cortesa,b

aCNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, FrancebUniv de Toulouse, LAAS, F-31400 Toulouse, France

Abstract

Motion planning is a fundamental problem in robotics that has motivatedresearch since more than three decades ago. A large variety of algorithmshave been proposed to compute feasible motions of multi-body systems inconstrained workspaces. In recent years, some of these algorithms have sur-passed the frontiers of robotics, finding applications in other domains such asindustrial manufacturing, computer animation and computational structuralbiology. This paper concerns the latter domain, providing a survey on motionplanning algorithms applied to molecular modeling and simulation. Both thealgorithmic and application sides are discussed, as well as the different issuesto be taken into consideration when extending robot motion planning algo-rithms to deal with molecules. From an algorithmic perspective, the papergives a general overview of the different extensions to sampling-based mo-tion planners. From the point of view of applications, the survey deals withproblems involving protein folding and conformational transitions, as well asprotein-ligand interactions.

Keywords: Motion Planning, Sampling-based Algorithms, MolecularSimulations, Protein Flexibility, Protein Folding, Protein-LigandInteractions.

1. Introduction

Computer simulations are widely used nowadays to model biomolecules,mimic their behavior and gain insights about their physicochemical proper-ties and biological functions. Indeed, a whole field exists dedicated to suchsimulations currently exists under the name of computational structural bi-ology.

Preprint submitted to Computer Science Reviews July 27, 2012

Computational methods have been mostly developed to complement ex-perimental methods. For instance, molecular dynamics (MD) [1] and MonteCarlo (MC) methods [2] are largely used to study thermodynamic prop-erties and the activity of proteins from an initial structure determined byX-ray crystallography [3] or nuclear magnetic resonance (NMR) [4]. Thecomplementarity between experimental and computational methods can alsobe exploited in the other direction, since simulations can be enhanced usingexperimental data. An interesting illustration is the use of NMR chemicalshifts to restrain MD simulations [5].

Some computational methods go further, aiming to replace experimentalmethods. For instance, computational methods can be used to determine thestructure of proteins without prior experimental information [6]. Methodsare also available for predicting molecular interactions (molecular docking)[7], and for understanding how proteins move from random coils to their na-tive structure (protein folding) [8]. Nevertheless, the current status of thesecomputational methods is still far from providing completely accurate andreliable results in all cases, and many problems remain out of reach for state-of-the-art methods. For example, current computational power permits MDsimulations of up to some microseconds. This is of course insufficient sincesome processes involving protein motions can occur over the range of sec-onds. This is the case for protein folding [9], for instance. MC methods alsosuffer from shortcomings in their search and sampling of the conformationalspace of proteins, which is a rugged landscape with many local minima. MCmethods tend to get trapped in local minima that are separated by highenergy barriers.

For these reasons, active research is currently focused on enhancing sim-ulation techniques (see [10, 11, 12, 13] for examples) and producing alter-natives for them. This paper surveys a particular family of such alternativemethods that are inspired from the field of robot motion planning. Robotics-inspired methods have been introduced recently for simulating motions ofproteins and for studying problems like protein folding and protein-ligandinteractions. They borrow ideas from sampling-based motion planning algo-rithms [14, 15, 16], which have been proven to be powerful tools for tacklinghigh-dimensional robot motion planning problems.

Although the two fields of robotics and molecular simulations seem verydistant at first glance, a closer look reveals many similarities regarding theformulation of the tackled problems. In an early survey [17], Parsons andCanny have shown that several of the problems studied in the field of compu-

2

tational structural biology are actually geometric problems that have coun-terparts in the field of robotics. This is mainly due to the fact that motionplays a central role for both robots and proteins. Indeed, molecular motionsare an integral part of the biological processes proteins are involved in, suchas catalysis and signal transmission. Understanding how proteins move isdirectly linked to understanding such processes, as well as to understandingdysfunctions and their contribution to disease, such as mad cow disease andAlzheimer’s disease [18].

Since motion planning inspired algorithms for molecular simulations arerelatively new, up to our knowledge, no dedicated reviews have been writtenon this subject. Nevertheless, there are three works that are noteworthy inthis regard. The first is a survey by Moll et al. [19] that is dedicated toapplications of motion planning roadmap methods to protein folding. Thesecond is an online course prepared by Kavraki entitled “Geometric Meth-ods in Structural Computational Biology” [20]. This course is a good andcomprehensive reference on the broad subject of using geometric methodsin computational biology. It is oriented towards explaining in detail thebackground, algorithms and the implementation details rather than survey-ing the current literature; which is the aim of this paper. The third oneis a very recent survey on computational models of protein kinematics anddynamics [21]. Although there are some overlaps between the referred sur-vey and this paper that concern geometric/kinematic modeling of proteinsand motion-planning-based approaches to explore the protein conformationalspace, the scope of both papers is rather different. The referred paper is fo-cused on the application of robotics-inspired methods together with Markovmodels to obtain a compact representation of the protein conformationalspace, while the present work gives an overview of motion planning algo-rithms applied to different classes of problems in structural biology.

Our goal with this survey is twofold: (1) For readers in the structural biol-ogy community, we expect this paper to serve as an introduction to robotics-inspired methods with applications in their domain, and that our work willcontribute to spreading this new family of methods in this community; (2)For readers in the robotics community, our aim is to incite them to lookat problems in structural biology, which represent a challenging applicationdomain that motivates the development of improved algorithms for accuratecomputations in very-high-dimensional spaces.

The paper is structured as follows: Section 2 begins by introducing thegeneral problem of motion planning and by presenting basic algorithms, espe-

3

cially sampling-based algorithms. The discussion then proceeds by explain-ing the different issues to be taken into account when moving from motionplanning in robotics to performing molecular simulations. Main molecularsimulation methods that are inspired by robot motion planning are then sur-veyed and explained in Section 3. Next, Section 4 discusses the three mainapplication domains in computational structural biology where these algo-rithms have been applied. These application domains are: the analysis ofconformational transitions, protein folding and unfolding, and protein-ligandinteractions. For each of these domains, the general problem is presented andthen results achieved using motion planning inspired techniques are surveyedand discussed. Finally, Section 5 summarizes and concludes the survey.

2. From Robot Motion Planning to Molecular Simulations

This section introduces the motion planning problem and briefly presentssome of the algorithms that have been proposed during the last three decades.More attention is given to the two classes of planning algorithms calledProbabilistic Roadmap (PRM) [22] and Rapidly-Exploring Random Trees(RRT) [23], as robotics-inspired algorithms for molecular simulations mainlyfollow these approaches. The discussion will then proceed to how these al-gorithms can be extended for computing molecular motions.

2.1. Motion Planning in Robotics

The goal of robot motion planning is to decide automatically what mo-tions a robot should execute in order to achieve a task specified by initial andgoal spatial arrangements of physical objects [24]. A frequently used exampleis: Given a piano in a certain room, what motions should be applied to thepiano in order to transfer it from position A to position B without collidingwith any of the room’s furniture. The formalized version of this problem isknown as the Piano Mover’s Problem [25].

Motion planning is generally formulated using the notion of ConfigurationSpace [26]. A configuration q describes the pose of the robot (e.g. the x and ycoordinates of a rigid robot translating in a 2D workspace). The configurationspace C is the set of all possible configurations the robot can take, and thenumber of dimensions of this space equals the number of degrees of freedomof the robot (i.e. the number of parameters needed to describe the poseof the robot). Some regions in the configuration space may be consideredforbidden due to the presence of obstacles or due to other constraints. These

4

regions are usually denoted Cobs and the rest of the space is denoted Cfree.The motion planning problem becomes a search problem in Cfree for pathsthat connect the initial and goal configurations.

Early work focused on complete motion planning algorithms, i.e. algo-rithms that always report a solution if one exists and report failure oth-erwise [27, 24, 14]. An excellent overview of different classes of completemotion planning algorithms can be found in [24] (Chapters 4 to 6). Theproblem with these methods is that they are inapplicable to problems withhigh dimensions or complex constraints. Finding complete solutions to suchproblems is known to be intractable [28, 29]. For this reason, attention hasshifted towards practical motion planning algorithms rather than completeones. Sampling-based motion planners [30, 16, 14] are such type of algo-rithms that have gained a lot of momentum lately. These algorithms tradeoff completeness for the sake of generality, efficiency and simplicity of im-plementation. They guarantee a weaker notion of completeness called proba-bilistic completeness, which means that with enough samples, the probabilityto find an existing solution converges to one [14].

Sampling-based planners sample the configuration space to build a repre-sentative set of configurations instead of an explicit representation of the con-figuration space. Sampling-based planners are often classified into two cat-egories: Roadmap-based planners and Tree-based Planners. Roadmap meth-ods work in two phases: a construction phase, where a graph that covers theconfiguration space is built, and a query phase, where the constructed graphis used to plan the motion between a start and goal configuration. Thesemethods are also called multiple-query methods since the built roadmap canbe used multiple times. Tree-based planners, on the other hand, are usuallysingle-shot methods. A tree is grown from the start configuration by sam-pling the space until a path to the goal configuration is found. Thus, theconstruction of the tree and the search for the path are done at the sametime. The two algorithms described next, PRM [22, 31] and RRT [32, 23],are the most representative methods of each of these main classes. For moreinformation about motion planning methods see [29, 24, 15, 14].

2.1.1. Probabilistic Roadmap

The Probabilistic Roadmap (PRM) algorithm was introduced in the nine-ties [22] and was able then to successfully solve motion planning problemswith higher dimensions than what was achieved before. The basic version ofPRM works by performing the following steps iteratively:

5

Figure 1: An illustration of a simple PRM.

1. A random sample is drawn from the configuration space and is checkedfor collision. If the sample is a valid configuration, it is added to theroadmap as a node.

2. A search is performed to find the nearest neighbors in the roadmap tothe new node.

3. An attempt is made to connect the new node to its neighbors using alocal planner whose definition depends on the constraints imposed bythe problem. If a connection can be established without collision, anew edge is added to the roadmap.

The roadmap is built by repeating the previous steps until a stopping crite-rion is met. Another version of the algorithm that performs sampling andconnections in separate loops is also widely used. The produced graph canthen be searched for paths using any of the conventional graph search algo-rithms such as Dijkstra’s shortest path [33] or the A* [34] algorithms. Thesebasic steps of the PRM have been improved over the years and several vari-ants have appeared (e.g. [35, 36, 37, 38, 31]). However, the general structureof the algorithm remains the same. Figure 1 shows an illustrative exampleof the basic PRM.

2.1.2. Rapidly Exploring Random Tree

The most popular tree-based motion planner is the Rapidly-exploringRandom Tree (RRT) [32, 23]. Rooted at the start configuration, a tree isiteratively constructed in the configuration space until the goal configuration

6

Figure 2: Illustration of a simple RRT at an intermediate stage during its construction.

can be connected to one of its nodes. An interesting feature of the algorithmis that nodes with larger Voronoi regions (i.e. the portion of the space thatis closer to the node than to other nodes of the tree) are more likely to bechosen for expansion, and therefore the tree is pulled towards unexploredareas, spreading rapidly in the configuration space.

The basic version of the RRT works by performing the following stepsiteratively:

1. A random configuration qrand is sampled in the configuration space.

2. The tree is searched for a configuration qnear, which is the nearest nodein the tree to qrand.

3. A new configuration qnew is created by moving a predefined distance dfrom qnear in the direction of qrand using a local planner or an interpo-lation method that depends on the mobile system.

4. If qnew is a valid configuration that falls in Cfree, and if the local pathbetween it and qnear is collision-free, then qnew is added to the tree asa new node and an edge is created between qnew and qnear.

This process is repeated until the goal configuration can be connected tothe tree or a maximum number of iterations is reached. Figure 2 showsan illustrative example of the basic RRT algorithm. Variants of this basicalgorithm appeared later on (e.g. [39, 40, 41, 42]). Moreover, other tree-basedplanners that are not directly based on RRT have also been proposed. Someexamples of such planners are: Expansive Spaces Trees [43], Path-DirectedSubdivision Trees [44] and KPIECE [45].

7

2.2. Needed Extensions For Molecular Simulations

Since the algorithms discussed above have been developed with roboticapplications in mind, they need to be extended or adapted in order to suitthe requirements for simulating molecular motion. Generally speaking, thereare several issues that need to be taken into account before applying suchalgorithms. First, a molecular representation that is suitable for applyingmotion planning algorithms needs to be adopted. Next, appropriate simi-larity measures (i.e. distance metrics) and collision detection methods forproteins need to be used. In addition, specific sampling methods can be re-quired to satisfy structural constraints. Energies of molecular conformationsalso need to be taken into consideration since they determine the probabil-ity of their existence in reality. Furthermore, the very high dimensionalityof problems involving biological macromolecules needs to be faced. Theseissues are discussed in the following along with a quick survey of the relevantliterature.

2.2.1. Molecular Representation

The most straightforward way for representing molecules geometricallyis to list the Cartesian coordinates of all the atoms [46, 47]. Bonds canthen be constructed automatically using the distances between atoms andthe knowledge about their types. This is called the Cartesian representationand it is used by the Protein Data Bank [48] to describe proteins. Thisrepresentation is also frequent among conventional modeling tools based onMolecular Dynamics or Monte Carlo methods. The problem with such arepresentation is that it does not directly describe the internal degrees offreedom of the molecule.

There are three types of variables that can be considered as internal de-grees of freedom in molecules: bond lengths, bond angles and dihedral angles.A bond length is the distance between two bonded atoms and a bond angleis the angle between two consecutive bonds. The dihedral angle around thebond between atoms Ai−1 and Ai is the angle formed by planes Ai−2-Ai−1-Ai

and Ai−1-Ai-Ai+1. See Figure 3 for an illustration. Although bond lengthsand bond angles vary, their variation is known to be very small at roomtemperature [49]. On the other hand, major conformational changes in themolecule occur due to variations in dihedral angles. For this reason, a widelyadopted assumption is made, called the rigid geometry assumption [50], thatconsiders dihedral angles to be the only degrees of freedom of the molecule.Hence, the conformation of the molecule can be represented as a vector of

8

Figure 3: Parameters defining the relative position of bonded atoms.

only the dihedral angles [46]. This representation is called the internal co-ordinates representation. Figure 4 shows a protein model together with arepresentation of the dihedral angles corresponding to one of its amino acidresidues.

Modeling a protein in internal coordinates is very similar to modeling anarticulated robot. Indeed, modeling conventions applied in robotics can alsobe applied to molecules [51, 52, 53, 54]. Based on the internal coordinatesrepresentation and the rigid geometry assumption, the protein can be lookedat as an articulated mechanism, where bonds correspond to axes of revolutejoints and atom-groups correspond to rigid links in a kinematic chain (formore about kinematic chains see: [55, 56, 57]). Finally it should be notedthat the atom coordinates, which are required for some operations like en-ergy computation and collision detection, can be computed from the internalcoordinates using forward kinematics [58].

2.2.2. Dimensionality Reduction

Although using internal coordinates with the rigid geometry assumptionreduces the number of variables, the number of degrees of freedom required tomodel biological macromolecules such as proteins remains very large. For ex-ample in molecular docking problems (see Section 4.3), ligands typically have3-15 dihedral angles and receptors have in general more than 1000 dihedralangles, which makes the dimension of the combined search space prohibitivelylarge [59]. This problem of high dimensionality is actually one of the majordifficulties to be faced by computational methods in structural biology.

Several strategies have been used to reduce the dimensionality of the stud-ied problems. For example, molecular docking problems have been tackledfor a long time with the assumption that only the ligand is flexible and that

9

54

χ

χ

χχ

χ

21

3

ψφ

Figure 4: The main image shows a protein model in van der Waals representation (sphericatoms). The detail shows one of its constituent amino acid residues and the dihedralangles required to define its conformation.

the receptor protein is rigid [46]. However, since receptors may go throughimportant conformational changes, it has been shown that this assumptionleads to unrealistic solutions [60]. Other works (for e.g. [61, 62, 63]) havemade more realistic assumptions based on prior chemical knowledge of thereceptor protein. Using this knowledge, dihedral angles that contribute mostto the motions of the receptor are identified. These dihedral angles are thenassumed to be flexible and the rest of the receptor to be rigid. The draw-back of such methods is that they are problem-dependent and hard to auto-mate [59]. A more general approach proposed in [64] identifies automaticallywhich parts of the protein can be considered rigid using methods that arebased on rigidity theory [65, 66]. Another strategy to reduce the dimen-sionality of the problem is to assume that secondary structure elements arerigid, and that loops, linkers and side-chains are flexible. This approach, asin [67], reduces the number of variable parameters significantly and allowsconcentrating on important motions of the protein.

A different approach for addressing the problem is to use statistical di-mensionality reduction methods [68, 69] to map the current degrees of free-dom into a lower-dimensional space. These methods usually begin with apreviously-available ensemble of structures for the protein under study, whichare analyzed in order to create a reduced set of degrees of freedom. An ex-

10

ample of such methods is Principal Component Analysis (PCA) [70], whichis commonly used in the analysis of near-equilibrium fluctuations sampled bymolecular dynamics simulations [71, 72, 73]. In spite of the ability of PCAto capture important collective features, it may not be suitable for accu-rately representing large-amplitude molecular motions given that it providesa linear approximation and that molecular motions are generally non-linear.An example of methods that can capture non-linear features is the IsometricFeature Mapping (IsoMap) method [74]. This method produces a low di-mensional space that preserves as much as possible the geodesic distancesbetween the conformations in the original high-dimensional space. Thisrequires the construction of a nearest neighbor graph using a big numberof distance computations, which makes the algorithm suffer when dealingwith large datasets. A scalable version of IsoMap called Scalable IsoMap(ScIMAP) was introduced and applied to protein modeling applications [71].This method was further extended in [75] to be even more efficient by per-forming distance measures in yet another projection on a lower dimensionalEuclidean space.

Normal Mode Analysis [76] has also been used in this regard. It has beenshown that large-amplitude motions in proteins are related to low-frequencynormal modes [77, 78]. Consequently, low-frequency normal modes can beused to predict the direction of large-amplitude motions. In [79], transitionpathways between conformations are computed using an RRT-like algorithmthat explores linear combinations of low-frequency normal modes. An advan-tage of NMA over methods like PCA and IsoMap is that normal modes arecomputed from a single conformation, so that no dataset of conformations isrequired to be available a priori.

2.2.3. Distance Metrics

In molecular simulations, one often needs to measure how much a molec-ular conformation is different from or similar to another conformation. Thisnotion of similarity (or distance) is also essential for most motion planninginspired methods. As explained in Section 2.1, RRT-based methods rely onfinding the most similar conformation to every new random sample. PRMsalso search for local connections between neighbor nodes corresponding tosimilar conformations. This makes the choice of the distance measure criti-cal for the performance of the whole algorithm.

A widely used and straightforward distance measure is the coordinate rootmean squared deviation (cRMSD), which is measured as the square root of

11

the average squared distances between corresponding atoms in two molecules.This distance measure requires the conformations of both molecules to bealigned (superimposed) in order to remove the effect of any translation orrotation of the whole molecule. Examples of distance measures based on thisidea include [80, 81, 82]. Another widely used measure that eliminates theneed to align the conformations is the distance root mean squared deviation(dRMSD). Here, distances are first computed between pairs of atoms of thesame molecular conformation, then the root mean squared deviation is com-puted between these distances and the corresponding distances in the othermolecular conformation. For an example of such a distance metric see [83].

Measuring the root mean squared deviation can also be done using dihe-dral values instead of atom coordinates, which is how robot configurationsare typically compared within motion planning algorithms. Yet, it is impor-tant to note that in molecular simulations we are more interested in distancemeasures that capture structural differences in proportion with their effecton the potential energy of the molecule. Fluctuations in the backbone havegenerally a stronger effect on the energy than fluctuations in side-chains, forexample. This is not the case with RMSD metrics in general, since theygive the same weight to all atom fluctuations regardless of how much thesefluctuations affect the potential energy. For a comparison between differentdistance measures see [84].

Computing distances can be a bottleneck for motion planning algorithms,especially if all-atom measures like dRMSD and cRMSD are used. Hence,several works have resorted to using approximate metrics instead of the exactones. The rationale behind using such metrics is that an exact distance isnot always required for the algorithm as a whole to function well, whichjustifies trading off exactness for the sake of performance gain. Several suchmethods can be found in the literature. One example is the work of Lotanand Schwarzer [85], in which the protein is replaced by a lower dimensionalaveraged version that is used instead of the original one. This is done bysubdividing the protein into n subsequences, each of which is replaced by itscentroid. The authors used Haar Wavelet analysis to justify their metric andshowed that it is highly correlated with the exact metric. Another examplecan be found in [86]. In this work, the conformation of the whole proteinis represented by only three variables that capture the overall topologicaldifferences between conformations. These variables are: the mean atomicdistance to the centroid (ctd), the mean atomic distance to the farthest atomfrom the centroid (fct), and the mean atomic distance from the atom farthest

12

from fct (ftf). An even more simplified metric is used in [87] for the problemof molecular disassembly (see section 4.3), where the degrees of freedom ofthe protein side-chains and the torsions of the ligand are both ignored andonly the reference frame associated with the ligand’s geometric center is usedfor computing the distance.

A general method to devise simplified distance metrics, which could beapplied for molecular simulations, is proposed in [88]. This method projectsthe sampled conformations q to an m dimensional Euclidean space and per-forms the distance measures in that space. The projection is done by firstselecting m pivots from q and then replacing each variable xi in q by a vec-tor of the distances between xi and each of the pivots. Choosing pivots asfar as possible from each other is believed to best preserve the distances ascomputed in the higher-dimensional space.

2.2.4. Collision Detection

Another important problem is the detection of collisions between parts ofthe same molecule and between different interacting molecules. As explainedin Section 2.1, sampling-based algorithms need a collision checker to decideat every step if a new conformation is valid, and to check if two adjacentconformations can be connected by a collision-free path. Collision detection isindeed intensively performed inside these algorithms. Very efficient collisioncheckers tailored for molecular models are therefore necessary for the overallefficiency of the planning algorithms.

Collision detection has been widely studied in the fields of robotics andcomputer graphics [89, 90] and several general-purpose collision detectionpackages are available (e.g. [91, 92, 93]). However, the problem with mostof these methods is that they do not directly address the complex chain-like structure of large molecules such as proteins. This makes such methodsless efficient than what can possibly be achieved, since the number of pairsconsidered for collision in the chain can be significantly reduced by exploitingthe structural properties of the chain (see [94, 95] for some examples of worksthat address the specific problem of collision detection in kinematic chains).

Several algorithms dedicated to chain-like molecular models have beenproposed. The technique described in [96] exploits the topology of the molec-ular (kinematic) chain to avoid testing for self-collision parts that are knownto be rigid. It uses a hierarchical representation of the chain that allows forefficient updates and queries in O(logN) time, and superimposes on top ofthis representation a hierarchy of bounding boxes, which allows for efficient

13

collision detection and distance computation. The algorithm detects self-collisions with a worst-case complexity of O(N4/3). Another algorithm, calledBioCD [97], was specifically designed to be used within sampling-based mo-tion planning algorithms applied to proteins described as kinematic chains.It assumes that only a pre-selected set of the degrees of freedom of the pro-tein can change arbitrarily and the rest are blocked. The algorithm worksby creating a two-level hierarchy that allows it to avoid detecting collisionsbetween atom pairs whose distance does not change from one iteration toanother.

2.2.5. Loop Closure

Loops are portions of proteins that are highly irregular and varied interms of their sequence and structure. They can play important roles incontrolling enzyme activity, and are often found at the interface in protein-protein or protein-DNA/RNA complexes [98]. Sampling such portions of theprotein poses a challenge that requires extra care. Conformations of loopsmust not only satisfy geometric constraints for collision avoidance, but mustalso satisfy what is known as the loop-closure constraint. The two ends of theloop must remain bonded to the rest of the molecule, which greatly restrictsthe space of admissible conformations of the molecular chain. Therefore,defining an appropriate sampling strategy is a prerequisite for any sampling-based exploration method that takes loop flexibility into consideration.

The protein loop closure problem has often been addressed using robotics-inspired methods (e.g. [99, 100]). Note however that most such methods arelimited to 6 degree of freedom, and therefore, extensions are necessary todeal with long loops. In [101], an algorithm called RLG (short for RandomLoop Generator) was proposed for sampling configurations of long loops.The main idea of RLG is to decompose the loop into several parts: a passivechain and one or two active chains. RLG progressively constructs a randomconfiguration for the active chains by alternating sampling between them.This sampling is performed in a way that increases the probability of satis-fying loop closure when finding a configuration for the passive chain, whichis computed by solving inverse kinematics for 6 consecutive bond torsions.In [102], a modification was introduced to RLG for enhancing its efficiency.The idea was to include steric-clash checks during the sampling of the ac-tive chains, rather than only after the complete conformation is generated.In [103], another sampling strategy for protein loops is proposed that worksin a similar manner to RLG. It decomposes the loop into three parts called:

14

front-end F, mid-portion M and back-end B, samples F and B first, andthen uses inverse kinematics to find a conformation for M.

An alternative to the methods above, which apply (semi-)analytical in-verse kinematics, is to use optimization-based inverse kinematics. A notableexample of such methods is the Cyclic Coordinate Descent (CCD) [104].

2.2.6. Energy Computation

As mentioned in Section 2.2.1, there is a high similarity between the rep-resentation of robot configurations and molecular conformations. Yet, thereis a fundamental difference that needs to be taken into account wheneverdealing with molecules, which is the potential energy associated to confor-mations. Each molecular conformation has an energy level that dependson the interactions between its constituent atoms and with the surroundingmolecules (e.g. the solvent). This energy is an indicator of how likely it isfor the molecule to adopt this conformation (conformations with low energyare naturally preferred over conformations with high energy). Hence, theconformational space of the protein is not a binary space with only valid orinvalid conformations, but a continuous space with conformations that aremore or less likely to occur. For many applications, the algorithms must beable to find least energy paths rather than geometrically valid ones. There-fore, sampling-based algorithms need to be adjusted to cope with this byaccepting or rejecting new conformations based on their energy level, andby associating transition probabilities between conformations based on theenergy difference between them.

The energy of a conformation can be computed with high precision usingquantum mechanics [105]; however, it is highly time consuming and can beeven intractable in large molecules, since it deals directly with the electronicstructure of the molecule. Molecular mechanics [106] is usually used to pro-vide approximate energy values of protein conformations. Functions thatcompute energy based on molecular mechanics are usually called molecularforce fields. They take as input the atom positions and evaluate energy basedon different terms that vary from one force field to another. Yet, these termsusually include: changes in bond lengths and bond angles, bond torsions,Van der Waals interactions and electrostatic interactions. The choice of theterms and the shape of the function affect the accuracy of the computation,its speed, and its suitability to some types of molecular systems or applica-tions. See [107, 108] for reviews on force fields and software packages thatare widely used in the study of proteins.

15

The drawback of using such all-atom force fields is that they are stillcomputationally expensive, and thus, their usage can limit the size of thestudied molecules and the time-scale of the performed simulations. This hasmotivated the introduction of coarse-grained force fields [109]. These forcefields measure interactions between blocks of functional groups rather thanbetween the individual atoms. This leads to a rough approximation of theactual force field, but also to a significant performance gain. Some examplesof coarse-grained force fields are MARTINI [110] and OPEP [111].

3. Motion Planning Inspired Methods for Molecular Simulations

A seminal work on the application of motion planning algorithms to thestudy of proteins was published in 1999 [112]. Since that time, many methodsinspired by different motion planning algorithms have appeared and havebeen applied to a variety of molecular simulation problems. Most of thesemethods follow the lines of either PRM or RRT, with PRM-based methodsbeing more oriented towards the computation of ensemble properties andRRT-based methods more towards the computation of feasible paths. Inthis section, we survey literature related to these methods and provide briefexplanations of each of them.

3.1. PRM-Based Methods

3.1.1. Probabilistic Conformational Roadmaps

The method proposed by Singh et al. [112] builds a roadmap by randomlysampling the molecular conformation space. Samples are accepted or rejectedusing a probability function that favors low energy conformations. Thisfeature makes the method different from the conventional PRM in roboticsthat uses collision detection for evaluating new samples. The used probabilityfunction is as follows:

Paccept(q) =

1 if Eq < Emin

Emax−Eq

Emax−Eminif Emin ≤ Eq ≤ Emax

0 if Eq > Emax

(1)

where Eq is the potential energy of conformation q, and Emin and Emax arethreshold values that depend on the molecular system in hand. Neighboringnodes are then connected, and a weight is associated to each edge. Theseweights are probabilities that represent the likelihood of transitions between

16

the connected conformations. For each edge eij, the algorithm generates in-termediate conformations {qi = c0, c1, c2, ..., cn = qj} along the path betweenthe two connected conformations qi and qj. The number of these intermedi-ate conformations is a user-defined parameter. The weight of the edge eij isthen computed by summing the negative logarithm of the transition prob-abilities between each of the consecutive intermediate conformations ci andci+1:

Pi =e−(Ei+1−Ei)/KT

e−(Ei+1−Ei)/KT + e−(Ei−1−Ei)/KT(2)

where Ei is the energy of ci, T is the temperature and K is the Boltzmannconstant. A connectivity-enhancement step is also added to this PRM vari-ant, by sampling extra nodes around nodes that have very few edges.

This method was first introduced for the study of protein-ligand interac-tions, more precisely, to identify potential active sites in the proteins. Theweights of paths entering and leaving low energy nodes were also used toestimate energy barriers around active sites and to distinguish true bindingsites from other low-energy active sites. Later, in [113], this method wasgiven the name of Probabilistic Conformational Roadmaps (PCR), and wasapplied to study protein folding.

3.1.2. Stochastic Roadmap Simulations

Stochastic Roadmap Simulations (SRS) [114, 115, 116, 117, 118] is anevolution of PCR. The main difference between the two methods is foundin the transition probability assigned to edges in the roadmap. SRS uses atransition probability that is consistent with the Metropolis criterion [119,120], which allows to establish a connection between SRS and Monte Carlomethods. The transition probability used in SRS is as follows:

Pij =

{1ni

exp(−4Eij

KT) if 4Eij > 0

1ni

otherwise(3)

Pii = 1−∑j 6=i

Pij (4)

where 4Eij is the difference in potential energy between nodes qi and qj, andni is the number of neighbors to qi. As in equation 2, T is the temperatureand K is the Boltzmann constant. A self-transition edge is added to eachnode such that the sum of transition probabilities for every node is one.

17

Once the roadmap is constructed, tools from Markov Chain Theory (e.g.First Step Analysis) can be applied to study ensemble properties like foldingrates, phi-values and the Transition State Ensemble (see Section 4.2). Ev-ery path in the roadmap can be considered as a run of the Markov ChainMonte Carlo (MCMC) method. This allows to interpret the whole roadmapas the result of a set of MCMC explorations being run simultaneously. Infact in [115], SRS is shown to converge at the limit to the same sampling dis-tribution as that of MCMC. The difference between MCMC and SRS is thatMCMC provides a single but fine-grained path, whereas SRS provides manycoarse-grained paths covering a wider area of the conformational space. Thisis of course a tradeoff, since although SRS covers a wider area of the space ina relatively short time and overcomes the local minima problem inherent toMCMC, coarse granularity comes with the cost of possibly loosing importantinformation along the paths between nodes.

3.1.3. PRMs for Folding Pathways

Another early research direction is the work led by Nancy Amato [121,122, 123, 124, 125, 126, 64, 127, 128]. The PRM-based algorithms proposedby this group to study protein (un-)folding are largely inspired by the PCRmethod. The method builds a roadmap by sampling the conformationalspace of the protein with a probability function that is similar to that ofPCR (see equation 1). New samples are first checked for collisions betweenatoms and then accepted or rejected based on the probability function. Inthis function, Emin is suggested to be set to the potential energy of theextended chain and Emax to be twice Emin [128]. This method also assignsweights to edges in order to find the most likely paths. The equation tocompute these weights is exactly the same than the one used to determinethe move acceptance probability in Monte Carlo methods, usually called theMetropolis criterion [119, 120]:

Pi =

{e−4Ei

KT if 4Ei > 01 otherwise

(5)

where 4Ei = E(ci+1)−E(ci), T is the temperature and K is the Boltzmannconstant.

This method has gone through several evolutions over time. Changesmainly concern the strategy used for sampling new nodes and the methodused to analyze folding pathways. The three main sampling strategies aresummarized in the following:

18

1. In [121, 122], sampling was performed around the native fold (whichis assumed to be known) using a set of normal distributions centeredaround this conformation with various standard deviations. This wasdone to ensure capturing important details close to the native foldusing small standard deviations and to ensure adequate coverage ofthe conformational space using larger standard deviations.

2. In [123, 124] another strategy was proposed since the previous oneworked well only for proteins containing up to 60 residues. The newstrategy also starts from the native fold but generates new confor-mations by iteratively applying small perturbations. Conformationsare partitioned into bins according to the number of native contactspresent. A native contact is defined as a pair of Cα atoms that arewithin 7 A of each other in the native state. At each round, bins witha small number of conformations are chosen and sampling is performedaround them. Newly generated conformations are placed at the appro-priate bins and the loop repeats.

3. The last method based on native contacts was also found to scale poorlybeyond proteins with 100 residues. In [64], another totally differentmethod was proposed for sampling based on rigidity analysis. Here, theprotein is analyzed to identify three types of bonds: rigid bonds, flexiblebonds whose motion does not affect other bonds (called independentlyflexible) and flexible bonds that form a set such that the motion of anyof them affects the rest of the set (called dependently flexible). Themethod perturbs rigid bonds with a low probability denoted Prigid andindependently flexible bonds with a high probability denoted Pflex.For each set of dependently flexible bonds, a number of bonds arechosen randomly and are perturbed with probability Pflex, whereasthe others are perturbed with probability Prigid. This method was ableto characterize the energy landscape more efficiently, with fewer andmore realistic conformations.

Works derived from this method have been proposed more recently byother researchers. An example is the MaxFlux-PRM [129, 130], which uses aslightly different edge weight function in order to find temperature-dependentoptimal reaction paths. In this algorithm, edge weights are computed as afunction of the exponential variation of the energy and the distance betweenconformations.

19

3.2. RRT-Based Methods

3.2.1. Basic RRT variants for computing molecular motions

The first works on the application of RRT to molecular simulations [131,102] were based on a basic variant of the algorithm. The referred paperspresent a two-stage approach. In the first stage, RRT is applied on a mech-anistic representation of the molecular system, only considering geometricconstraints. Paths resulting from the first stage are then analyzed and re-fined in a second stage using a more accurate energy model. The advantageof this two-stage approach is that large-amplitude motions can be computedwith few computational resources. The performance of the method was in-vestigated on several classes of problems involving protein loop motions andprotein-ligand interactions.

A similar approach was proposed in [132] for the simulation of conforma-tional transitions of proteins. The main difference with the aforementionedmethod concerns the validity test performed during the RRT construction,which includes an energy evaluation in addition to the geometric constraints.The authors also proposed a method to cluster paths computed from severalruns of RRT in order to facilitate the analysis performed in a second stage.The technique, based on path alignment, was also used to compute the mostenergetically favorable path in the solution set by combining portions of dif-ferent solutions.

An improvement of the aforementioned RRT-based method, called Path-Rover, was proposed in [133]. In this work, a branch-termination schemeis applied to limit the exploration to a subset of the conformational spacethat satisfies a set of constraints based on prior information. This schemeworks by representing partial information from previous experiments andexpert knowledge as predicates that are checked periodically as the RRTgrows. Branches of the tree that do not improve a certain predicate after mconsecutive iterations are terminated (not extended anymore).

3.2.2. Manhattan-Like RRT: Decoupling degrees of freedom

The Manhattan-Like RRT (ML-RRT) algorithm proposed in [134] wasdeveloped to circumvent the limitations of the basic RRT algorithm to dealwith high-dimensional problems in the particular context of (dis)assemblypath planning. This is a variant of the motion planning problem that consistsof finding a path to (dis)assemble two objects, one of which is considered to bemobile, and the other one to be fixed. In the more general instance addressedhere, both the mobile and the fixed object contain articulated parts. This

20

Figure 5: The image on the left illustrates an academic disassembly planning problem fortwo articulated objects. An analogy can be made with the protein-ligand “disassembly”problem represented in the right-hand image. The red object can be considered as theligand and the blue sticks as flexible side-chains of the protein.

problem resembles the problem of computing access/exit paths for a ligand(small molecule) to/from the active site of a protein (see Figure 5 for anillustration).

The main idea of ML-RRT is to divide configuration/conformation pa-rameters into two groups, called active and passive, and to generate theirmotion in a decoupled manner. Active parameters correspond to parts whosemotions are essential for the disassembly task, whereas passive parameterscorrespond to parts that need to move only if they hinder the motions ofother mobile parts (active or passive). Roughly speaking, motions of activeparts are planned exactly the same way they are planned using RRT, butwhen motion is hindered by a passive part, the conformation of this part isperturbed in order to allocate free space for the motion of active parts. Theperformed perturbation may also cause collisions with other passive parts,which are then perturbed producing a domino-like effect.

The ML-RRT algorithm presents two main advantages when compared tothe basic RRT. First, it is considerably faster, and second, it allows identify-ing automatically (without user intervention or the need of prior knowledge)which parts of the protein need to move in order for the ligand to enter orexit from the active site.

21

The original ML-RRT algorithm is able to solve efficiently problems in-volving the flexibility of the ligand and the protein side chains. The exten-sions proposed in [67] enables the introduction of the protein backbone flex-ibility. In this extension, the protein is represented as groups of rigid bodiesconnected by flexible loops that are assigned based on structural knowledge.Additionally, a mobility coefficient is assigned to each passive parameter.This coefficient is used to differentiate passive parts that are allowed to moveeasily from those that should be moved only if the solution path cannot befound otherwise.

3.2.3. Transition-RRT: Exploring energy landscapes

Another RRT variant called Transition-RRT (T-RRT) was introducedin [135, 136] for exploring energy landscapes. The algorithm introduces astate transition test inspired from the Metropolis criterion in MC methods.The goal is to favor the exploration of low-energy regions. New nodes areaccepted and added to the tree with a probability given by equation 5. Inthis equation, 4Ei is the difference between the energy at the new candidatenode (qnew) and its nearest neighbor in the tree (qnear). In contrast to MCmethods, where the temperature T is usually a constant for the simulation,T-RRT incorporates a reactive scheme to dynamically adapt this parame-ter. To do so, the algorithm keeps track of the number of consecutive treeexpansion rejections. When the T-RRT search reaches a maximum numberof consecutive rejections, the value of T is increased, which increases theprobability to accept subsequent transition tests. In contrast, each time anuphill transition test succeeds, the value of T decreases, therefore increasingthe severity of the transition test. Thus, the temperature is automaticallyregulated during the exploration depending on the shape of the energy land-scape. This temperature regulation strategy is a way to balance the searchbetween unexplored regions and low energy regions. Note that T-RRT doesnot yield a Boltzmann-weighted set of conformations. However, it allows tofind efficiently energy minima and saddle points in the energy landscape, aswell as likely transition paths between stable conformations.

Recently, the underlying principles of ML-RRT and T-RRT have beencombined within an algorithm called MLT-RRT [137]. The combined ap-proach extends the practical applicability of T-RRT to higher-dimensionalproblems in which the energy (or cost) function can be decomposed as a sumof elementary terms associated with subsets of configuration/conformationparameters.

22

3.2.4. NMA-RRT: Exploring collective motions

The work by Kirillova et al. [79] proposes an RRT-based method that ap-plies Normal Mode Analysis (NMA) [76] for computing global macromolec-ular motions. As mentioned in Section 2.2.2, low-frequency normal modesare associated with collective, large-amplitude molecular motions, and canbe used as predictors for the direction of such motions. This property is ex-ploited by the NMA-RRT method, which performs an RRT-like explorationin the coordinate space of the low-frequency normal modes. The goal is tocover the most important areas of the conformational space while exploringa low-dimensional search space. Although, NMA-RRT performs its search ina space that is defined in terms of the amplitudes of low-frequency normalmodes and not in terms of the degrees of freedom of the molecular model,new conformations are accepted only if they satisfy the geometric constraintsof the mechanistic model (i.e. correct bond geometry, collision avoidance).Normal mode calculations are iteratively updated during the conformationalsearch. This is necessary because the information provided by NMA is onlyaccurate in a relatively small region around the initial conformation, whichcauses the guidance of the RRT search to degrade when exploring largerregions.

3.3. Other Methods

In addition to the aforementioned methods, several methods for molecularmodeling and simulation that apply ideas from motion planning algorithmsother than PRM and RRT have been proposed in recent years.

In [86], Shehu et al. proposed a tree-based method called Fragment MonteCarlo Tree Exploration (FeLTr) for protein structure prediction (see Sec-tion 4.2). This method grows a tree in the conformational space that triesto guide the search toward low-energy regions while avoiding oversamplingsimilar conformations. The tree is expanded with low-energy conformationsthrough a fragment-based Monte Carlo sampling strategy. The goal of FeLTris to locate low energy conformations that are potentially close to the pro-tein’s native conformation. These native-like conformations can then act asstarting points for more refined search to obtain the folded conformation.FeLTr uses a coarse-grained representation of the protein and a two-layeredsearch strategy that tries to sample low energy conformations without over-sampling geometrically similar ones.

Another motion-planning-based method was introduced in [138] for com-puting large-amplitude motions between molecular conformations. The meth-

23

od is based on the Path Directed Subdivision Tree (PDST) algorithm [44],which is also a tree-based sampling-based planner, but which represents sam-ples as path segments rather than individual states, and uses non-uniformsubdivisions of the space to estimate coverage [44]. The space subdivision isbased on a distance metric defined in terms of the relative positions betweenthe secondary structure elements. In order to enhance the performance ofthe method, a coarse-grained protein model and a simplified energy functionwere considered.

4. Applications

The methods presented in the previous section have been mainly appliedto three types of problems in computational structural biology: the sim-ulation of conformational transitions of proteins, the study of the proteinfolding process, and the analysis of protein-ligand interactions. This sec-tion discusses briefly each of these problems and presents the main resultsachieved by motion planning inspired methods.

4.1. Conformational Transitions

The most direct application of robot motion planning methods in molecu-lar simulations is the computation of transition pathways between two molec-ular conformations. This problem requires generating a sequence of feasibleintermediate conformations for the molecule (usually a protein) to link twogiven states. The problem is analogous to the motion planning problem inrobotics. This problem can be seen as a general instance of several morespecific problems. In protein folding for example, the starting and end con-formations are the unfolded and folded states of the protein, and in moleculardocking, the starting and end conformations are the undocked and dockedstates of the molecular complex. These two particular problems are treatedin the next subsections. This section concerns transitions between stable(folded) states of proteins.

The study of protein conformational transitions is important since theycan play key roles in molecular recognition and may be essential for theprotein activity. In spite of their importance, current experimental and com-putational methods are very limited for describing large-amplitude confor-mational changes in proteins at the atomic scale.

Finding transition pathways is usually tackled at different levels of gran-ularity depending on the studied problem. Some studies are related to large-

24

a)

b)

Figure 6: Illustration of two classes of large-amplitude motions in proteins. (a) Loopmotions: A segment of the protein (in red) moves significantly, while the rest of theprotein remains mostly static. (b) Domain motions: Large portions of the protein movewith respect to each other.

amplitude motions that occur over a relatively long period of time and thatsignificantly affect the whole protein (such motions are often referred to asdomain motions). In such cases, the problem can be tackled at a structurallevel, with lower resolution than the atom level. In other cases, interestmay be focused on flexible segments of the protein. For example, irregularsegments, called loops and linkers, are generally much more flexible thanstructured parts of the protein (i.e. alpha helices and beta sheets). Thiscalls for exploration methods that are specifically tailored for these flexibleregions. Figure 6 illustrates these two types of protein motions.

4.1.1. Loop Motions

The first application of an RRT-based algorithm extended to treat closedkinematic chains (RLG-RRT) [101] for computing protein loop motions wasdescribed in [131]. The algorithm was applied to study the mobility of loop 7in amylosucrase (AS). This is a long loop involving 17 amino acid residues.

25

The articulated closed-chain model of the loop contains 51 degrees of free-dom. Results showed a possible opening/closing motion of this loop (similarto that of other enzymes), and served to demonstrate the effectiveness ofmotion-planning-based methods for studying the mobility of protein loops.An improved version of the method, which integrates ideas of ML-RRT, wasapplied in [139] to investigate the large-scale open-to-closed movement ofthe lid that controls the access to the active site of Burkholderia cepacia li-pase (BCL). Results showed that the lid conformational transition computedwith this method is comparable to the one obtained with molecular dynamicssimulations. Nevertheless, the computing time required by the RRT-basedmethod is several orders of magnitude lower (few hours on a single processorcompared to weeks on a medium-size cluster).

Several tests on the application of LoopTK to study motions for 20 dif-ferent loops are presented in [103]. Results show that LoopTK can sampleefficiently conformations of loops ranging from 5 to 25 residues in length.Although the combination of LoopTK with sampling-based path-planningalgorithms such as PRM and RRT seems possible, results on the applicationof such a combined strategy to simulate protein loop motions have not beenpublished yet, as far as we know.

4.1.2. Domain Motions

The results reported in [79] show the good performance of NMA-RRT forcomputing transition pathways involving domain motions. A set of five pro-teins for which structures corresponding to different conformations have beenexperimentally solved was used as a benchmark. The abbreviated names ofthese proteins are: ADK, ATP, DAP, EIA and LAO. Further tests on adeny-late kinase (ADK) showed that NMA-RRT produces results that correlatewell with previous studies [140]. Remarkably, NMA-RRT was able to achievethese results using a very low number of normal mode calculations.

Results obtained with PathRover for computing conformational transi-tions of the CesT and the Cyanovirin-N proteins are reported in [133].The particular phenomenon studied in these tests is domain swapping, andthe achieved results were consistent with experimental results. Moreover,in [132], the RRT-based predecessor of PathRover was implemented withina larger framework of algorithms to generate pathways between a closed andan open conformation of the KcsA protein, providing interesting insights intothis process.

Conformational transition simulations have also been performed using

26

Figure 7: A small protein (ubiquitin) in an unfolded state (left) and folded state (right).

the PDST-based method presented in [138] (see Section 3.3). Results arereported for the ADK, RBP, GroEL and CVN proteins. These results showthat the algorithm significantly outperforms a classically used method suchas Simulated Annealing [141]. The paper also shows that results of thePDST-based method are consistent with experimental data.

4.2. Protein Folding

Protein folding is the process in which proteins move (fold) from randomcoils to their native three-dimensional shape. For an illustration, Figure 7represents folded and unfolded conformations of a small protein. Being in thecorrect folded state is essential for proteins to function properly, and, usually,unfolded or incorrectly folded proteins are inactive or even toxic [142, 18].For this reason, it is important to understand and to characterize proteinfolding and unfolding pathways. Note that the study of protein folding shouldbe distinguished from the problem of protein structure prediction [143], inwhich only the final three-dimensional structure of the protein is searched,regardless of how the protein actually reaches it. Nevertheless, both problemsare important, and progress in any of them may yield advances in the other.

Several experimental methods have been used for studying protein fold-ing, such as NMR Spectroscopy [144, 145], Ultrarapid Mixing [146] and Time-Resolved Absorption Spectroscopy [147]. However, these methods are cur-rently limited in their ability to capture short-lived events and to characterizeconformations with a high spatial resolution. Computational methods have

27

been used side by side with these experimental methods, either augmentingthem or even replacing them (for examples, see [148, 10, 149, 150]). Impor-tant advances with these computational methods started with the advent ofthe energy landscape theory [151], which hypothesizes that the energy land-scape of a protein is funneled with many pathways all leading to the samefinal folded state. This suggests that a good understanding and characteriza-tion of the energy landscape of a protein will lead to a good understanding ofhow this protein folds. Hence, motion planning inspired methods for proteinfolding basically take this theory as a basis. The advantage of such methodsover most conventional methods is their ability to rapidly explore the con-formational space without getting trapped in local energy minima, and theircapacity to find several pathways simultaneously.

4.2.1. Computation of Folding Quantifiers

There are several types of quantifiers that are used for studying andexpressing properties of protein folding pathways. These quantifiers can becomputed using experimental methods, which makes them useful also forevaluating the performance of computational methods. Examples of the mostfrequently used quantifiers are:

- The probability of folding (Pfold), which is the probability that thestructure at a certain conformation will become completely folded be-fore it becomes completely unfolded.

- The Transition State Ensemble (TSE), which is the set of conformationswith Pfold = 0.5 (i.e. conformations which make up the energy barrierthe protein must cross in order to fold).

- The folding rate, which corresponds to an experimentally measurablequantity that determines how fast a protein proceeds from the unfoldedstate to the native folded conformation.

- The Φ-value, which measures how close a certain residue is to its nativefolded state.

In [115, 116], Pfold values were computed and compared using SRS andMonte Carlo (MC) for two proteins with PDB IDs 1ROP and 1HDD. Theseproteins were modeled at the secondary structure level with 6 and 12 degreesof freedom respectively. Results showed that SRS computations improverapidly as the roadmap size increases, and that the correlation between SRS

28

and MC computations tends to increase as more MC runs are performedper node. Nevertheless, SRS produced results at least four times faster thanMC. More extensive tests were presented in [117, 118], where 16 proteins wereanalyzed using SRS to compute TSEs, folding rates and Φ-values. Resultswere then compared to those obtained with an existing dynamic program-ming method and were found to better estimate experimental data whencomputing TSEs and folding rates. However, both SRS and the dynamicprogramming method did not produce very good estimates for Φ-values.

PRM-based methods have also been applied to compute folding quanti-fiers together with two new analysis methods called Map-based Master Equa-tion (MME) and Map-based Monte Carlo (MMC). These methods were in-troduced in [126] and used in combination with the conformational explo-ration method presented in Section 3.1.3 to compute relative folding ratesfor proteins G, NuG1 and NuG2. These analysis methods are extensionsto the original Master Equation and Monte Carlo techniques, and they areapplied on the constructed roadmap instead of the full conformational spaceas is conventionally done. The computed relative folding rates were found tomatch the corresponding experimental data.

Finally, the capacity of FeLTr to predict native-like conformations ofsmall-to-medium size proteins has been shown in [86]. Results in this pa-per show a good performance of the method on eight proteins, modeled with40 to 152 degrees of freedom. The conformations provided by FeLTr can beused as starting points for more detailed biophysical studies.

4.2.2. Protein (Un)folding Pathways

Results on the performance of PRM-based methods for studying unfoldingof several proteins with up to 100 residues are reported in [121, 122, 123, 124].The constructed roadmaps were used to extract unfolding pathways and toidentify their secondary structure formation order. The results were foundto be in good agreement with known experimental data. This method wastested on the proteins G and L, as well as on proteins NuG1 and NuG2, whichare two mutants of protein G. Initial tests in [122] were able to capture thefolding differences between proteins G and L, but not between G and NuG1or NuG2. However, these differences were correctly captured after applyingthe rigidity-based sampling strategy in [64].

29

4.2.3. RNA (Un)folding Pathways

The combination of the PRM-based exploration with MME and MMCdiscussed above has also been used in [125, 127] to study the problem ofRNA (un)folding, which is a problem that is very similar to protein folding.Results show that the method scales well for RNA molecules with up to 200nucleotides. This method was used to compute relative folding rates, andwas found to agree with experimental results. It was also able to predict thesame relative gene expression rate for wild-type MS2 phage RNA and threeof its mutants.

4.3. Protein-Ligand Interactions

The study of protein-ligand interactions is essential for understandingmany biological mechanisms. In terms of applications, understanding suchmolecular interactions is essential for drug design in pharmacology, or forprotein engineering in biotechnology. Different questions to be studied are theway the protein recognizes a particular ligand, how the ligand binds with theprotein active site, and what conformational changes both molecules undergoduring the ligand’s entrance and exit. Such information allows to predict thepossibility of association between protein-ligand pairs, the strength of thisassociation, or the protein activity level. Unfortunately, current experimentalmethods to obtain accurate (atomic-scale) information about protein-ligandinteractions are extremely limited. Moreover, the large size of the searchspace to be explored and the long time-scales to be simulated are extremelychallenging for the application of computational methods. This is especiallytrue when full flexibility of the protein is taken into consideration.

Some software packages for predicting protein-ligand docking are availablesuch as AutoDock [152], DOCK [153], FleX [154], GOLD [155] and ICM [156].These packages use algorithms such as Monte Carlo, Molecular Dynamics,Genetic Algorithms [157], and fragment-based search [158] (for a survey ofmethods and software packages see [159]). However, none of these softwaretools considers full flexibility of the protein. Moreover, these methods focuson finding the final binding conformation disregarding the ligand access/exitpathway, and without computing the conformational changes required forenabling such access/exit. An example of such protein-ligand accessibilityproblems is illustrated in Figure 8. Next, we survey works that use motionplanning inspired methods for predicting binding sites and for computingaccess/exit ligand pathways.

30

Figure 8: Illustration of protein-ligand accessibility problem. The figure shows a transver-sal cut of a protein with a ligand (represented with spheric atoms) occupying differentlocations: in the active site (orange) and on the surface (red). Some intermediate con-formations of the ligand along the exit path are represented with red lines, and someside-chains that change their conformation during the ligand exit are represented withblue sticks.

4.3.1. Predicting Binding Sites

The algorithm introduced by Singh et. al. in [112] was tested on the fol-lowing three protein-ligand complexes: Lactate Dehydrogenase with Oxam-ate, tyrosyl-transfer-RNA synthetase with L-leucyl-hydroxylamine and Strep-tavidin with Biotin. The algorithm was able to find the true binding site forthe first two complexes successfully, but not for the third one. Such partialsuccess corresponds to the overall performance of state-of-the-art methods.

More recently, Stochastic Roadmap Simulations have also been used inthe study of protein-ligand interactions. In [114], SRS was applied to estimatethe escape time for a ligand from different putative binding sites in a protein.Here, escape time is the expected amount of time for the ligand to escape fromthe “funnel of attraction” at the binding site [114]. Tests were performed onseven different protein-ligand complexes and results showed that, in five out

31

of seven complexes, escape time proved to be a good metric for distinguishingthe catalytic site from the other putative binding sites. It is noteworthy tosay that in both this work and in [112], only the ligand was assumed to beflexible and the protein was assumed to be rigid. This is possibly one of thereasons why these methods sometimes failed to predict correct binding sites.

4.3.2. Finding Access and Exit Pathways

The RRT-based method presented in [102] was applied to compute ge-ometrically feasible paths of (R, S)-enantiomers to exit the active site ofBurkholderia cepacia lipase (BCL). The flexibility of the ligand and of 17side-chains in the catalytic pocket of BCL were considered. Energy pro-files along the path were obtained by performing a rapid local minimizationof intermediate conformations. Results showed a clear similarity betweenthe computed paths and paths obtained using a pseudo-molecular dynam-ics approach. Remarkably, the combined RRT-minimization approach onlyrequired a few minutes to compute the paths, whereas pseudo-molecular dy-namics took several days. Results also showed that the approach is suitablefor pointing out protein residues that constrain the access of the ligand, whichis a highly valuable information for site directed mutagenesis. Further inves-tigations about the influence of ligand access/exit on Burkholderia cepacialipase enantioselectivity are presented in [160, 161]. These works show theability of RRT-based methods to rapidly produce results that present fairqualitative agreement with experimental studies.

The extended ML-RRT method described in [67], able to deal with theprotein backbone flexibility, was applied to compute the exit pathways of abound substrate homologue (TDG) from Lactose permease (LacY) and ofcarazolol from the active site of the β2-adrenergic receptor. The consideredmolecular models involved several hundreds of degrees of freedom, and solu-tion paths were obtained in several minutes. Results showed a remarkablygood agreement with experimental data, as well as with results obtainedwith other, much more computationally expensive methods based on molec-ular dynamics.

5. Conclusion

We have surveyed the literature for methods based on robot motion plan-ning algorithms to solve different problems in computational structural biol-ogy. The reviewed algorithms can be grouped based on the types of problems

32

Application Domain Related Work

Loop Motions RLG-RRT [131, 102, 139], LoopTK [103].Domain Motions NMA-RRT [79], PathRover [132, 133],

PDST [138].Protein Folding/Unfolding SRS [114, 115, 116, 117, 118], PRM-

FP [121, 122, 123, 124, 126, 64, 128],MaxFlux-PRM [129, 130]

RNA Folding PRM-FP [125, 127].Protein Structure Prediction FeLTr [86].Protein-Ligand Interactions PCR [112, 113], SRS [114], ML-RRT [160,

161].

Table 1: Motion planning inspired methods classified according to application domains.

they have been applied to as shown in Table 1. We have also pointed outthe main challenges and issues that need to be taken into account whenextending motion planning methods for molecular simulations. A suitablerepresentation for the molecule needs to be adopted, and an appropriate dis-tance metric needs to be used for comparing molecular conformations. Anefficient method for computing distances between atom pairs and for colli-sion checking also needs to be considered, as well as a method for samplingconformations that satisfy structural constraints. Moreover, the ever-lastingproblem of high dimensionality has to be faced, and an appropriate compro-mise should be made between the number of considered degrees of freedomand the amount of accuracy sought. Last but not least, energy needs to bemade into account, and a choice has to be taken for the type of force field tobe used.

Works reviewed in this paper show that algorithms originating fromrobotics are promising complementary methods to more conventional tech-niques in computational structural biology. Their strength lies mainly intheir efficiency in exploring highly complex spaces. Compared to classicalmethods such as MC, sampling-based motion planning algorithms requirefewer iterations to find conformational transition pathways or to obtain arepresentative ensemble of conformational states. An additional advantageof motion planning inspired methods is that they do not require a force-fieldto drive the exploration, unlike MD simulations. Therefore, different types ofdata, including simple geometric models, can be used to constrain or to bias

33

the search. The use of simple models leads to general and fast computationalmethods able to explore large regions of the conformational space. Resultsof such exploration can be further refined and analyzed subsequently usingmore accurate energy models.

Motion planning inspired methods for molecular simulations are still intheir early stage. They require more improvements and validation on largerclasses of systems. Further tests on real application problems, in tandemwith experimental methods, will provide important feedback to improve thecomputational methods. Further work is also needed on the characteriza-tion of the results provided by these algorithms, using concepts of statisticalphysics.

As we have shown in this survey, the classes of structural biology prob-lems to which motion planning inspired methods have been applied are stilllimited, being mainly focused around protein/RNA flexibility and protein-ligand interactions. Nevertheless, we believe that the potential of these meth-ods is larger, and that other applications could be investigated in the future.Examples of other interesting problems in structural biology are the predic-tion of protein-protein interactions and the conformational analysis of largemolecular assemblies.

Acknowledgments

This work has been partially supported by the French National Agencyfor Research (ANR) under project GlucoDesign.

References

[1] D. C. Rapaport, The art of molecular dynamics simulation, AcademicPress, 2007.

[2] D. P. Landau, K. Binder, A guide to Monte Carlo simulations in sta-tistical physics, Cambridge University Press, 2005.

[3] M. M. Woolfson, An introduction to X-ray crystallography, CambridgeUniversity Press, 1997.

[4] J. Cavanagh, W. J. Fairbrother, A. G. Palmer, N. J. Skleton, ProteinNMR spectroscopy: principles and practices, Royal Society of Chem-istry, 2006.

34

[5] P. Robustelli, K. Kohlhoff, A. Cavalli, M. Vendruscolo, Using NMRchemical shifts as structural restraints in molecular dynamics simula-tions of proteins, Structure 18 (8) (2010) 923–933.

[6] R. Bonneau, D. Baker, Ab initio protein structure prediction: progressand prospects, Annual Review of Biophysics and Biomolecular Struc-ture 30 (1) (2001) 173–189.

[7] T. Lengauer, M. Rarey, Computational methods for biomolecular dock-ing, Current Opinion in Structural Biology 6 (3) (1996) 402–406.

[8] R. H. Pain, Mechanisms of protein folding, Frontiers in molecular bi-ology, Oxford University Press, 2000.

[9] V. Munoz, Protein folding, misfolding and aggregation: classicalthemes and novel approaches, RSC biomolecular sciences, Royal So-ciety of Chemistry, 2008.

[10] Y. Sugita, Y. Okamoto, Replica-exchange molecular dynamics methodfor protein folding, Chemical Physics Letters 314 (1-2) (1999) 141–151.

[11] E. Marinari, G. Parisi, Simulated tempering: a new Monte Carloscheme, Europhysics letters 19 (6) (1992) 451–458.

[12] A. Laio, M. Parrinello, Escaping free-energy minima, Proceedings ofthe National Academy of Sciences of the United States of America99 (20) (2002) 12562–12566.

[13] D. E. Shaw, P. Maragakis, K. Lindorff-Larsen, S. Piana, R. O. Dror,M. P. Eastwood, J. A. Bank, J. M. Jumper, J. K. Salmon, Y. Shan,W. Wriggers, Atomic-level characterization of the structural dynamicsof proteins, Science 330 (6002) (2010) 341–346.

[14] S. M. LaValle, Planning algorithms, Cambridge University Press, 2006.

[15] H. Choset, K. M. Lynch, S. Hutchinson, G. Kantor, W. Burgard, L. E.Kavraki, S. Thrun, Principles of robot motion: theory, algorithms,and implementation, Intelligent robotics and autonomous agents, MITPress, 2005.

35

[16] K. I. Tsianos, I. A. Sucan, L. E. Kavraki, Sampling-based robot motionplanning: Towards realistic applications, Computer Science Review 1(2007) 2–11.

[17] D. Parsons, J. F. Canny, Geometric problems in molecular biology androbotics, in: Proceedings of the International Conference on IntelligentSystems for Molecular Biology, 1994, pp. 322–220.

[18] D. J. Selkoe, Folding proteins in fatal ways, Nature 426 (6968) (2003)900–904.

[19] M. Moll, D. Schwarz, L. E. Kavraki, Roadmap Methods for ProteinFolding, Humana Press, 2007.

[20] L. E. Kavraki, Geometric methods in structural computational biology.URL http://cnx.org/content/col10344/1.6/

[21] B. Gipson, D. Hsu, L. E. Kavraki, L. J.-C., Computational models ofprotein kinematics and dynamics: Beyond simulation, Annual Reviewof Analytical Chemistry 5 (2012) 273–291.

[22] L. E. Kavraki, P. Svestka, J.-C. Latombe, M. H. Overmars, Proba-bilistic roadmaps for path planning in high-dimensional configurationspaces, IEEE Transactions on Robotics and Automation 12 (4) (1996)566–580.

[23] S. M. LaValle, J. J. Kuffner, Rapidly-exploring random trees: Progressand prospects, in: Algorithmic and computational robotics: New Di-rections: the fourth Workshop on the Algorithmic Foundations ofRobotics, 2001, pp. 293–308.

[24] J.-C. Latombe, Robot motion planning, Kluwer Academic Publishers,1990.

[25] J. T. Schwartz, M. Sharir, On the piano movers’ problem I. the caseof a two-dimensional rigid polygonal body moving amidst polygonalbarriers, Communications on Pure and Applied Mathematics 36 (3)(1983) 345–398.

[26] T. Lozano-Perez, Spatial planning: A configuration space approach,IEEE Transactions on Computers 32 (2) (1983) 108–120.

36

[27] K. Goldberg, Completeness in robot motion planning, in: Proceedingsof the Workshop on Algorithmic Foundations of Robotics, A. K. Peters,Ltd., Natick, MA, USA, 1995, pp. 419–429.

[28] J. H. Reif, Complexity of the mover’s problem and generalizations,in: Proceedings of the 20th Annual Symposium on Foundations ofComputer Science, IEEE Computer Society, 1979, pp. 421–427.

[29] J. F. Canny, The complexity of robot motion planning, MIT Press,Cambridge, MA, USA, 1988.

[30] S. R. Lindemann, S. M. LaValle, Current issues in sampling-based mo-tion planning, Robotics Research (2005) 36–54.

[31] R. Geraerts, M. Overmars, A comparative study of probabilisticroadmap planners, Algorithmic Foundations of Robotics V (2004) 43–58.

[32] S. M. LaValle, J. J. Kuffner, Randomized kinodynamic planning, TheInternational Journal of Robotics Research 20 (5) (2001) 378–400.

[33] E. W. Dijkstra, A note on two problems in connexion with graphs,Numerische Mathematik 1 (1) (1959) 269–271.

[34] P. E. Hart, N. J. Nilsson, B. Raphael, Correction to ”a formal basis forthe heuristic determination of minimum cost paths”, ACM SIGARTBulletin (37) (1972) 28–29.

[35] N. M. Amato, O. B. Bayazit, L. K. Dale, C. Jones, D. Vallejo, OBPRM:An obstacle-based PRM for 3D workspaces, in: Robotics: The Algo-rithmic Perspective: 1998 Workshop on the Algorithmic Foundationsof Robotics, 1998, pp. 155–168.

[36] T. Simeon, J. P. Laumond, C. Nissoux, Visibility-based probabilisticroadmaps for motion planning, Advanced Robotics 14 (6) (2000) 477–493.

[37] S. A. Wilmarth, N. M. Amato, P. F. Stiller, MAPRM: A probabilisticroadmap planner with sampling on the medial axis of the free space,in: Proceedings of the IEEE International Conference on Robotics andAutomation, Vol. 2, 2002, pp. 1024–1031.

37

[38] G. Sanchez, J.-C. Latombe, A single-query bi-directional probabilisticroadmap planner with lazy collision checking, Robotics Research (2003)403–417.

[39] J. J. Kuffner, S. M. LaValle, RRT-connect: An efficient approach tosingle-query path planning, in: Proceedings of the IEEE InternationalConference on Robotics and Automation, Vol. 2, 2000, pp. 995–1001.

[40] J. Bruce, M. Veloso, Real-time randomized path planning for robotnavigation, in: IEEE/RSJ International Conference on IntelligentRobots and Systems, Vol. 3, 2002, pp. 2383–2388.

[41] P. Cheng, S. M. LaValle, Resolution complete rapidly-exploring ran-dom trees, in: Proceedings of the IEEE International Conference onRobotics and Automation, Vol. 1, 2002, pp. 267–272.

[42] S. Rodriguez, X. Tang, J. M. Lien, N. M. Amato, An obstacle-basedrapidly-exploring random tree, in: Proceedings of the IEEE Interna-tional Conference on Robotics and Automation, 2006, pp. 895–900.

[43] D. Hsu, J.-C. Latombe, R. Motwani, Path planning in expansive config-uration spaces, in: Proceedings of the IEEE International Conferenceon Robotics and Automation, Vol. 3, 1997, pp. 2719–2726.

[44] A. M. Ladd, L. E. Kavraki, Fast Tree-Based Exploration of State Spacefor Robots with Dynamics, Springer, 2005, pp. 297–312.

[45] I. A. Sucan, L. E. Kavraki, Kinodynamic motion planning by interior-exterior cell exploration, Algorithmic Foundation of Robotics VIII 57(2009) 449–464.

[46] A. R. Leach, Molecular Modelling: Principles and Applications, Pear-son Education, 2001.

[47] A. Kolinski, Multiscale Approaches to Protein Modeling, Springer Ver-lag, 2010.

[48] H. M. Berman, T. Battistuz, T. N. Bhat, W. F. Bluhm, P. E. Bourne,K. Burkhardt, Z. Feng, G. L. Gilliland, L. Iype, S. Jain, et al., Theprotein data bank, Acta Crystallographica Section D: Biological Crys-tallography 58 (6) (2002) 899–907.

38

[49] T. Schlick, Molecular modeling and simulation: an interdisciplinaryguide, Vol. 21, Springer Verlag, 2010.

[50] R. A. Scott, H. A. Scheraga, Conformational analysis of macro-molecules. II. the rotational isomeric states of the normal hydrocar-bons, Journal of Chemical Physics 44 (1966) 3054.

[51] D. Manocha, Y. Zhu, W. Wright, Conformational analysis of molecu-lar chains using nano-kinematics, Computer Applications in the Bio-sciences: CABIOS 11 (1) (1995) 71–86.

[52] S. M. LaValle, P. W. Finn, L. E. Kavraki, J.-C. Latombe, A randomizedkinematics-based approach to pharmacophore-constrained conforma-tional search and database screening, Journal of Computational Chem-istry 21 (9) (2000) 731–747.

[53] M. Zhang, L. E. Kavraki, A new method for fast and accurate deriva-tion of molecular conformations, Journal of Chemical Information andComputer Sciences 42 (1) (2002) 64–70.

[54] K. Noonan, D. O’Brien, J. Snoeyink, Probik: Protein backbone motionby inverse kinematics, The International Journal of Robotics Research24 (11) (2005) 971–982.

[55] M. Xie, Fundamentals of Robotics: Linking Perception to Action, Se-ries in Machine Perception and Artificial Intelligence, World ScientificPub., 2003.

[56] J. Angeles, Fundamentals of Robotic Mechanical Systems: Theory,Methods, And Algorithms, Mechanical Engineering Series, Springer,2007.

[57] L. Sciavicco, B. Siciliano, Modelling and Control of Robot Manipula-tors, Advanced Textbooks in Control and Signal Processing, Springer,2001.

[58] M. W. Spong, S. Hutchinson, M. Vidyasagar, Robot modeling andcontrol, John Wiley & Sons, 2006.

[59] M. L. Teodoro, G. N. Phillips Jr, L. E. Kavraki, Molecular docking: Aproblem with thousands of degrees of freedom, in: Proceedings of the

39

IEEE International Conference on Robotics and Automation, Vol. 1,2001, pp. 960–965.

[60] C. N. Cavasotto, A. J. W. Orry, R. A. Abagyan, The challenge ofconsidering receptor flexibility in ligand docking and virtual screening,Current Computer-Aided Drug Design 1 (4) (2005) 423–440.

[61] G. Jones, P. Willett, R. C. Glen, A. R. Leach, R. Taylor, Developmentand validation of a genetic algorithm for flexible docking, Journal ofMolecular Biology 267 (3) (1997) 727–748.

[62] J. Apostolakis, A. Pluckthun, A. Caflisch, Docking small ligands in flex-ible binding sites, Journal of Computational Chemistry 19 (1) (1998)21–37.

[63] Y. Pak, S. Wang, Application of a molecular dynamics simulationmethod with a generalized effective potential to the flexible molecu-lar docking problems, The Journal of Physical Chemistry B 104 (2)(2000) 354–359.

[64] S. Thomas, X. Tang, L. Tapia, N. M. Amato, Simulating protein mo-tions with rigidity analysis, Journal of Computational Biology 14 (6)(2007) 839–855.

[65] M. F. Thorpe, P. M. Duxbury, Rigidity theory and applications,Springer US, 1999.

[66] S. Wells, S. Menor, B. Hespenheide, M. F. Thorpe, Constrained geo-metric simulation of diffusive motion in proteins, Physical Biology 2(2005) 127–136.

[67] J. Cortes, D. T. Le, R. Iehl, T. Simeon, Simulating ligand-inducedconformational changes in proteins using a mechanical disassemblymethod, Physical Chemistry Chemical Physics 12 (29) (2010) 8268–8276.

[68] I. K. Fodor, A survey of dimension reduction techniques, Tech. Rep.UCRL-ID-148494, Lawrence Livermore National Lab (June 2002).

[69] L. J. P. van der Maaten, E. O. Postma, H. J. van den Herik, Dimension-ality reduction: A comparative review, Tech. Rep. TiCC-TR 2009-005,Tilburg University (2009).

40

[70] I. T. Jolliffe, Principal component analysis, Springer Verlag, 2002.

[71] P. Das, M. Moll, H. Stamati, L. E. Kavraki, C. Clementi, Low-dimensional, free-energy landscapes of protein-folding reactions by non-linear dimensionality reduction, Proceedings of the National Academyof Sciences 103 (26) (2006) 9885–9890.

[72] A. Altis, P. H. Nguyen, R. Hegger, G. Stock, Dihedral angle principalcomponent analysis of molecular dynamics simulations, The Journal ofChemical Physics 126 (24) (2007) 244111.

[73] Y. Mu, P. H. Nguyen, G. Stock, Energy landscape of a small pep-tide revealed by dihedral angle principal component analysis, Proteins:Structure, Function, and Bioinformatics 58 (1) (2005) 45–52.

[74] J. B. Tenenbaum, V. Silva, J. C. Langford, A global geometric frame-work for nonlinear dimensionality reduction, Science 290 (5500) (2000)2319–2323.

[75] E. Plaku, H. Stamati, C. Clementi, L. E. Kavraki, Fast and reliableanalysis of molecular motion using proximity relations and dimen-sionality reduction, Proteins: Structure, Function, and Bioinformatics67 (4) (2007) 897–907.

[76] Q. Cui, I. Bahar, Normal mode analysis: theory and applications to bi-ological and chemical systems, Chapman and Hall/CRC mathematicaland computational biology series, Chapman & Hall/CRC, 2006.

[77] K. Hinsen, Analysis of domain motions by approximate normal modecalculations, Proteins: Structure, Function, and Bioinformatics 33 (3)(1998) 417–429.

[78] F. Tama, Y. H. Sanejouand, Conformational change of proteins arisingfrom normal mode calculations, Protein Engineering 14 (1) (2001) 1–6.

[79] S. Kirillova, J. Cortes, A. Stefaniu, T. Simeon, An NMA-guidedpath planning approach for computing large-amplitude conformationalchanges in proteins, Proteins: Structure, Function, and Bioinformatics70 (1) (2008) 131–143.

41

[80] S. T. Rao, M. G. Rossmann, Comparison of super-secondary structuresin proteins, Journal of Molecular Biology 76 (2) (1973) 241–256.

[81] M. G. Rossmann, P. Argos, Exploring structural homology of proteins,Journal of Molecular Biology 105 (1) (1976) 75–95.

[82] A. Falicov, F. E. Cohen, A surface of minimum area metric for thestructural comparison of proteins, Journal of Molecular Biology 258 (5)(1996) 871–892.

[83] L. Holm, C. Sander, et al., Protein structure comparison by alignmentof distance matrices, Journal of Molecular Biology 233 (1) (1993) 123–138.

[84] S. Wallin, J. Farwer, U. Bastolla, Testing similarity measures withcontinuous and discrete protein models, Proteins: Structure, Function,and Bioinformatics 50 (1) (2003) 144–157.

[85] I. Lotan, F. Schwarzer, Approximation of protein structure for fastsimilarity measures, Journal of Computational Biology 11 (2-3) (2004)299–317.

[86] A. Shehu, B. Olson, Guiding the search for native-like protein confor-mations with an ab-initio tree-based exploration, International Journalof Robotics Research 29 (8) (2010) 1106–1127.

[87] J. Cortes, L. Jaillet, T. Simeon, Molecular disassembly with RRT-like algorithms, in: IEEE International Conference on Robotics andAutomation, 2007, pp. 3301–3306.

[88] E. Plaku, H. Stamati, C. Clementi, L. Kavraki, Fast and reliable anal-ysis of molecular motion using proximity relations and dimensionalityreduction, Proteins: Structure, Function, and Bioinformatics 67 (4)(2007) 897–907.

[89] P. Jimenez, F. Thomas, C. Torras, 3D collision detection: a survey,Computers & Graphics 25 (2) (2001) 269–285.

[90] M. C. Lin, D. Manocha, Collision and proximity queries, in: Handbookof Discrete and Computational Geometry, 2003.

42

[91] S. Gottschalk, M. C. Lin, D. Manocha, OBBTree: a hierarchical struc-ture for rapid interference detection, in: Proceedings of the 23rd annualconference on Computer graphics and interactive techniques, ACM,1996, pp. 171–180.

[92] G. van den Bergen, Efficient collision detection of complex deformablemodels using AABB trees, Journal of Graphics Tools 2 (4) (1998) 1–13.

[93] J. D. Cohen, M. C. Lin, D. Manocha, M. Ponamgi, I-COLLIDE: Aninteractive and exact collision detection system for large-scale environ-ments, in: Proceedings of the Symposium on Interactive 3D graphics,ACM, 1995, pp. 189–196.

[94] M. Soss, J. Erickson, M. Overmars, Preprocessing chains for fast di-hedral rotations is hard or even impossible, Computational Geometry26 (3) (2003) 235–246.

[95] P. Agarwal, L. Guibas, A. Nguyen, D. Russel, L. Zhang, Collisiondetection for deforming necklaces, Computational Geometry 28 (2-3)(2004) 137–163.

[96] I. Lotan, F. Schwarzer, D. Halperin, J.-C. Latombe, Efficient mainte-nance and self-collision testing for kinematic chains, in: Proceedings ofthe eighteenth annual symposium on Computational geometry, ACM,2002, pp. 43–52.

[97] V. Ruiz de Angulo, J. Cortes, T. Simeon, BioCD: An efficient algorithmfor self-collision and distance computation between highly articulatedmolecular models., in: Robotics: Science And Systems I, MIT Press,2005, pp. 241–248.

[98] H. Rangwala, G. Karypis, Protein Structure Methods and Algorithms,Vol. 14 of Wiley Series in Bioinformatics: Computational Techniquesand Engineering, John Wiley & Sons, 2010.

[99] E. A. Coutsias, C. Seok, M. P. Jacobson, K. A. Dill, A kinematic viewof loop closure, Journal of Computational Chemistry 25 (4) (2004)510–528.

43

[100] R. Kolodny, L. Guibas, M. Levitt, P. Koehl, Inverse kinematics inbiology: The protein loop closure problem, International Journal ofRobotics Research 24 (2-3) (2005) 151–163.

[101] J. Cortes, T. Simeon, Sampling-based motion planning under kine-matic loop-closure constraints, Algorithmic Foundations of RoboticsVI (2005) 75–90.

[102] J. Cortes, T. Simeon, V. Ruiz de Angulo, D. Guieysse, M. Remaud-Simeon, V. Tran, A path planning approach for computing large-amplitude motions of flexible molecules, Bioinformatics 21 (suppl 1)(2005) i116–i125.

[103] P. Yao, A. Dhanik, N. Marz, R. Propper, C. Kou, G. Liu, H. van denBedem, J.-C. Latombe, I. Halperin-Landsberg, R. B. Altman, Efficientalgorithms to explore conformation spaces of flexible protein loops,IEEE/ACM Transactions on Computational Biology and Bioinformat-ics 5 (2008) 534–545.

[104] A. A. Canutescu, R. L. Dunbrack Jr, Cyclic coordinate descent: Arobotics algorithm for protein loop closure, Protein Science 12 (5)(2003) 963–972.

[105] D. J. Griffiths, Introduction to quantum mechanics, Pearson PrenticeHall, 2005.

[106] U. Burkert, N. L. Allinger, Molecular mechanics, American ChemicalSociety, 1982.

[107] J. W. Ponder, D. A. Case, Force fields for protein simulations, Advancesin Protein Chemistry 66 (2003) 27–85.

[108] A. D. Mackerell Jr, Empirical force fields for biological macromolecules:overview and issues, Journal of Computational Chemistry 25 (13)(2004) 1584–1604.

[109] V. Tozzini, Coarse-grained models for proteins, Current opinion instructural biology 15 (2) (2005) 144–150.

[110] L. Monticelli, S. K. Kandasamy, X. Periole, R. G. Larson, D. P. Tiele-man, S. J. Marrink, The MARTINI coarse-grained force field: extension

44

to proteins, Journal of Chemical Theory and Computation 4 (5) (2008)819–834.

[111] P. Derreumaux, From polypeptide sequences to structures using MonteCarlo simulations and an optimized potential, Journal of ChemicalPhysics 111 (5) (1999) 2301–2310.

[112] A. P. Singh, J.-C. Latombe, D. L. Brutlag, A motion planning approachto flexible ligand binding, in: Proceedings of the Seventh InternationalConference on Intelligent Systems for Molecular Biology, AAAI Press,1999, pp. 252–261.

[113] M. S. Apaydin, A. P. Singh, D. L. Brutlag, J.-C. Latombe, Cap-turing molecular energy landscapes with probabilistic conformationalroadmaps, in: Proceedings of the IEEE International Conference onRobotics and Automation, Vol. 1, 2001, pp. 932–939.

[114] M. S. Apaydin, C. E. Guestrin, C. Varma, D. L. Brutlag, J.-C.Latombe, Stochastic roadmap simulation for the study of ligand-protein interactions, Bioinformatics 18 (Suppl 2) (2002) S18–S26.

[115] M. S. Apaydin, D. L. Brutlag, C. Guestrin, D. Hsu, J.-C. Latombe,C. Varma, Stochastic roadmap simulation: An efficient representationand algorithm for analyzing molecular motion, Journal of Computa-tional Biology 10 (3-4) (2003) 257–281.

[116] M. S. Apaydin, D. L. Brutlag, D. Hsu, J.-C. Latombe, Stochastic con-formational roadmaps for computing ensemble properties of molecularmotion, Algorithmic Foundations of Robotics V (2004) 131–147.

[117] T. H. Chiang, M. Apaydin, D. Brutlag, D. Hsu, J.-C. Latombe, Predict-ing experimental quantities in protein folding kinetics using stochasticroadmap simulation, in: Research in Computational Molecular Biology,Springer, 2006, pp. 410–424.

[118] T. H. Chiang, M. S. Apaydin, D. L. Brutlag, D. Hsu, J.-C. Latombe,Using stochastic roadmap simulation to predict experimental quantitiesin protein folding kinetics: folding rates and phi-values, Journal ofComputational Biology 14 (5) (2007) 578–593.

45

[119] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller,E. Teller, et al., Equation of state calculations by fast computing ma-chines, Journal of Chemical Physics 21 (6) (1953) 1087.

[120] D. Frenkel, B. Smit, Understanding Molecular Simulations: From Al-gorihtms to Applications, Academic Press, 2002.

[121] N. M. Amato, G. Song, Using motion planning to study protein foldingpathways, Journal of Computational Biology 9 (2) (2002) 149–168.

[122] G. Song, S. Thomas, K. A. Dill, J. M. Scholtz, N. M. Amato, A pathplanning-based study of protein folding with a case study of hairpinformation in protein G and L, in: Pacific Symposium on Biocomputing,2003, pp. 240–251.

[123] N. M. Amato, K. A. Dill, G. Song, Using motion planning to mapprotein folding landscapes and analyze folding kinetics of known nativestructures, Journal of Computational Biology 10 (3-4) (2003) 239–255.

[124] S. Thomas, G. Song, N. M. Amato, Protein folding by motion planning,Physical biology 2 (2005) 148–155.

[125] X. Tang, B. Kirkpatrick, S. Thomas, G. Song, N. M. Amato, Using mo-tion planning to study RNA folding kinetics, Journal of ComputationalBiology 12 (6) (2005) 862–881.

[126] L. Tapia, X. Tang, S. Thomas, N. M. Amato, Kinetics analysis methodsfor approximate folding landscapes, Bioinformatics 23 (13) (2007) 539–548.

[127] X. Tang, S. Thomas, L. Tapia, D. P. Giedroc, N. M. Amato, SimulatingRNA folding kinetics on approximated energy landscapes, Journal ofMolecular Biology 381 (4) (2008) 1055–1067.

[128] L. Tapia, S. Thomas, N. M. Amato, A motion planning approach tostudying molecular motions, Communications in Information & Sys-tems 10 (1) (2010) 53–68.

[129] H. Yang, H. Wu, D. Li, L. Han, S. Huo, Temperature-dependent proba-bilistic roadmap algorithm for calculating variationally optimized con-formational transition pathways, J. Chem. Theory Comput 3 (1) (2007)17–25.

46

[130] D. Li, H. Yang, L. Han, S. Huo, Predicting the folding pathway of en-grailed homeodomain with a probabilistic roadmap enhanced reaction-path algorithm, Biophysical Journal 94 (5) (2008) 1622–1629.

[131] J. Cortes, T. Simeon, M. Remaud-Simeon, V. Tran, Geometric algo-rithms for the conformational analysis of long protein loops, Journal ofComputational Chemistry 25 (7) (2004) 956–967.

[132] A. Enosh, B. Raveh, O. Furman-Schueler, D. Halperin, N. Ben-Tal,Generation, comparison, and merging of pathways between protein con-formations: Gating in K-channels, Biophysical Journal 95 (8) (2008)3850–3860.

[133] B. Raveh, A. Enosh, O. Schueler-Furman, D. Halperin, Rapid sam-pling of molecular motions with prior information constraints, PLoSComputational Biology 5 (2) (2009) e1000295.

[134] J. Cortes, L. Jaillet, T. Simeon, Disassembly path planning for complexarticulated objects, IEEE Transactions on Robotics 24 (2) (2008) 475–481.

[135] L. Jaillet, J. Cortes, T. Simeon, Sampling-based path planning onconfiguration-space costmaps, IEEE Transactions on Robotics 26 (4)(2010) 635–646.

[136] L. Jaillet, F. J. Corcho, J. J. Perez, J. Cortes, Randomized tree con-struction algorithm to explore energy landscapes, Journal of Compu-tational Chemistry 32 (16) (2011) 3464–3474.

[137] R. Iehl, J. Cortes, T. Simeon, Costmap planning in high dimensionalconfiguration spaces, in: Proceedings of the IEEE/ASME InternationalConference on Advanced Intelligent Mechatronics, 2012, in press.

[138] N. Haspel, M. Moll, M. Baker, W. Chiu, L. E. Kavraki, Tracing con-formational changes in proteins, BMC Structural Biology 10 (Suppl 1)(2010) S1.

[139] S. Barbe, J. Cortes, T. Simeon, P. Monsan, M. Remaud-Simeon,I. Andre, A mixed molecular modelling - robotics approach to investi-gate lipase large molecular motions, Proteins: Structure, Function andBioinformatics 79 (8) (2011) 2517–2529.

47

[140] P. Maragakis, M. Karplus, Large amplitude conformational change inproteins explored with a plastic network model: adenylate kinase, Jour-nal of Molecular Biology 352 (2005) 807–822.

[141] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by simulatedannealing, Science 220 (4598) (1983) 671–680.

[142] C. M. Dobson, Protein folding and misfolding, Nature 426 (6968)(2003) 884–890.

[143] M. J. Zaki, C. Bystroff, Protein structure prediction, Methods in Molec-ular Biology, Humana Press, 2008.

[144] J. Balbach, V. Forge, N. A. J. van Nuland, S. L. Winder, P. J. Hore,C. M. Dobson, Following protein folding in real time using NMR spec-troscopy, Nature Structural & Molecular Biology 2 (10) (1995) 865–870.

[145] H. J. Dyson, P. E. Wright, Unfolded proteins and protein folding stud-ied by NMR, Chem. Rev 104 (8) (2004) 3607–3622.

[146] C. K. Chan, Y. Hu, S. Takahashi, D. L. Rousseau, W. A. Eaton,J. Hofrichter, Submillisecond protein folding kinetics studied by ul-trarapid mixing, Proceedings of the National Academy of Sciences ofthe United States of America 94 (5) (1997) 1779–1784.

[147] C. M. Jones, E. R. Henry, Y. Hu, C. K. Chan, S. D. Luck, A. Bhuyan,H. Roder, J. Hofrichter, W. A. Eaton, Fast events in protein foldinginitiated by nanosecond laser photolysis, Proceedings of the NationalAcademy of Sciences of the United States of America 90 (24) (1993)11860–11864.

[148] R. Unger, J. Moult, Genetic algorithms for protein folding simulations,Journal of Molecular Biology 231 (1) (1993) 75–81.

[149] J. N. Onuchic, P. G. Wolynes, Theory of protein folding, Current Opin-ion in Structural Biology 14 (1) (2004) 70–75.

[150] K. A. Dill, S. B. Ozkan, M. S. Shell, T. R. Weikl, The protein foldingproblem, Annual Review of Biophysics 37 (2008) 289–316.

48

[151] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, P. G. Wolynes, Funnels,pathways, and the energy landscape of protein folding: a synthesis,Proteins: Structure, Function, and Bioinformatics 21 (3) (1995) 167–195.

[152] D. S. Goodsell, G. M. Morris, A. J. Olson, Automated docking of flex-ible ligands: applications of AutoDock, Journal of Molecular Recogni-tion 9 (1) (1996) 1–5.

[153] P. T. Lang, S. R. Brozell, S. Mukherjee, E. F. Pettersen, E. C. Meng,V. Thomas, R. C. Rizzo, D. A. Case, T. L. James, I. D. Kuntz, DOCK6: Combining techniques to model RNA–small molecule complexes,RNA 15 (6) (2009) 1219–1230.

[154] M. Rarey, B. Kramer, T. Lengauer, G. Klebe, A fast flexible dock-ing method using an incremental construction algorithm, Journal ofMolecular Biology 261 (3) (1996) 470–489.

[155] G. Jones, P. Willett, R. C. Glen, A. R. Leach, R. Taylor, Developmentand validation of a genetic algorithm for flexible docking, Journal ofMolecular Biology 267 (3) (1997) 727–748.

[156] R. Abagyan, M. Totrov, D. Kuznetsov, ICM - a new method for proteinmodeling and design: Applications to docking and structure predic-tion from the distorted native conformation, Journal of ComputationalChemistry 15 (5) (1994) 488–506.

[157] D. E. Goldberg, Genetic algorithms in search, optimization, and ma-chine learning, Addison-wesley, 1989.

[158] P. J. Hajduk, J. Greer, A decade of fragment-based drug design: strate-gic advances and lessons learned, Nature Reviews Drug Discovery 6 (3)(2007) 211–219.

[159] S. F. Sousa, P. A. Fernandes, M. J. Ramos, Protein-ligand docking:current status and future challenges, Proteins: Structure, Function,and Bioinformatics 65 (1) (2006) 15–26.

[160] D. Guieysse, J. Cortes, S. Puech-Guenot, S. Barbe, V. Lafaquiere,P. Monsan, T. Simeon, I. Andre, M. Remaud-Simeon, A structure-controlled investigation of lipase enantioselectivity by a path-planningapproach, ChemBioChem 9 (8) (2008) 1308–1317.

49

[161] V. Lafaquiere, S. Barbe, S. Puech-Guenot, D. Guieysse, J. Cortes,P. Monsan, T. Simeon, I. Andre, M. Remaud-Simeon, Control of lipaseenantioselectivity by engineering the substrate binding site and accesschannel, ChemBioChem 10 (17) (2009) 2760–2771.

50

motion planning algorithms for molecular simulations: a...

Documents