table of content · the progress of combinatorial chemistry and high throughput technologies has...

TABLE OF CONTENT

Preface

3

Chapter 1 Quantitative structure – pharmacokinetics relationship (QSPkR) modeling: pitfals and cornerstones

5

Chapter 2 QSPkR modeling of the steady state volume of distribution 44

Chapter 3

QSPkR modeling of drug clearance 72

Chapter 4

QSPkR modeling of plasma protein binding (PPB) 107

3

PREFACE

The progress of combinatorial chemistry and high throughput technologies has led to proposal of extensively growing number of structures with drug-like activities. It has been long recognized that the success of a new drug candidate depends not only on activity and safety, but also on its proper pharmacokinetics. Therefore, the prediction of the absorption – distribution – metabolism – excretion (ADME) at the very early stages of drug development became an integral part of the complex discovery process.

Quantitative structure – pharmacokinetics relationship (QSPkR) modeling established itself as a powerful tool for early ADME characterization. It allows reliable prediction even of virtual molecules thus accelerating the identification of new lead compounds and reducing the expense of drug development. QSPkR represents further extension of Quantitative structure – activity relationship (QSAR), however following its principles it faces more difficulties and challenges.

The present book addresses the use of QSPkR as a predictive and diagnostic tool for ADME optimization. Chapter 1 is focused on the main pitfalls and cornerstones of QSPkR and guides the reader through the main steps in the generation of significant, robust and predictive QSPkR model. Chapters 2, 3 and 4 concern the utility of QSPkR for prediction of key ADME parameters: steady state volume of distribution (Vss), body clearance (CL) and plasma protein binding (PPB). The physical meaning and clinical significance of these parameters are discussed, and the most-popular predictive approaches are reviewed with emphasis on QSPkR methodology. Finaly, personal results for QSPkR modeling of Vss, CL and PPB with respect to the ionization state of the molecules are summarized.

Zvetanka D. Zhivkova

5

Chapter 1

Quantitative structure – pharmacokinetics relationship (QSPkR)

modeling: pitfalls and corner stones

Introduction

Drug discovery and development are expensive and time consuming processes. The implementation of combinatorial chemistry and high throughput technologies led to development of a vast number of structures with drug like activity. Unfortunately, many promising drug candidates fail to become marketing products because of improper pharmacokinetic (PK) behavior [1, 2]. The increased awareness of the importance of PK led to an extensive research focused on prediction of absorption, distribution, metabolism and excretion (ADME) of drug like compounds at very early stages of drug development, before the expensive preclinical and clinical trials. As a result, in a short period of ten years, the failures due to improper ADME decreased considerably from 39% to 10% [3]. The early consideration of ADME is expected to reduce the risk of late-stage failures and to optimize screening and testing by identifying and discontinuing the development of poor drug candidates looking at only promising compounds.

In the last two decades in silico modeling was recognized as one of the most reliable and high throughput strategies for modeling and prediction of human PK properties. It demonstrated comparable predictive ability to the traditional in vivo and in

vitro methodologies, while showing unquestionable advantages. This computational approach allows predictions of various aspects of ADME to be made on the basis of theoretical molecular descriptors, even on virtual compounds, thus reducing the investments in technology resources and time [4]. The progress of methodologies for in

silico consideration of ADME is reviewed in several articles and monographs [1, 2, 4, 5 – 11].

In silico modeling utilizes two general approaches: structure (target) based and data (ligand) based. Structure based methods are suitable for modeling specific interactions between drug molecules and target proteins (enzymes, transporters, etc.). They require knowledge about the 3D structure of both protein and drug. Information for the protein and its binding site(s) can be obtained from crystallographic studies, homology modeling using similar proteins, or pharmacophore modeling [1]. Data based

6

approach are generally used for modeling of more complex PK properties, resulting from several interplaying processes [9].

The most widely used in silico methodology is quantitative structure – pharmacokinetics relationship (QSPkR), an extension of the classical quantitative structure – activity relationship (QSAR). QSAR/QSPkR is based on the similarity principle, i. e. the assumption, that drugs with similar structure are expected to have similar activity/PK characteristics. QSPkR models are empirical equations, developed on a limited number of drugs. They use statistical tools to correlate any PK parameter with a set of molecular descriptors, encoding various aspects of chemical structure. The goal if QSPkR modeling is to establish a trend in the descriptor values which parallels the trend in the PK parameter values.

With years QSPkR became a vital component of drug discovery and development offering the industry a more informed choice of leading substances for synthesis. It established itself as both predictive and diagnostic tool. A high-quality QSPkR model can enable screening of large libraries and prediction of ADME properties of new drug candidates, at the same time providing insight on the major structural features governing PK processes thus directing the design towards compounds with highest probability of having suitable in vivo ADME.

Unfortunately, QSPkR faces much more obstacles compared to its precursor QSAR. The accurate prediction of many ADME parameters remains challenging for several reasons. First, because of the complexity of the underlying physiological processes [5, 7]. One of the fundamental assumptions of QSAR is that all of the compounds used in QSAR model development have the same mechanism of action [12]. This is definitely not fulfilled for ADME processes and the key PK parameters. Drug absorption and distribution may involve various mechanisms: passive or facilitated diffusion, active transport, etc. Bioavailability is crucially dependent on both absorption and metabolism. The volume of distribution is determined by drug binding to both plasma proteins and tissue components. In plasma a drug may be selectively or non-selectively bound to various proteins and different binding sites. Drug clearance may involve a consequence of processes: intake in the clearance organs, secretion in bile, metabolism by several organs, glomerular filtration, renal tubular secretion and reabsorption, etc., more or less influenced by plasma protein binding (PPB). All contemporary QSPkR models are based on the assumption for passive diffusion of the molecules across biological membranes, however active transport facilitated by transport

7

proteins plays significant role in ADME of many drugs. Obviously, the key PK parameters are governed by various processes, each of them with different structural requirement making QSPkR modeling rather complicated task.

ADME modeling is further hindered by the limited number of PK data available in the public domain. While there are huge libraries of data regarding biological activity and toxicity, PK data are generally available only for “successful” drugs, i. e. those already approved as marketable products. Consequently, QSPkRs built on these data tend to be skewed toward drugs with acceptable, although probably non ideal, ADME characteristics. This contradicts the philosophy of ADME: when generating models it is better to have unbiased data sets with uniform coverage of the parameter space [8]. Only about 1% of chemical informatics data are available in the public domain [10]. The lack of extensive and reliable experimental data is an important restrictive factor for development of reliable prediction models.

Another important problem is the quality of the data. In general, the values of the modeled PK parameters are compiled from literature sources. Sometimes they are represented by a single PK study, while very frequently the data vary significantly from report to report as a result of differences in study design, doses, analytical assay methodology, mathematical approach for data analysis, etc. This is in disagreement with one of the basic requirements for successful QSAR all values of the response variable to be obtained in one and the same laboratory, or at least following a standardized protocol [13].

A good number of QSPkR models have been published recently for the key PK parameters: steady state volume of distribution (Vss) [14 – 22], total (CL) and renal (CLR) clearance [23 – 33], PPB [34 – 49]. Most of the models claim good predictive performance, however it is difficult to compare the quality of the models because of the lack of standardized protocol for QSPkR modeling and generally accepted statistical criteria for model evaluation. Furthermore, the published models can hardly be reproduced in other laboratories for prediction of PK parameters of new chemicals because of the incomplete description of model algorithms.

The aim of this review is to discuss the pitfalls and obstacles frequently encountered in QSPkR modeling, and to propose a workflow for development of a significant, robust and predictive QSPkR models for the key PK parameters. It is based

8

on the guidance and recommended procedures for QSAR modeling set in the OECD documents [50, 51], complemented with our experience in the field of QSPkR modeling. Workflow for QSPkR model development

The QSPkR model represents a relationship between any PK parameter (dependent variable, target property, end point variable, outcome, response) and molecular descriptors (independent variables, predictors). The response data can be quantitative (in the case of continuous response) or semi-quantitative (in the case of classification or category response).

The workflow of QSPkR modeling follows several steps (Figure 1): 1. Preparation of the dataset – collection of response data, representation of

molecular structures, and curation of the dataset; 2. Preparation of QSPkR table – computation of molecular descriptors and

selection of the most significant variables for the response; 3. Partitioning of the dataset into training and external test sets; 4. QSPkR model development – generation of statistically significant

relationships between the response and descriptor values; 5. QSPkR model validation – estimation of the goodness of fit, robustness and

predictive power of the model; 6. QSPkR model analysis – interpretation, applicability domain and outliers.

The quality of the QSPkR models depends on the amount and accuracy of

response data, the assumptions made, the choice of significant molecular descriptors, and the reliability of the statistical approaches for variable selection, model development and validation.

9

Figure 1. Workflow of QSPkR modeling of PK parameters

Preparation of the dataset

The dataset for QSPkR modeling consists of a set of molecular structures together with experimental data for the response – the PK parameter under consideration. Two potential sources of errors exist: related to the molecular structures, and related to the response data.

Concerning molecular structures, errors could arise from incorrect chemical names, CAS numbers, or from erroneous structure representation. Any error in the structure could result in false descriptor values which in turn lead to inadequacy of the model. Numerous recent studies have reported for a large rate of structural errors in the medicinal publications [52, 53]. Therefore, a special attention should be given to the correct representation of molecular structures. They can be either drawn in home using proper software, or imported from publicly available databases (Drug Bank, Chemical Book, EMBL-EBI, etc.) in a particular format (MOL, SDL, SMILES, etc.). The second approach is recommended to reduce the risk of errors. There are different types of

10

structure representations supplying different amount of information. In general, QSPkR studies are based on two-dimensional (2D) or three-dimensional (3D) representation. The 2D representation defines the connectivity of the atoms in the molecule in terms of their type and the nature of chemical bonds. It is based either on molecular graphs (topological representation), or on linear notation systems (SMILES, SMART, etc.). 2D representation allows computation of 2D descriptors and development of 2D QSARs. 3D representation considered the molecule as a rigid geometrical object in the space and supplies information about the overall special configuration of atoms [54]. For 3D QSAR as well as for structure based techniques the molecule should be presented in 3D format, and a proper structural optimization should be performed.

Recently, Fourches et al. proposed a standardized data curation strategy including several steps [55]:

- Removal from the database of compounds that cannot be appropriately handled by conventional chemometric tools (e. g. inorganic and organometallic compounds);

- Structural conversion (transformation or removal of mixtures and salts); - Normalization of specific chemotypes; - Ring aromatization; - Treatment of tautomeric forms; - Analysis/removal of duplicates.

The latter is especially important. Rigorous statistical analysis assumes that each compound in the dataset is unique. However, we have come across databases containing one and the same drug twice – under different trade names. To avoid this error, it is recommended always to verify the CAS number of the compounds. Also, it could be kept in mind that by 2D presentation any pair of enantiomers or diastereoisomers will be formally recognized as duplicates. The presence of duplicates can influence model predictivity, therefore they should be analyzed and one of them – removed [53].

Curation of the response data is a much more difficult task. Analyzing the pitfalls in QSAR, Cronin and Schulz emphasized on the importance of information about the origin of the response data [56]. ”In terms of biological activities, such data should ideally be measured by a single protocol, ideally even in the same laboratory and by the same workers. High quality biological data will have lower experimental error associated with them. Biological data should ideally be from well standardized assays, with a clear and unambiguous endpoint.” This demand is hard to be fulfilled in QSPkR studies. In general, PK data are compiled from different literature sources, following

11

various experimental protocols. The quality of the data is always at the mercy of data providers [52]. Incorrect data has been cited as one of the major source of failure of QSPkR models [57]. It is advisable to refer to the original literature (whenever possible) to avoid propagation of errors [58]. A verification of the supplied data using different reports is helpful, however it is rarely possible because of the limited publicly available information. For many drugs there is only a single source, while for other the data vary significantly from report to report. A good quality QSPkR models could be used for curation of databases by identifying compounds for which predictions disagree with experimental observations, and that are likely to be annotated with erroneous end point variable testing results [21].

The dataset should be also adequately sized and well balanced. Although in general all QSPkR models are local models, valid for only these compounds belonging to the chemical and biological space of the model, the term “local models” is usually applied to models built on homogeneous, congeneric series, while models built on larger sets of chemically diverse compounds are referred to as “global models”. To construct a robust global model for any PK parameter, the number of drugs and their diversity should be as high as possible. It is widely accepted that between five and ten compounds are required for every descriptor in the QSPkR model [13, 59, 60].

For QSAR studies it is recommended that the total range of response data should be at least five times higher than the experimental error. The experimental error among in vivo data can often exceed half a log unit [53]; as a rule of thumb, Gedeck et al. recommended a response data range of at least one log unit to obtain a good QSAR model [61]. As the experimental errors of PK parameters are rarely reported, for a well-balanced set it is required that the values of the response data span over few orders of magnitude. Furthermore, the gaps between two consecutive values of response data should not exceed 10-15% of the entire range [52]. There are PK parameters which display a narrow range (e. g. free fraction of drug in plasma fu, or % PPB), while others cover many orders of magnitude (Vss, CL, half life t1/2). If the distribution of the response data differs considerably from the normal distribution, a logarithmic transformation is recommended [58]. If the dataset comprises groups of molecules with very low and very high values, classification approaches are more suitable. In this case the different classes or categories should contain almost equal number of molecules [52].

12

Preparation of the QSPkR table

Computation of descriptors

The second step in QSPkR modeling starts with representation of molecular structure with a number of numerical values – descriptors. Both experimental and theoretical descriptors can be used. Experimental descriptors reflect various physico-chemical properties of the molecules: log P (partition coefficient between n-octanol and water), logD (distribution coefficient at various pH values), molar refractivity, dipole moment, polarizability, dissociation constant, water solubility, etc. Theoretical descriptors are the final result of logic and mathematical procedures, which transform chemical information encoded within the symbolic representation of the molecule into a useful number [54]. Theoretical descriptors are advantageous as they do not contain statistical errors due to experimental noise.

Molecular descriptors, together with the response variable, form the QSPkR table: a matrix with n rows (compounds) and p columns (descriptors). A huge variety of descriptors have been proposed in the last 50 years, and their number constantly keeps on growing.

There are several classes of theoretical descriptors depending on the type of molecular representation and the algorithm for their calculation, which provide different amount of structural information. The most frequently used descriptors in QSAR are defined and discussed in several articles and books [62 – 68]. A comprehensive collection of molecular descriptors, along with their definitions, mathematical formulas, examples, and references, was reported by Todeschini and Consonni [69].

Most frequently, QSPkR modeling of PK parameters is based on 2D QSAR and utilizes 2D descriptors. The main advantages of 2D descriptors are that they contain simple and useful information about the molecular structure, they are independent of special orientation, and can be easily calculated without structure optimization [54]. 2D descriptors could be further grouped as follows:

- Constitutional descriptors. They reflect constitutional properties of the molecule, such as molecular weight, number of atoms, groups or bonds of given type, ring systems or circles, etc. They do not deal with atom’s connectivity or geometry, and can be derived fast and easy directly from molecular formulas.

13

- Topological descriptors. They are computed by algorithms applied to the topological representation (in the form of 2D graph, with atoms as vertices and covalent bonds as edges). These descriptors encode information about the arrangement of molecules, internal bonds orientation, molecular size, shape, branching, presence of hetero-atoms, substituents with different electronegativity, etc. There are several indices used to signify molecular connectivity: Wiener index [70], Randic index [71, 72], Balaban index [73], Shultz index [74], Kier and Hall indices [75, 76], Galvez topological charge indices [77], etc.

- Electrostatical descriptors. They contain information about the distribution of the electron density in the molecule and include descriptors defining atomic net and partial charges [78], highest negative and positive charge, molecular polarizability [79], etc.

- Quantum chemical descriptors. They are calculated using quantum chemistry methods and reflect electronic and geometrical properties of molecules. The mostly used quantum chemical descriptors in QSAR studies are atomic charges, molecular orbital energies, frontier orbital densities, absolute hardness [80], total energy, HOMO and LUMO [81], etc.

Structure based in silico approaches and 3D QSAR require 3D and high

dimensionality order descriptors. 3D descriptors are calculated from the 3D structure, 4D descriptors are derived from the interaction between the molecule imbedded into a grid and some probe, and so on. Undoubtedly, 3D- and 4D-descriptors have higher information content than the simpler order descriptors. There is a point of view that it is better to use most informative descriptors in all modeling processes. This thinking is not correct because the “best descriptors” are those whose information content is comparable with the information content of the response for which the model is sought. In effect, too much information in the independent variables with respect to the response is often seen as a noise in the model, thus giving instable or poor predictive models [54].

Molecular descriptors should satisfy some basic requirements as defined by Randic [82]. The most important are: structural interpretation, good correlation with at least one property, independence, simplicity, gradual change with gradual change of the structure, preferably – possibility for discrimination between isomers, etc. The use of descriptors with clear physical meaning is a prerequisite for QSPkR model interpretation and for understanding the main structural features governing the PK parameter of

14

interest. Ultimately, descriptors must be easily understood by the chemist who should translate the implicit information into molecular structure with desired PK properties.

Variable selection

Although thousands of molecular descriptors can be generated using various software, practically only few of them are essential for the modeled property. The use of excessive number of descriptors may affect negatively various aspects of QSPkR analysis. Many descriptors may not encode information, relevant for the response. Or, the descriptor set can contain inter-correlating descriptors. If two descriptors in QSPkR are highly collinear they contribute twice to the response which confounds statistical significance and may hamper the interpretability of the model. Correlation method can be used as a preliminary filter for discarding inter-correlating descriptors and avoid redundant information. If the correlation coefficient between two descriptors exceeds the threshold of 0.8, the less significant descriptor of the pair should be removed [83]. Some statistical methods like multiple linear regression (MLR) require that the number of descriptors is significantly lower than the number of molecules in the dataset, i. e. using large descriptor sets would require larger datasets, rarely available in QSPkR modeling. Even if the method is formally not sensitive to a large descriptor set, a risk of poor and misleading models exists due to chance correlations and over-fitting. Over-fitting refers to phenomenon in which a predictive model may well describe the relationship between predictors and response, but fails to provide valid prediction for new compounds [84].

Obviously, to yield a reasonable model, the number of descriptors in the QSPkR model should be as low as possible, and selection of relevant descriptors from a large descriptor pool is important step in QSPkR modeling. It is recommended that the initial set of descriptors is firstly screened and descriptors with too many zero values, or such with no significant variation, to be removed prior next selection steps.

A wide range of methods for automated narrowing of the set of descriptors has been developed. The most frequently used are stepwise linear regression (SLR), principal component analysis (PCA) and genetic algorithms (GAs). SLR performs a simple stepwise procedure based on the MLR [58, 85]. It consists of a number of multiple regressions, each time removing the most insignificant descriptor and retaining the most relevant ones. In the “forward stepwise regression” the most relevant variable in terms of explained variance is first included in the linear regression. The addition of next variables continues until they significantly improve the existing model. Once a new

model is obtained, any variable in the model is tested to see if the fit can be improved by

15

removing it. If the model can be improved, the variable is deleted and the stepwise procedure is repeated until no further variables are either included or deleted. In the “backward stepwise” version, the initial model is the one with all variables, and then variables are selected and deleted one by one. The improvement of the models is evaluated using a suitable stepping criteria, usually Fisher test, defining ”F-for-inclusion” and “F-for-exclusion”. SWR is very popular in QSAR studies and is implemented in almost all statistical software packages. One of the drawbacks of the method is that no optimal variable selection is guaranteed, since the new variables are included (or deleted) based on the previously included variables into the model [58].

PCA is a simple statistical procedure to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible the variation in the data set [86]. The whole set of explanatory variables is substituted by few new, uncorrelated latent variables, defined as principal components (PCs). These PCs are sorted by decreasing information content (i. e. decreasing variance) so that most of the information is presented in the first few PCs [58].

GA is considered as the optimal strategy for variable selection [87, 88]. GAs are a class of heuristic search algorithms for function minimization. Regarding descriptor selection, it optimizes the prediction error of the model built upon a set of descriptors. GA is inspired by the mechanism of evolution of species according to the Darwinian theory of “survival and fitness” [89]. It mimics the natural evolution by modeling a dynamic population of individual solutions (models). Each individual in a population is represented by a chromosome. As well as the chromosome represents a random combination of genes, the individual solution (model) represents a random combination of descriptors. Each chromosome leads to a solution (a model built using random combination of descriptors). Starting from a group of random initial models (first generation), GA searches for chromosomes (solutions) with better fitness in future populations applying operators for parent selection, cross over and mutation. By allowing survival and reproduction of the fittest chromosomes, GA minimizes the error function in subsequent generations [65, 90].

Partitioning of the dataset into training and test sets

This step should be performed before QSPkR modeling although it is an element of QSPkR model validation. The appropriate selection of the training and test sets is a crucial problem in external validation, essential for QSPkR model predictivity. Ideally, the test set should be truly independent and drawn from a different data sources, or even

16

by synthetizing and testing new molecules [91]. However, finding new compounds with experimental response data is difficult and, alternatively, partitioning of the entire dataset is performed.

Development of rational approaches for the selection of training and test set compounds is an area of active research. The leading principle is that both the training and test sets span the whole descriptor space occupied by the entire dataset [92]. In other words, each molecular in the test set should be similar to at least one molecule in the training set, and vice versa. Moreover, the training set must have significant diversity [93]. A good number of algorithms have been proposed [94]. They could be classified as follows:

- Random selection – best in the case of a large dataset [91]; - Based on the response (Y-variable). The molecules in the dataset are ranked

according their Y-values and are divided into bins. The molecules in each bin are then assigned to the training or test set – randomly, or in customized way (for example – each fifth compound – to the training set, the other four – to the training set);

- Based on the descriptor space (X-variable). These approaches relay on structural similarity search and involve k-means clustering, Kohonen’s self organization maps, statistical molecular design, Kennard-Stone selection, etc.

No one approach of data splitting could be assigned as ideal one. Leonard and Roy compared three division methods and demonstrated that k-means clustering was superior to random selection and sorted biological activity [95]. They concluded that the selection of training and test sets should be based on the proximity of the representative points of the test set to representative points in the training set in the multidimensional descriptor space. In contrast, comparison between the results from rational division approaches (Kennard-Stone selection, minimum test set dissimilarity and Sphere Exclusion) and random splitting suggested comparable predictive power of the models built, i. e. using rational design methods to split data into training and external test set did not yield more predictive QSAR models [96].

In fact, a great similarity between the training and test set could give an overly optimistic indication of QSPkR predictive ability. In the course of partitioning of the dataset special care must be taken not to replicate the compounds in the training and test sets which can falsely “improve” the apparent predictive power of the QSPkR solution.

17

QSPkR model development

The next step of QSPkR modeling is the generation of a statistical significant relationship between the PK parameter of interest (dependent variable, Y) and the number of selected molecular descriptors (independent variables, Xi). A huge number of statistical tools are developed, many of which available in a variety of software packages. The methods used in QSPkR generation depend essentially on the type of the modeled response data. Regression methods are employed in the case of numerical (quantitative) response data, while classification techniques are used for modeling qualitative and semi-qualitative data. From another point of view, the modeling techniques could be divided into linear methods and non-linear methods. A detailed discussion on the statistical methods employed in QSAR/QSPkR is beyond the scope of the present review, it can be found in several specialized papers and books [58, 64, 65, 67, 97 – 101]. A few of the most commonly statistical tools for QSPkR model development are presented briefly.

Multiple linear regression (MLR)

MLR is the earliest and the most widely used technique for regression QSPkR modeling. It derives a mathematical function that best describes the response variable (Y) as a linear combination of molecular descriptors (Xi):

nn22110 Xb...XbXbbY ++++=

The parameters in the equation (the model constant b0 and coefficients bi) are chosen to minimize the squares of errors between the predicted and experimental value of the modeled variable. The sign of the coefficient is indicative for the effect of the respective descriptor on the response: positive sign suggests positive contribution, and negative sign – a negative effect. MLR is favored for its simplicity, transparency, reproducibility and easy interpretability. The main shortcomings of MLR are the inadmissibility of inter-correlated descriptors which deteriorate the quality of the model, and the requirement for large compounds-to-descriptor ratio. It has been recommended that the number of compounds in the training set must be at least 5 times higher than the number of descriptors used [102]. Some new methodologies based on MLR have been developed recently aiming to improve this technique – Best MLR, Heuristic method, GA based MLR, etc. [96]. MLR has been frequently used for QSPkR prediction of Vss [15, 16, 21, 22], CL [28, 30, 31] and PPB [34 – 38, 45 – 49].

18

Partial least squared regression (PLS)

PLS generalizes and combines the principles of MLR and PCA [103]. It allows handling of a large number of inter-correlating descriptors without any requirements regarding compound-to-descriptor ratio thus overcoming the problems of MLR. The basic idea (taken from PCA) is that a large number of descriptors can be transformed into a small number of orthogonal terms called latent variables (LVs). PLS derives a linear model describing the dependent variable Y by a set of relatively small number LVs:

mm2211 LVb...LVaLVaY +++=

where ai are regression coefficients of the LVs. Each LV in turn is a linear combination of the original variables (molecular descriptors Xi):

mn2211i Xb...XbXbLV +++=

Theoretically, hundreds of LVs can be extracted, but generally only few are involved in QSPkR models. The first LV accounts for the most of the variance while the consecutive are less significant. PLS is also interpreted as “Projection to Latent Structure”. Recently PLS has evolved by combination with other statistical techniques for better performance. A few upgraded PLS approaches – Genetic PLS, Factor analysis - PLS and Orthogonal signal correction (OSC), have been developed [98]. PLS has been successfully used for in

silico prediction of drug Vss [14, 17, 19] and CL [24, 25, 27, 29, 32]. Artificial neural network (ANN)

ANNs are mathematical models and algorithms inspired by the structure, functionality and learning processes in the brain. They can be used for solving various problems in QSAR/QSPkR: association, classification (clustering), transformation (different representation) and modeling [58, 90]. ANN represents a network of inter-processing elements (neurons) linked with weighted connections. Neurons collect series of input signals and transform them into output signal by means of transfer function. The classical ANN consists of three or more layers: an input layer, hidden layer(s) and output layer. The input layer encompasses molecular descriptors and the output layer – the modeled response of the compounds in the training set. In the learning process, the weights of the connections gradually change until the true input-output relationship for the training data is approximated. The main

19

characteristics of ANNs are architecture, transfer function and learning (training) paradigm. There are two main groups of ANNs differing in the architecture and learning strategy: unsupervised or supervised self-organizing maps (Kohonen’s maps) and supervised back propagation [90, 104]. In unsupervised learning only input variables (molecular descriptors) are involved in the training process, while supervised learning employs information for both molecular descriptors and output (response) data. A variety of ANN methods such as feed forward neural networks, back-propagation networks, Kohonen self organizing networks, etc. have been developed [58]. The earliest and simplest ANN is the feed-forward back-propagation network [6, 65, 90, 100]. The network is feed-forward because the elements in one layer are connected only to the elements in the next layer and the information moves forward from one layer to another toward the output layer. The transfer function (commonly linear or sigmoidal) transforms the input before forwarding the information to the next hidden layer and/or to the output layer. In the course of training, the network of neurons learns by changing the weight of connections (initially randomly assigned). One of the most popular learning methods is back-propagation of errors. In back-propagation the output from the neural network is compared with the experimental response data and an error function is defined. This calculated value is then feedback into the network and the weights of the connections are changed in such way that the error is minimized. This iterative procedure is performed until the errors converge leading to improved approximation of the input-output relationship. After the training phase the weights of the connections are fixed and the system is ready for testing phase. The architecture of feed-forward back-propagation network is given by: 1. The number of inputs and outputs; 2. The number of layers; 3. The number of neurons in each layer; 4. The number of weights in each neuron; 5. The way in which the weights are linked together, within or between the layers; 6. Which neuron receive the correction signal. Recently, radial basis function (RBFNN) and general regression neural networks (GRNN) gain more popularity [98]. ANN is superior to the conventional regression methods for its ability to approximate any target function and to model complex non-linear processes like most of ADME processes. It is not sensitive to inter-correlated descriptors and is able to handle a large number of descriptors exceeding the number of molecules. Its main shortcomings are the risk of over-training, over-fitting, and insufficient interpretability which hinders the structure optimization of the molecules [5, 100]. In addition, a careful optimization of the architecture of ANN is required, which is a subjective and time-consuming

20

process [64]. ANN was successfully applied for prediction of drug binding to PPB [39, 45]. Decision tree (DT)

DTs are a special class of machine learning methods which algorithm differs from most of the statistical and regression algorithms by their similarity to logic based and expert systems [65]. The models are obtained by recursively partitioning of the data space and fitting a simple prediction model within each partition. Historically, DTs were developed as classification approach (classification trees), and subsequently the methodology has been extended for solving regression problems (regression trees). Some widely available algorithms are critically discussed [105]. DT methods create a tree-like structure consisting of nodes and links [58, 65]. Nodes are linked hierarchically with several child nodes branching from a common parent node. There are three types of nodes: a root node, internal nodes and leaf nodes. The leaf node corresponds to the target property (class) – for example high plasma protein (PP) binders, or low PP binders. The root and internal nodes correspond to the molecule descriptors. In each node a test using a single descriptor is made. The tests are based on a rule, a simple “if – then” logical construction. For example, the rule can be: if logP > 3, then the drug is a high PP binder. The classification of an unknown compound is based on the leaf node that is reached after consecutive applications of rules regarding different molecular descriptors. Thus, the whole decision process is based on a series of questions (nodes) and answers (guiding the branches) starting from the root node and ending to the life node. The training process starts with defining a rule for the root node. A threshold for any descriptor that optimally separates the compounds into well-defined classes is chosen. Usually a single test is not sufficient for complete classification of the compounds in the training set, therefore an iterative procedure is performed by adding new rules. One of the most popular DT algorithms is classification and regression decision tree (CART) [106]. It consists of two steps: tree construction and tree pruning. In the tree construction process, the algorithm identifies the best descriptors able to separate the compounds in the parent node into two child nodes. The split maximizes the homogeneity of the sample population in each child node (e.g. one node is dominated by high PP binders, and the other – by low PP binders). Consequently, the child nodes become parent nodes for further splits, and splitting continues until compounds in each

21

node are either in one classification category, or cannot be split further to improve the quality of the DT model. In the second step “pruning” is used to reduce the size of DT by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by minimizing the risk of over-fitting. CART has been successfully used for prediction of Vss [14] and CLR [22] of drugs.

Random forest (RF)

RF is a further improvement of DTs which combines the results of multiple DT models, which in consensus produce better prediction. The principle of DF is to develop multiple DT models that are heterogeneous, but with comparable quality rather than combining several identical DT models, without effect on predictivity. Comparable quality ensures comparable weight of each DT model in the combining prediction [58]. The algorithm of DF consists of two steps: development of an individual DT model using bootstrap samples from the training set and randomly selected descriptors, followed by development of a next DT model using only descriptors not involved in the previous DT model(s). These steps are repeated until no additional DT models can be constructed. In this process the rate of false classification for each DT model is controlled at a fixed level (3–5%) to ensure the comparable quality of individual DT models. The number of trees in the forest and the number of descriptors in each tree should be well optimized. A large number of trees and a number of descriptors equal to the square route of the total number of descriptors are recommended [100]. RF has been successfully used for prediction of drug CL [27, 32].

k-Nearest Neighbor (kNN)

kNN is a simple pattern recognition approach, considered as one of the top 10 techniques for data mining, widely used for classification and regression problems [107]. Essentially, it doesn’t require training and is asymptotically optimal, i. e. with increase of the amount of training data converges to optimal prediction error [65]. In the classification stage, kNN searches for molecules in the training set that lie closest to the molecule for which the prediction is being made. The distance can be assessed using various structural similarity measures in the multidimensional descriptor space, such as Tanimoto coefficient, Euclidian distance, Mannhattan distance, etc. [108]. The class of an unknown molecule is assigned by a voting procedure among k nearest neighbors. The most highly represented class among these nearest neighbors determines the classification. The performance of the model depends on the number of neighbors k,

22

which must be previously defined. Too large or too small values of k may result in misclassification. A value of k = 3 is considered as sufficient to produce good results, and it is frequently used in kNN based research [100]. Also, since kNN relies on the measuring distances between molecules in the multidimensional descriptor space, descriptor values should be scaled to avoid greater influence of descriptors with large values. Several kNN based models for prediction of human CL [24 – 26] and PPB [45] have been reported.

Support vector machine (SVM)

SVM refers to a group of supervised machine – learning methods that can be applied to both classification and regression problems. They are based on the structural risk minimization principle from statistical learning theory [109] which is superior to the traditional empirical risk minimization principle employed by conventional neural networks [58]. The theory and practice of SVM is discussed in several books [110 – 112]. In its simplest form SVM is a linear classifier. Molecules from different classes can be separated by a linear hyper-plane. It is defined solely by its nearest compounds from the training set (called support vectors) in such way that the margin of separation between molecules of different classes is maximized. New points, defining the test compounds, are then mapped into that same space and predicted to belong to any class depending on which side of the hyper-plane they will fall on [58]. The linear SV-classifier can be transformed to non-linear classifier by using a kernel function. The kernel function maps the original descriptor space to a high or infinity dimensional space. Thus, by training a linear classifier in the kernel space, a non-linear classifier with respect to the descriptor space is obtained. Two kernel functions are typically used: polynomial kernel and radial basis kernel [65]. SVM offers better performance than traditional statistical approaches. The main advantage of SVM is the effective avoidance of over-fitting which improves its ability to build models using large number of molecular descriptors with relative few experimental data [58]. Among the drawbacks of the method are the high algorithm complexity and low interpretability of the models. The SVM methods have been further extended to Support Vector Regression (SVR) [113]. SVM and SVR have been successfully applied for prediction of drug Vss [18], CL [25, 28], and PPB [45].

In summary, numerous statistical and machine learning methods have been developed for QSAR/QSPkR modeling, and they are continuously upgrading and

23

multiplying along with the progress in the high throughput technologies. No one method can be considered as the ideal for all purposes – each one has its advantages and shortcoming, assumptions and restrictions. There is an agreement that the best model is the simplest one which ensures satisfactory accuracy of predictions. QSPkR modeling is a rather complicated task due to the complex non-linear nature of the underlying PK processes and the insufficiency of high quality experimental response data. Therefore, conventional regression methods often fail to provide robust QSPkR models, while the non-linear statistical techniques and machine learning methods become increasingly popular. According Yamashita, the main difficulty in modeling ADME does not appear to be the application of modeling techniques, but to do it successfully with the limited available data [2].

Cronin defined a few criteria which the ideal QSAR should meet [13], and the same could be applied to QSPkRs. The model must be simple, transparent, interpretable and mechanistically relevant. A simple model will have small number of descriptors – a precondition for good predictivity and interpretability. The transparency depends on the modeling technique. Thus, MLR is thought to be highly transparent with well understandable algorithm and easy prediction. In contrast, the more complex multivariate and non-linear modeling techniques (ANN, SVM, etc.) are considered as low-transparent. The mechanistic relevance of the model is more difficult to define as PK parameters are generally a result of multiple mechanisms. In this case, the mechanistic relevance is closely related to the interpretability of the model and the capability to define the main structural features which are relevant to the mechanisms involved. However, giving that QSPkR modeling aims to provide tools for prediction of PK properties of new synthetized drug candidates, the leading requirement for working QSPkR model should be its transferability. So far we have not found in the available literature any example for prediction of PK parameters of new drug candidates using any of the published QSPkR models. The major problem for model transferability is supposed to be the reproducibility of the descriptor values [114].

QSPkR model validation

The building of a significant QSPkR model for the training set is not a guarantee for good predictivity of the model on compounds not involved in the training set. There is always a risk of chance correlations and over-fitting, especially when the number of descriptors is too large compared to the number of training molecules. Therefore,

24

validation is required to evaluate the ability of the model to generalize to new compounds.

In the narrow sense of the word, validation encompasses a number of procedures for judging the predictive power of the model on unforeseen compounds. Tropsha et al. set three requirements for reliable and predictive QSAR model: statistical significance and robustness, predictive ability on new compounds, and defined applicability domain [92]. The content of the term validation was further extended with the acceptance of five criteria for good QSAR models, known as OECD principles [50, 51].

OECD principles

The principles were first proposed for the validation of QSAR models for regulatory purposes in a meeting of QSAR experts in Setubal, Portugal, in 2002. They were further approved by OECD countries, QSAR and regulatory communities at the 37th Joined Meeting of the Chemical Committee and Working Party on Chemicals, Pesticides and Biotechnology in 2004. The five principles denoting the validity of QSAR model are:

1. A defined end point. The aim is to ensure clarity in the end point variable being predicted by the model. In general, end point data could be determined by different experimental protocols and under different conditions. Laboratory-to-laboratory variance may lead to incorrect model. Therefore, it is important that the specific end point associated with the QSAR should be well described so that the user can judge its appropriateness.

2. An unambiguous algorithm. The aim is to ensure transparency of the model development algorithm. QSAR models should have the potential to be further used (by other users) for prediction of the response of new compounds. The lack of detailed information of the modeling algorithm hinders the reproducibility of the model. Therefore the algorithm used in model development should be described in detail including complete information about the structure representation, calculation of descriptors, software employed, variable selection, statistical metrics used, validation procedures, etc.

3. A defined domain of applicability. The need to define an applicability domain expresses the fact that QSARs are reductionist models which are inevitably associated with limitations in terms of the types of chemical structures, physicochemical properties and mechanisms of action for which the models can generate reliable predictions. Further work is recommended to define what types

25

of information are needed to define QSAR applicability domains, and to develop appropriate methods for obtaining this information.

4. Appropriate measures of the goodness of fit, robustness and predictivity. The aim is to simplify the overall set of criteria, but not to lose the distinction between the internal performance of a model (as represented by goodness of fit and robustness) and the predictivity of a model (as determined by external validation). Further work was recommended on development of approaches that could be used to provide appropriate measures of internal performance and predictivity, and to determine what constitutes external validation of QSAR models.

5. A mechanistic interpretation, if possible. The intent is to ensure that some considerations are given to the possibility of mechanistic association between the descriptors in the model and the end point being predicted. It is not always possible from scientific viewpoint. The absence of mechanistic interpretation does not mean that a model is not potentially useful in the regulatory context.

Methods for QSAR/QSPkR model validation

Numerous validation techniques have been developed in years which could be divided into internal and external validation. Their principles, advantages and shortcomings are widely discussed [12, 52, 84, 115]

Internal validation of the model is performed on the molecules used in model development, i. e. on the training set. The commonly used approaches are cross validation, bootstrapping and randomization test.

In the leave-one-our cross validation (LOO-CV) one molecule is omitted from the training set, the model is built on the other (n – 1) molecules and then is used for prediction of the end point variable of the omitted molecule. This procedure is performed for all molecules in the training set. The predictive ability is assessed by the LOO-CV correlation coefficient q2

LOO-CV. Leave-many-out cross validation (LMO-CV) is a more rigorous procedure in which the original training set is divided into n subsets. Each subset is assigned once as a validation test set for a model, built on the remaining (n-1) subsets, composing the training set. The predictive ability is assessed by the LMO-CV correlation coefficient q2

LMO-CV. Although the individual models are tested on compounds not involved in the model development (in fact, external validation), LMO-CV is considered as internal validation procedure as it influences the choice of the best model.

26

Bootstrapping a resembling method in which several (k) subsets of equal size (n) are selected randomly from the training set [116]. Some of the molecules may be repeatedly presented in one subset, while others may never be selected. Each subset is used once to construct a model, which is used for prediction of the molecules in the other subsets. This process is repeated several times. As the bootstrap sample is taken from the original dataset using sampling with replacement, it is not identical with the original dataset. The predictive ability is assessed by the average q2

Boot which closeness to 1 is a suggestion of model robustness.

Scrambling (randomization) can ensure that the model is not developed by chance. It is based on scrambling in the training set, i. e. reassigning the values of the response variable to different molecules, and repeating the modeling procedure many times. The values of the average r2

rand and q2rand of these chance correlations should be

significantly lower than the explained variance for the original model.

External validation means evaluation of the predictive ability of the model on a sufficiently large external independent test set of compounds that have not been used at any step of model development. It is considered as the “gold standard” and the only way for judging the true predictive power of the model [52, 84, 91, 94]. The ideal case is using new molecules drawn for different literature sources. Due to insufficiency of data, generally the available dataset is split into a training set, used for the model generation, and an external test set for external validation, as already described.

Golbraikh and Tropsha (2002) demonstrated the insufficiency of the internal validation for development of externally predictive QSAR models [93]. A more rigorous algorithm for model validation was recently proposed [52]. The training set should be split randomly into modeling set and external evaluation set (the latter – about 15 – 20% of the original dataset). In turn, the modeling set can be divided to a number of training and test sets for LMO-CV. The terminology regarding the different test sets is slightly confusing and has not been widely accepted. Alexander et al. proposed a more-clear terminology [91]. The training set encompasses all molecules used in model generation. A large training set with high degree of molecular diversity that spans a large range of the modeled property is required. Validation sets are subsets from the training set involved in LMO-CV. As the compounds in validation sets are not used to fit the model, they give an independent measure for model predictivity. However, as they affect the choice of the best model, they are not considered as completely

27

independent. The external test set consists of molecules which are neither used to fit the models, nor to select between them.

Statistical metrics used for assessment of QSAR/QSPkR model quality

Numerous statistical metrics are used for evaluating the quality of the model. There are two aspects to be considered: the goodness of fit (emphasizing on the fitting properties of the model) and the goodness of prediction (considering the predictive ability of the model). The most frequently used measures of the goodness of fit for regressions methods are:

- Coefficient of determination, or explained variance (r2). It is accepted as a standard measure of model quality (i. e. how well the model fits the particular data set) and is usually defined as the square of the correlation coefficient between observed and calculated values in the regression, as follows:

( )

( )∑

∑

=

=

−

−

−= n

1i

2

obsobs,i

n

1i

2calc,iobs,i

2

YY

YY1r

where Yi,obs and Yi,calc are the observed and calculated by the model response for the ith compound in the training set, and obsY is the mean value for the observed response data.

The model is as good, as the value of r2 is closer to 1. - Adjusted determination coefficient (r2

adj). It reflects better the explained variance in the case of a large number of descriptors:

( )p1n

pR.1nr

22adj

−−

−−=

where n is the number of molecules in the train set and p – the number of descriptors in the model.

- Variance ratio, or Fisher statistics (F). It judges the overall significance of the regression equation and represents the ratio between explained and unexplained variance:

( )

( ) ( )1pn/YY

p/YYF

n

1i

2calc,iobs,i

n

1i

2

obscalc,i

−−

−

−

=

∑

∑

=

=

28

A high value of F is indication for high significance of the model.

- Standard error of the estimate (SEE):

( )

1pn

YYSEE

n

1i

2calc,iobs,i

−−

−

=∑

=

For a good model it should be as low as possible.

Several metrics are proposed for evaluation of the predictive ability depending on the validation method applied.

- Cross validated correlation coefficient for the training test (q2):

( )

( )∑

∑

=

=

−

−

−= n

1i

2

trainobsobs,i

n

1i

2

trainpred,iobs,i2

YY

YY1q

For a good predictive model, q2 should be close to 1. A negative value of q2 is indicative for very poor predictive power.

- rm metrics, proposed by Roy et al. [117]

( )( )20

222m rr1rr −−=

( )( )20

222m 'rr1r'r −−=

2

'rrr

2m

2m2

m

+=

v

2m

2m

2m 'rrr −=∆

where r2 and r02 are the squared correlation coefficients between the observed and

predicted values of the compounds with and without intercept, respectively. The parameters r’2 and r0’

2 have the same meaning, but with reversed axes. 2mrv is the average

value of rm2 and rm’2, and ∆rm

2 is the absolute difference between rm2 and rm’2.

- Predictive coefficient for the external test set (r2pred). It reflects the degree of

correlation between the observed and predicted by the model response data for the test set compounds, and is given by the formula:

29

( )

( )∑

∑

=

=

−

−

−= n

1i

2

testobsobs,i

n

1i

2

testpred,iobs,i2pred

YY

YY1r

Some authors [12, 93] recommended using train,obsY instead of test,obsY . However, using

test,obsY is not only more consistent and simple, but also minimizes the value of r2pred, so it

is more conservative [91].

- Root mean square error of prediction (RMSEP):

( )

n

YYRMSEP

n

1i

2

testpred,iobst,i∑=

−

=

- Coefficients of determination (predicted vs. observed and observed vs. predicted response for regression through the origin) r0

2 and r0’2 [92, 93] .

- k and k’ – the slopes of the respective regression lines through the origin.

Criteria for good predictive models

Many authors consider an internal validation with q2 > 0.5 as an indicator, and sometimes even as the ultimate proof of the high predictive power of the model. Several studies have demonstrated that a high q2 value is a necessary, but not sufficient condition for good predictivity [93]. It seems that the only way to evaluate the true predictive power of a QSAR model is external validation with a sufficiently large independent test set of molecules, not involved in any step of model development. Tropsha et al. [52, 92] set the following criteria for good model predictive ability:

1. Cross validated correlation coefficient for the training set q2 > 0.5; 2. Predictictive coefficient for the external test set r2

pred > 0.6;

3. ( )1.0

rrr

2

20

2

<− and 0.85≤k≤1.15, or ( )

1.0r

'rrr 2

20

2

<− and 0.85≤k’≤1.15;

4. 20

20 'rr − < 0.3.

where r02 and r0’

2 are the predictive coefficients (predicted vs. observed and observed vs. predicted response for regression through the origin, while k and k’ are the slopes of the respective regression lines through the origin.

30

Similar set of criteria have been proposed by Roy [12]. 1. Internal validation q2 > 0.5; 2. ∆rm

2 (LOO) < 0.2 and rm2(LOO) > 0.5;

3. External validation r2pred > 0.5

4. ∆rm2 (test) < 0.2 and rm

2(test) > 0.5.

The above criteria do not set acceptable measures for the goodness of fit. Veerasamy proposed a threshold value for the coefficient of determination r2 ≥ 0.6. In addition, the difference between r2 and q2 should not exceed 0.3 – higher difference is an indication of over-fitting of the data [84].

Recently Alexander et al. published a critical analysis of the widely adopted criteria of Golbraikh and Tropsha [89]. They discussed the possible confusions and misunderstandings surrounding the use of the coefficient of determination r2 as a measure of model fit and prediction power as it has been used in different context, calculated by different formulas and thus having different physical meaning. They demonstrated that r2 for regression through the origin of observed on predicted values (or vice versa) is a very weak criterion leading to over estimation of model predictivity, and proposed simple and clear criteria for good predictive model:

- High predictive coefficient rpred > 0.6, calculated for the external test set according to the equation given above, and not from the regression of observed on predicted values;

- Low RMSEP. A threshold for this measure has not been defined; according to a useful rule of thumb the test set RMSEP should be less than 10% of the range of target property value. For example, if the range of target property value is 3.5 units, the value of RMSEP should be ≤ 0.35.

RMSEP was considered as the most meaningful measure for model predictivity as it gives the standard deviation of the residuals. An approximate 95% confidence interval for predicting new compounds data is Ypred±2RMSE, if the model is correct and errors are normally distributed [91]. In addition, the authors emphasized on the utility of the plot of the observed vs. predicted response data. The plot for the training set shows how well the model predicts the data used to generate it, the plot for the test set demonstrates how well the model predicts new data. The closer the data in such plot lie to the line Yobs = Ypred, the better is the model. Including this line on the graph is helpful for model

31

assessment. In addition, the graph provides information for the outliers and trends in the data [91].

So far, there are no standard criteria for evaluation of predictivity of QSPkR models. Taking into account that QSPkR is an extension of QSAR for ADME prediction, it is reasonable to use the accepted criteria for QSAR models. A few additional measures for predictivity of QSPkR models are frequently used which account for the error of prediction of the studied PK parameter rather than the error of its logarithm, which is generally used as Y-variable in the QSPkR models: - Mean fold error of prediction MFEP:

n

10MFEP

N

1i

YY pred,iobs,i∑=

−

=

- Geometric mean fold error of prediction:

n

YYn

1ipred,iobs,i

10GMFEP

∑

==

−

where Yi,obs and Yi,pred are the logarithms of the observed and the predicted by the model value of the studied PK parameter for the ith drug in the test set.

Accuracy of prediction is given as the percentage of molecules in the test set which PK parameter is predicted within the two-fold or three-fold error of the experimental value. Models with GMFEP ≤ 2 and accuracy about 60% are usually considered as good predictive models [27].

QSPkR analysis: interpretation, applicability domain and outliers.

The good quality QSPkR model should be predictive and interpretable. It should involve descriptors with clear physical meaning which reveal the main structural features governing the modeled PK parameter. Although QSPkR models are empirical models and mechanistic interpretation is difficult, the chemical information encoded in the descriptors should give some light on the mechanisms of the underlying processes. The link between mechanisms involved and factors affecting it can serve as a guideline in designing molecules with optimized PK properties.

32

In general, QSPkR models for the key PK parameters are developed on limited datasets (training sets) and are tested on smaller test sets of similar compounds. Even the perfect QSPkR model in terms of statistics cannot be expected to make reliable predictions for the entire universe of drugs. Instead, every QSPkR model has its applicability domain (AD). AD is a theoretical region defined by the model descriptors and modeled response in which the model makes predictions with given reliability. In fact, this is the response and chemical space of the molecules in the training set [118]. The absence of defined AD can lead to unjustified extrapolation of the model in the chemical space and, as a result, a high probability of inaccurate predictions. The AD of the model should be described in terms of the most relevant parameters, i. e. the descriptors in the model. Ideally the QSAR should only be used to make predictions within that domain by interpolation, not by extrapolation [12]. Whereas an excellent fit to the training set may be reliably attainable, models often fail to predict accurately chemicals that are out of AD. Various methods are proposed for identifying the AD of QSAR models [12, 118 – 121]. Most of them aim to define the interpolation region in the descriptor space including range distance based, geometrical, probability density distribution based techniques, etc. Others consider the response space of the training set molecules. AD is a subject of extensive research, and many new methodologies have been designed to estimate model adequacy and practical application [122 – 124].

In most cases, the training set contains few compounds that do not fit to the developed QSPkR model, defined as outliers. From mathematical point of view, the outlier is a data point that falls outside the confidence interval of the regression line, i. e. it has higher standardized residual than the rest samples in the data set. Two types of outliers could be distinguished: structural outliers, and response outliers. Structural outliers represent singletons in the dataset and can be identified using simple classification techniques like PCA, cluster analysis, etc. Response outliers are essentially defined as “activity cliffs”. Activity cliffs or, more generally, property cliffs, are pairs of compounds with high structural similarity but unexpectedly high activity (or property) difference [125]. With respect to PK parameters, these differences can be due to different modes, or strength of binding to plasma proteins, transporters, tissue components, etc., which may affect dramatically drug Vss, CL, PPB, etc. No QSPkR method will be able to predict such cliffs unless they have not been already observed in the training set. Both types of outliers can be real, or due to errors in structure representation or response value [52]. Several methods are proposed for identification of outliers [126 – 128]. Typically, a compound is considered as an outlier if its

33

standardized residual exceeds three standardized residual units [129]. Some authors proposed a relative error of 2% as a criterion. A standard plot of QSPkR analysis, i. e. the relationship between predicted and observed value provides a visual inspection of the correlation and can be used as a tool for exploring data and detecting outliers [58]. Frequently any criteria for outlier identification are implemented in the software packages. For example, MDL QSAR version 2.2 (MDL Information Systems, Inc.) considers as outliers molecules which residuals do not obey the normal distribution law. It is recommended to remove the outliers from the dataset before generation of the QSPkR model [130]. However, this procedure must be completed very carefully because the structural and response diversity is a prerequisite for successful QSPkR model. The outliers should be carefully analyzed and the reasons for their peculiarity should be provided. Frequently they offer valuable information about the mechanisms of the underlying processes and the AD of the model.

In summary

Considerable progress has been made in QSPkR methodology for prediction of ADME behavior. Numerous statistical and machine learning methods have been developed and successfully applied for modeling of the key PK parameters. However, the reported models have more theoretical than practical application. Most of them meet one of the goals of QSAR/QSPkR and reveal the main molecular features governing modeled parameter. However, they are hardly reproducible in other laboratories, and therefore cannot be used for ADME prediction of new drug candidates. The major problems for QSPkR models generation are the indufficiency of high-quality experimental data for ADME parameters, the incomplete knowledge on the underlying mechanisms, and the lack of standardized procedures and acceptance criteria for QSPkR modeling. The access to larger databases including not only approved drugs, but also rejected compounds, further mechanistic understanding, and standardized workflow are prerequisites for development of a global, predictable and transferable QSPkR model able to select lead compounds for optimization in early stages of drug discovery.

34

References

1. Van de Waterbeemd H, Giffold E. ADMET in silico modeling: towards prediction paradise? Nature 2003; 2:192-204.

2. Yamashita F, Hashida M. In silico approaches for predicting ADME properties of drugs, Drug Metab Pharmacokinet 2004; 19(5):327-338.

3. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nature Rev/Drug Discov 2004; 3:711-715.

4. Boobis A, Gundert-Remy U, Kremers P, Macheras P, Pelkonen O. In silico prediction of ADME and pharmacokinetics. Report of an expert meeting organized by COST B15, Eur J Pharm Sci 2002; 17:183-193.

5. Butina D, Segall MD, Frankcombe K. Predicting ADME properties in silico: methods and models, DDT 2002; 7(11):S83-88.

6. Mager D. Quantitative structure pharmacokinetic/pharmacodynamics relationships, Adv Drug Deliv Rev 2006; 58:1326-1356.

7. Chohan KK, Paine SW, Waters, S. Advancements in predictive in silico models for ADME, Cur Chem Biol 2008; 2:215-228.

8. Madden JC. In silico approaches for predicting ADME properties. In: Recent advances in QSAR studies, Pyzyn T, Leszczynski J, Cronin MTD, Eds; Springer: Dordrecht, Heidelberg, London, New York, 2010, pp. 283-304.

9. Xu C, Mager DE. Quantitative structure-pharmacokinetic relationships, Exp Opin Drug Metab Toxicol 2011; 7(1):63-77.

10. Wang J, Hou T. Recent advances on in silico ADME modeling. In: Annual reports in computational chemistry; Wheeler RA, Spellmeyer D, Eds; Elsevier: Amsterdam, Boston, Heidelberg, etc., 2009, pp. 101-127.

11. Ekins S, Waller CL, Qwaan PW, Cruciani G, Wrighton SA, Wikel JH. Progress in predicting human ADME parameters in silico. J Pharmacol Toxicol Meth, 2000; 44(1):251-272.

12. Roy K; Kar S; Das RN. Understanding the basis of QSAR for application in pharmaceutical sciences and risk assessment. Elsevier: Amsterdam, Boston, Heidelberg, etc., 2015, pp. 1-476.

13. Cronin MTD. Quantitative structure-activity relationships (QSARs) – applications and methodology. In: Recent Advances in QSAR Studies; Pyzyn T, Leszczynski J, Cronin MTD, Eds; Springer: Dordrecht, Heidelberg, London, New York, 2010, pp. 3-11.

35

14. Gleeson MP, Waters NJ, Paine SW, Davis AM. In silico human and rat Vss structure – activity relationships models, J Med Chem 2006; 49(6):1953-1963.

15. Ghafourian T, Barzetar-Jalali M, Hakimiha N, Cronin MTD. Quantitative structure-pharmacokinetic relationship modeling: apparent volume of distribution, J Pharm Pharmacol 2004; 56:339-350.

16. Ghafourian T, Barzetar-Jalali M, Dastmalchi S, Khavari-Khorasani T, Hakimiha N, Nokhodchi A. QSPR model for the prediction of apparent volume of distribution, Int J Pharmac 2006; 319:82-97.

17. Berellini G, Springer C, Waters NJ, Lombardo F. In silico prediction of distribution on human using linear and nonlinear models on 669 compounds data set, J Mol Chem 2009; 52:4488-4495.

18. Demir-Kavuk O, Benzien J, Muegge I, Knapp EW. DemQSAR: predictive human volume of distribution and clearance of drugs, J Comp Aid Mol Des 2011; 25:1121-1133.

19. Del Amo EM, Ghemtio L, Xhaard H, Yliperttula M, Urtti A, Kidron H. Applying linear and non-linear methods for parallel prediction of volume of distribution and fraction of unbound drug, PLOS one 2013; 8(10):e74758.

20. Freitas AA, Limbu K, Ghafourian T. Predicting volume of distribution with decision tree-based regression methods using predicted tissue: plasma partition coefficients, J Cheminform 2015; 7:6.

21. Zhivkova Z, Doytchinova I. Prediction of steady-state volume of distribution of acidic drugs by quantitative structure – pharmacokinetics relationships, J Pharm Sci 2012; 101(3):1253-`1266.

22. Zhivkova Z, Doytchinova I. Quantitative structure – pharmacokinetics relationships analysis of basic drugs: volume of distribution, J Pharm Pharm Sci 2015; 18(3):515-527.

23. Turner JV, Maddalena DJ, Cutler DJ. Pharmacokinetic parameter prediction from drug structure using artificial neural networks, Int J Pharmac 2004; 270:209-219.

24. Ng C, Xiao Y, Putnam W, Lum B, Tropsha A. Quantitative structure – pharmacokinetic parameters relationships (QSPkR) analysis of antimicrobial agents in human using simulated annealing k-nearest neighbor and partial least square analysis methods, J Pharm Sci 2004; 93(10):2535-2544.

25. Yap CW, Li ZR, Chen YZ. Quantitative structure – pharmacokinetic relationships for drug clearance by using statistical learning methods, J Mol Graph Model 2006; 24:383-395.

36

26. Yu MJ. Predicting total clearance in humans from chemical structure. J Chem Inf Model 2010; 50(7):1284-1295.

27. Berellini G, Waters NJ, Lombardo F. In silico prediction of total human plasma clearance, J Chem Inf Model 2012; 52:2069-2078.

28. Gombar VK, Hall SD. Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution, J Chem Inf Model 2013; 53(4):948-957.

29. Lombardo F, Obach RS, Varma MV, Stringer R, Berellini G. Clearance mechanism assignment and total clearance prediction in human based upon in silico models, J Med Chem 2014; 57:4392-4405.

30. Zhivkova Z, Doytchinova I. Quantitative structure – clearance relationships of acidic drugs, Mol Pharmac 2013; 10:3758-3768.

31. Zhivkova Z, Doytchinova I. Quantitative structure – pharmacokinetic relationships for plasma clearance of basic drugs with consideration of the major elimination pathway, J Pharm Pharm Sci 2017; 20:135-147.

32. Paine WS, Barton P, Bird J, Denton R, Menochet K, Smith A, Tomkinson NP, Chohan KK. A rapid computational filter for predicting the rate of human renal clearance, J Mol Graph Model 2010; 29:529-537.

33. Manga NDJ, Duffy JC, Rowe PH, Cronin MTD. A hierarchical QSAR model for urinary excretion of drugs in humans as a predictive tool for biotransformation, QSAR Comb Sci 2003; 22:263-73.

34. Colmenarejo G, Alvarez-Pedraglio A, Lavandera J-L. Cheminformatic models to predict affinities to human serum albumin, J Med Chem 2001; 44:4370-4378.

35. Hall LM, Hall LH, Kier LB. Modeling drug-albumin binding with E-state topological structure representation, J Chem Inf Comput Sci 2003; 43:2120-2128.

36. Sitarama BG, Ramamurthi N, Khandelwal A. In silico ADME modeling 2: computational models to predict human serum albumin binding affinity using ant colony systems, Bioorg Med Chem 2006; 14:4118-4129.

37. Xue CX, Zhang RS, Liu HX, Liu, MC, Hu, ZD, Fan BT. QSAR models for the prediction of binding affinities to human serum albumin using the heuristic method and a support vector machine, J Chem Inf Comput Sci 2004; 44(5):1693-1700.

38. Chen L, Chen X. Results of molecular docking as descriptors to predict human serum albumin binding affinity, J Mol Graph Model 2012; 33:35-43.

39. Deeb O. Correlation ranking and stepwise regression procedures in principle components neural networks modeling with application to predict toxic activity

37

and human serum albumin binding affinity, Chem Intel Lab Sys 2010; 104:181-194.

40. Hajduk PJ, Mendoza R, Petros AM, Huth JR, Bures M, Fesik S, Martin YC. Ligand binding to domain 3 of human serum albumin: a chemometric analysis, J Comput Aid Mol Des 2003; 17:93-102.

41. Estrada E, Uriarte E, Molina E, Simon-Manso Y, Milne GWA. An integrated in silico analysis of drug binding to human serum albumin, J Chem Inf 2006; 46:2709-2724.

42. Kratochwil NA, Huber W, Mueller F, Kansy M, Gerber PR. Predicting plasma protein binding of drugs: a new approach, Biochem Pharmacol 2002; 64:1355-1374.

43. Yamazaki K, Kanaoka M. Computational prediction of the plasma protein binding percent of diverse pharmaceutical compounds, J Pharm Sci 2004; 93(6):1480-1494.

44. Liu J, Yang L, li Y, Pan D, Hopfinger AJ. Constructing plasma protein binding model based on combination of cluster analysis and 4D fingerprint molecular similarity analysis, Bioorg Med Chem 2006; 14:611-621.

45. Votano JR, Parham M, Hall LM, Kier LB, Oloff S, Tropsha A. QSAR modeling of human serum protein binding with several modeling techniques utilizing structure – information representation, J Med Chem 2006; 49(24):7169-7181.

46. Moda TL, Montanari CA, Andricopulo AD. In silico prediction of human plasma protein binding using hologram QSAR, Lett Drug Des Disc 2007; 4:502-509.

47. Ghafourian T, Amin Z. QSAR models for the prediction of plasma protein binding, Bioimpacts 2013; 3(1):21-27.

48. Zhivkova Z, Doytchinova I. Quantitative structure – plasma protein binding relationships of acidic drugs, J Pharm Sci 2012; 101(12):4627-4641.

49. Zhivkova Z. Quantitative structure – pharmacokinetics relationships for plasma protein binding of basic drugs, J Pharm Pharm Sci 2017; 20:349-359.

50. OECD principles for the validation, for regulatory purposes, of (Quantitative) structure-activity relationship models, http://www.oecd.org/chemicalsafety/risk-assessment/ 37849783.pdf.

51. OECD, Guidance document on the validation of (Quantitative) structure-activity relationship [(Q)SAR] models, 2014, http://www.oecd.org/env/guidance-document-on-the-validation-of-quantitative-structure-activity-relationship-q-sar-models-9789264085442-en.htm.

38

52. Tropsha A. Best practices for QSAR model development, validation and exploitation, Mol Inf 2010, 29:476-488.

53. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronon M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz’min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A. QSAR modeling: where have you been? Where are you going to? Med Chem 2014; 57(12): 4977-5010.

54. Consonni V, Todeschini R. Molecular descriptors. In: Recent Advances in QSAR Studies – Methods and Applications; Puzyn T, Leszczynski J, Cronin MTD, Eds; Springer: Dordrecht, Heidelberg, London, New York, etc, 2010, pp. 29-102.

55. Fourches D, Muratov E, Tropsha A. Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research, Chem Inf Model 2010; 50(7):1189–1204.

56. Cronin MTD, Schultz TW. Pitfalls in QSAR, J Mol Struct 2003; 633:39–51. 57. Stouch TR, Kenyon JR, Johnson SR, Chen XQ, Doweylko A, Li Y. In silico

ADME/Tox: why models fail? J Comput Aided Mol Des 2003; 17(2-4), 83-92. 58. Luan F, Cordeiro MNDS. Overview of QSAR modeling in rational drug design.

In: Recent Trends on QSAR in Pharmaceutical Perceptions. Khan MTH, Ed; Bentham Science Publ, 2012, 194-241.

59. Topliss JG, Costello RJ. Chance correlations in structure-activity studies using multiple regression analysis, J Med Chem 1972, 15:1066-1068.

60. Schultz TW, Netzeva TI, Cronin MTD. Selection of data sets for QSARs: analyses of Tetrahymena toxicity from aromatic compounds, SAR QSAR Environ Res 2003; 14:59-81.

61. Gedeck P, Rohde B, Bartels C. QSAR--How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets, J. Chem. Inf. Model 2006; 46:1924–1936.

62. Todeschini R, Consonni V. Handbook of molecular descriptors. Wiley-VCH:Weinheim, 2000.

63. Devillers J New trends in QSAR modeling using topological indices, Curr Op Drug Discov Devel, 2000, 3(3), 275-279.

64. Winkler DA. The role of quantitative structure – activity relationships (QSAR) in biomolecular discovery, Brief Bioinform 2002; 3(1):73-86.

65. Dudek AZ, Arodz T, Galvez J. Computational methods in developing quantitative structure-activity relationships (QSAR): A review. Combin. Chem. High Throughput Screen, 2006, 9(3), 213- 228.

39

66. Dehmer M, Varmura K, Bonchev D. Statistical modeling of molecular descriptors in QSAR/QSPR. Wiley – Blackwell: Weinheim, 2012.

67. Damale MG, Harke SN, Khan FA, Shinde DB, Sangshetti JN. Recent advances in multidimensional QSAR (4D-6D): A critical review, Mini-Rev Med Chem 2014; 14:35-55.

68. Dearden JC. The use of topological indices in QSAR and QSPR modeling. In: Advances in QSAR Modeling, Challenges and Advances in Computational Chemistry and Physics; Roy K, Ed; Springer 2017, pp. 57-88.

69. Todeschini R, Consonni V. Molecular descriptors for chemoinformatics. Mannhold R, Kubinyi H, Folkers G, Eds; Wiley-VCH: Weinheim, 2009.

70. Wiener H. Structural determination of paraffin boiling points, J Am Chem Soc 1947; 69, 17-20. [60]

71. Randic M. On characterization of molecular branching, J Am Chem Soc 1975; 97, 6609-6615. [61]

72. Randic M. In computation of optimum parameters for multivariate analysis of structure – property relationship, J Comp Chem 1991, 12(8):970-980. [62]

73. Balaban AT, A personal view about topological indices for QSAR/QSPR. In: QSAR/QSPR studies by molecular descriptors; Diude M., Ed; Huntington: New York, 2000, pp. 1-25. [63]

74. Schulz HP, Topological organic chemistry. 1. Graph theory and topological indices of alkanes, J Chem Inf Comput Sci, 1989, 29, 227-228. [64]

75. Hall LH, Kier LB. Molecular connectivity chi indices for database analysis and structure – property modeling. In: Topological indices and related descriptors in QSAR and QSPR; Devillers J, Balaban A, Eds; Gordon and Breach: London, 1999, pp. 307-360. [65]

76. Hall LH, Kier LB. Electrotopological state: structure modeling for QSAR and database analysis. In: Topological indices and related descriptors in QSAR and QSPR; Devillers J, Balaban A, Eds; Gordon and Breach: London, 1999, pp. 307-360. [66]

77. Galvez J, Garcia R, Salabert MT, Soler R, Charge indexes. New topological descriptors, J Chem inf Comput Sci 1994, 34, 520-525. [67]

78. Mulliken RS, Electronic Population Analysis on LCAO–MO Molecular Wave Functions. I J Phys Chem, 1955, 23, 1833-1840. [68]

79. Cammarata A, An Apparent Correlation between the in Vitro Activity of Chloramphenicol Analogs and Electronic PolarizabilityJ Med Chem, 1967, 10, 525-527. [69]

40

80. Zhou Z, Parr RG, Activation hardness: new index for describing the orientation of electrophilic aromatic substitution J Am Chem Soc, 1990, 112, 5720-5724. [70]

81. Klopman GJ, Chemical reactivity and the concept of charge- and frontier-controlled reactions J Am Chem Soc, 1968, 90, 223-234. [71]

82. Randic M. Generalized molecular descriptors, J Math Chem 1991; 7:155–168. 83. Dearden JC, Cronin MT, Kaiser KL. How not to develop a Quantitative Structure-

Activity or Structure-Property Relationship (QSAR/QSPR), SAR QSAR Environ Res 2009; 20:241–266.

84. Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP, Agrawal RK. Validation of QSAR models – strategies and importance, Int J Drug Des Discov 2011; 2(3):511-519.

85. Suárez E, Pérez CM, Rivera R, Martínez MN. Selection of variables in a multiple linear regression model. In: Applications of regression models in epidemiology, John Wiley & Sons Inc: Hoboken, NJ, USA, 2017, pp. 77-86

86. Jolliffe IT. Principle component analysis, 2nd ed. Springer: New York, Berlin, Heidelberg, etc, 2002.

87. Hou TJ, Wang JM, Liao N, Xu XY. Application og Gas on the SAR of some cinnamamides. J Chem Inf Comput Sci 1999, 39(5):775-781.

88. Yasria A, Hartsough D. Toward an optimal procedure for variable selection and QSAR model building, J Chem Inf Comput Sci 2001, 41(5), 1218-1227

89. Leardi R. Genetic algorithm in chemistry, J Chromatogr A 2007; 1158:226-233. 90. Terfloth L, Gasteiger J. Neural networks and genetic algorithms in drug design,

DDT 2001; 6(15):2001. 91. Alexander DLJ, Tropsha A, Winkler DA. Beware of R2: simple, unambiguous

assessment of the prediction accuracy of QSAR and QSPR models, J Chem Inf Model 2015, 55(7):1316-1322.

92. Tropsha A, Gramatica P, Gombar VK. The importance of being earnest: validation is absolute essential for successful application and interpretation of QSAR models, QSAR Comb Chem 2003; 22:69-77.

93. Goolbraikh A, Tropsha A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection, J Comput Aided Mol Des 2002; 16(5-6):357-369.)

94. Roy K. Om some aspects of validation of predictive QSAR models. Exp Opin Drug Discov 2007; 2:1567-1577.

95. Leonard JT, Roy, K. On selection of training and test sets for the development of predictive QSAR models, QSAR Comb Sci 2005; 25(3):235-251.

41

96. Martin TM, Harten P, Young DM, Muratov EN, Golbraikh A, Zhu H, Tropsha A. Does rational selection of training and test sets improve the outcome of QSAR modeling? J Chem Inf Model 2012; 52(10):2570-2578.

97. Kubinyi H. QSAR and 3S QSAR in drug design. Part 1: methodology, DDT 1997; 2(11):457-467.

98. Liu P, Long W. Current mathematical methods used in QSAR/QSPR studies, Int J Mol Sci 2009; 10:1978-1998.

99. Puzin T, Leszczynski J, Cronin MTD, Eds. Recent advances in QSAR studies; Springer: Dordrecht, Heidelberg, London, New York, 2010.

100. Liew CY, Yap CW. Current modeling methods used in QSAR/QSPR. In: Statistical modeling of molecular descriptors in QSAR/QSPR; Dehmer M, Varmuza K, Bonchev D, Eds; Wiley-VCH verlag GmbH&Co.KGaA: Weinhwim, 2012, pp. 1-31.

101. Patel HM, Noolvi MN, Sharma P, Jaisval V, Bansal S, Lohan S, Kumar SS, Abbot V, Dhiman S, Bhardwaj, V. Quantitative structure – activity relationship (QSAR) studies as strategic approach in drug discovery, Med Chem Res 2014; 23:4991-5007.

102. Topliss JG, Edwards RP. Chance factors in studies of quantitative structure-activity relationships, J Med Chem 1979; 22:1238–1244.

103. Vinzi VE, Chou WW, Henseler J, Wang H, Eds. Handbook of partial least squares; Springer: Berlin, Heidelberg, 2010.

104. Lek S, Guegan JF. Artificial neural networks as a tool in ecological modeling, an introduction, Ecol. Model 1999; 120:65-73.

105. Loh W-Y. Classification and regression trees. WIREs Data Mining Knowledge Discov 2011; 1:14-23.

106. Breiman L, Friedman J, Olshen R, Stone C, Steinberg D, Colla P. CART: Classification and Regression Trees, Stanford: CA, 1995.

107. Mucherino A., Papajorgji PJ, Pardalos PM. k-Nearest Neighbor Classification. In: Data Mining in Agriculture; Springer: New York, 2009.

108. Bajusz D, Rácz A, Héberger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform 2015; 7:1-13.

109. Vapnik VN. Statistical learning theory; John Wiley & Sons: New York, 1998. 110. Christianini N, Shawe-Taylor J. An introduction to Support vector machines and

other kernel-based learning methods; Cambridge University Press:Cambridge, UK, 2000.

42

111. Herbrich R. Learning kernel classifiers: theory and algorithms; MIT Press: USA, 2002.

112. Schölkopf B, Smola A. Learning with kernels; MIT Press: Cambridge, UK, 2002. 113. Smola A, Schoelkopf B. A tutorial on support vector regression, Stat Comput

2004, 14, 199-222. 114. Hartung T, Bremer S, Casati S, Coecke S, Corvi R, Fortaner S, Gribaldo L,

Halder M, Hoffmann S, Roi AJ, Prieto P, Sabbioni E, Scott L, Worth A, Zuang V. A modular approach to the ECVAM principles on test validity, Altern Lab Anim 2004; 32:467–472.

115. Mahobia NK, Patel RD, Sheikh NW, Singh SK, Mishra A, Dhardubey R. Validation methods used in quantitative structure – activity relationship, Der Pharma Chem 2010; 2(5):260-271.

116. Wehrens R, Putter H, Buydens LMC. Bootstrap: a tutorial. Chemometr Intell Lab Syst 2000; 54:35-52.

117. Roy K, Mitra L, Kar S, Ojha PK, Das RN, Kabir H. Comparative studies on some metrics for external validation of QSPR models, J Chem Inf Model 2012; 52:396-408.

118. Netzeva TI, Worth A, Aldenberg T, Benigni R, Cronin MT, Gramatica P, Jaworska JS, Kahn S, Klopman G, Marchant CA, Myatt G, Nikolova-Jeliazkova N, Patlewicz GY, Perkins R, Roberts D, Schultz T, Stanton DW, van de Sandt JJ, Tong W, Veith G, Yang C. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships, The Report and Recommendations of ECVAM Workshop 52. Altern Lab Anim 2005; 33:155-173.

119. Jaworska J, Nikolova-Jeljazkova N, Aldenberg T. QSAR appliciability domain estimation by projection of the training set descriptors: a review, Altern Lab Anim 2005; 33(5):445-459.

120. Weaver S, Gleeson MP. The importance of the domain of applicability in QSAR modeling, J Mol Graph Model 2008, 26:1315-26.

121. Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R. Comparison of different approaches to define the applicability domain of QSAR models, Molecules 2012; 17(5):4791-4810.

122. Sahigara F, Ballabio D, Tideschini R, Consonni V. Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions, J Cheminform 2013; 5:27.

43

123. Hanser T, Barber C, Marchaland JF, Werner S. Applicability domain: towards a more formal definition, J SAR QSAR Env Res 2016; 27(11):865-881.

124. Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability – density neighbourhood, J Cheminform 2016; 8:69.

125. Magiora GM. On outliers and activity cliffs – why QSAR often disappoints, J Chem Inf Model 2006; 46(4):1535.

126. Verma RP, Hansh C. An approach toward the problem with outliers in QSAR, Bioorg Med Chem 2005; 13(150):4597-4621.

127. Cao DS, Liang YZ, Xu OS, Li HD, Chen X. A new strategy of outlier detection for QSAR/QSPR, J Comput Chem 2010; 31(3):592-602.

128. Nahum OE, Yosipof A, Senderowitz H. A multiple – objective genetic algorithm for outlier removal, J Chem Inj Model 2016; 55(12):2507-2518.

129. Gramatica P. Principles of QSAR models validation: internal and external, QSAR Comb Sci, 2007; 26(5):694-701.

130. Furusjo E, Svenson A, Rahmberg M, Anderson M. The importance of outlier detection and training set selection for reliable environmental QSAR predictions, Chemosphere 2006; 63:99-108.

44

Chapter 2

QSPkR modeling of the steady state volume of distribution

Volume of distribution

The volume of distribution (Vd) is a key PK parameter used to relate the amount of the drug in the body (immeasurable variable) with its plasma concentration (measurable variable). It represents the volume in which a drug would be distributed if its concentration was the same as the concentration measured in plasma. Therefore, Vd is an apparent PK parameter; it rarely corresponds to a real physiological spaces and can be much higher than the total body water volume.

As plasma concentration can be measured after different routes of administration and at different time, several types of Vd have been defined. A detailed discussion on the meaning and clinical use of the different types of Vd has been published [1].

All volumes of distribution can be calculated as a ratio between the amount of the drug in the body (A) and plasma concentration (C) at given time.

CA

Vd =

Immediately after iv bolus administration, the drug is subjected to distribution and elimination. For drugs with one compartment distribution Vd usually remains constant. For drugs with multi-compartment distribution Vd increases gradually until a distribution pseudo-equilibrium is reached. The initial Vd corresponds to the volume of the central compartment (Vc), and is equal to the ratio between the iv administrated dose D and the initial plasma concentration C0 (at t = 0):

0c C

DV =

The Vd during the terminal elimination phase, after reaching distribution pseudo-equilibrium, is denoted as Varea. It can be calculated as follows:

∞−β=

β=

0area AUC

DCLV

45

where CL is the plasma clearance of the drug, β – the elimination rate constant determined from the terminal linear part of the logarithmic concentration vs time curve (lnC/t), D is the iv administrated dose, and AUC0-∞ – the area under the curve C/t from t = 0 to t = ∞.

In the case if iv infusion or multiple drug administration, in a period of 4-5 half-lives the rate of administration becomes equal to the rate of elimination, and a steady-state is reached. The Vd defined in this state represents the steady-state volume of distribution Vss. It is independent on drug CL and can be defined in physiological terms as follows:

i,tipss VKVV ∑+=

where Vp is the volume of plasma, Ki – the tissue-to-plasma partition coefficient of the ith tissue, and Vt,i – the physiological volume of the respective tissue.

Vss is considered as the most accurate measure of distribution as it is not directly dependent on drug CL. Therefore, it is involved in the computation of both maintaining dose (DM) and loading dose (DL) in multiple drug administration [2]:

F

)CC(VD

ssmin

ssmaxss

M

−=

F

CVD

ssmaxss

L =

where Cssmin and Css

max are the minimum and maximum concentrations at steady state, and F – the bioavailability.

Vss (together with the CL) determines drug half-life, and it is also proportional to the mean residence time of the drug in the body. Thus, it can influence both dosing frequency and duration of drug action.

Vss reflects the ability of the drug to cross biological membranes and to distribute into tissues. It is crucially dependent on the free fraction of the drug in plasma (fu) and tissues (fu.t):

tt,u

upss V

ff

VV +=

46

Although majority of drugs are distributed via passive diffusion through membranes, active transport mediated by transporters may have significant impact on Vss [3, 4].

Methods for prediction of Vd

The great clinical importance of Vd and the recognized need of optimization of ADME properties of drug candidates at the very early stages of drug discovery inspired an extensive research and development of numerous approaches for prediction of Vd. Their principles and utility, advantages and shortcomings are discussed in detail in several reviews [5 – 13].

The methods used to predict the Vd can be classified into three groups: 1. Methods based on in vivo data from preclinical species; 2. Methods, based on in vitro data; 3. In

silico methods. All methods are based on the assumptions that the active transport through membranes in negligible and the dissociation of the drug from the binding sites in both plasma and tissues is rapid [7].

The best-described technique for prediction of human Vd from in vivo preclinical data is allometric scaling [14, 15]. It consists of establishment of relationships between Vd values for different animal species and their body weight, and extrapolation to humans. Caldwell et al. derived a simple linear relationship between Vss of human and rat for 144 compounds which allowed prediction of human Vss with a mean fold error of prediction (MFEP) 1.82, maximum error 9, and accuracy 65% (defined as percent of compounds predicted within the 2-fold error of the experimental value) [16]. A detailed study based on allometric scaling from rat, dog and monkey – alone and in combinations, reported models for prediction of human Vd with accuracy 62-70% and maximum error < 10 [17]. The main drawback of the upper methods is that they do not take into account the interspecies differences in tissue and plasma binding – the major determinant of Vd. A model with improved performance was reported by adding data for the unbound tissue volume (Vu,T) which allowed prediction of Vd with MFEP 1.4, maximum error 2.3 and accuracy 73% [18]. Obach et al. compared four methods for Vss prediction based on animal data [5]. Three of them, considering the interspecies differences in plasma and tissue binding, showed very good predictability with MFEP in the range 1.56 – 1.83 and accuracy 77 – 88%. In contrast, the simple allometric scaling gave MFEP of 2.78 and accuracy 53%. In general, prediction of Vss based on in vivo animal data is simple and accurate, but less effective (being time consuming, labor intensive and ethic sensitive).

47

Prediction of Vss based on in vitro data significantly reduces the experimental costs. Most of the methods are based on tissue binding data, or on tissue-to-plasma partition coefficients. This approach is rationale as it takes into account several physiological variables and leads to mechanistically interpretable models, so called physiologically based (PBPK) models. Bjorkman et al. proposed a model based on data for tissue-to-plasma partition coefficients from rats, unbound fraction of the drug in plasma for humans, organ weight and blood flow [19]. The model showed excellent predictivity on the two tested drugs (midazolam and theophylline) however it is not very reliable due to the very sparse data and too many assumptions made. Poulin and Theil applied similar approach, using predicted tissue-to-plasma coefficients, and reported a well performing model with MFEP 1.06, maximum error 2 and accuracy of 80% [20]. Further models were developed for basic drugs [21] and for acidic, neutral and zwiter-ionic drugs [22] which showed similar accuracy (65 – 68%).

In silico methodology gains increasingly popularity and utility for ADME prediction during the last two decades. It allows development of predictive models based solely on calculated molecular descriptors instead of the time and labor consuming animal or in vitro experiments. It also enables rapid screening of large databases of potential drug candidates and even predictions on virtual compounds, before chemical synthesis. It is able to nominate drug candidates with acceptable (if not ideal) PK properties.

No one methodology can be considered as the ideal – each one finds their greatest applicability at different stages of the drug discovery process. In silico methods can be effective in early stages of drug discovery as they are able to deal with large number of compounds and make predictions without any experimental cost. They are most suitable for high-throughput screening of compounds which are at an early stage of development. The methods based on in vivo preclinical species are considered as the most accurate but less effective (time-consuming, labor-intensive and ethic-sensitive). Therefore, they should be applied at latest stages of drug discovery, on limited number of compounds, nominated as drug candidates with acceptable PK properties. Mechanistic based PBPK models are particularly useful for whole-body PBPK modeling and offer the best mechanistic interpretation.

QSPkR models for Vss – an overview.

One of the most widely used in silico approaches is quantitative structure activity/pharmacokinetic relationship (QSAR/QSPkR) modeling. The goal of QSPkR is

48

development of mathematical relationship between particular PK parameter and selected molecular features, encoded in molecular descriptors. Ideally, calculated molecular descriptors are used, however experimental descriptors may also be included in the descriptor set. QSPkR modeling is of both theoretical and practical use. It allows prediction of PK parameters of new drug candidates as it relays on the similarity principle, i. e. drugs with similar structure have similar PK characteristics. QSPkR models also may reveal the main structural features responsible for the examined PK parameter or underlying processes.

Considerable progress has been made during the last two decades in development of QSPkR models for Vss prediction. The early QSPkR models for Vd were developed on small datasets and used different or types of Vds, estimated after different routes of administration. Now it is generally accepted that QSPkR modeling should be based on data for Vss following iv administration as it is the only route ensuring a defined amount of the drug subjected to distribution, thus avoiding the uncertainty associated with incomplete bioavailability [23].

Turner et al. generated QSPkRs for five PK parameters on a dataset of 20 cephalosporines, using artificial neural networks (ANN) [24]. The authors claimed high predictivity, although it was tested only on few molecules. The octanol – water partition coefficient logP, the molecular volume and the presence of amine groups showed positive correlation to Vss, while the presence of amide groups and, surprisingly, partition coefficient oil – water (logKow) disfavored Vss. Several charge indexes was found to have positive or negative effect on Vss, however they were not interpreted. Almost the same dataset (extended to 23 cephalosporines) was used for development of QSPkRs for Vss and CL by means of PLS and multiple linear and non-linear regression [25]. The fraction bound to plasma proteins was identified as a major factor with negative effect on Vss, together with molecular volume and logP. Ng et al. developed QSPkRs for Vss and CL applying kNN and PLS on a dataset of 44 antimicrobial drugs [26]. Highly predictive kNN model for Vss with external test set predictive coefficient r2

pred = 0.80 and 93% accuracy was developed, however the interdependence of the descriptors hindered the adequate interpretation of the model.

Wajima et al. (2002) proposed QSPkR models for Vss applying multiple linear regression (MLR) and partial least square (PLS) to a dataset of 64 diverse compounds [27]. QSPkRs for human Vss were derived using as descriptors data for Vss of rat and dog – alone, or in combination with traditional molecular descriptors. The best PLS

49

models showed comparable preditivity in both cases: leave-one-out cross validated correlation coefficient q2 0.830 vs. 0.844 and accuracy of prediction 78.1% vs. 76.61%. I. e. molecular descriptors had very small (if any) contribution to Vss. Slightly better performance than the allometric scaling method (q2 = 0.718 and accuracy 72%) was observed.

ANN was used for modeling of four PK parameters on a dataset of 62 structurally diverse drugs [28]. Very good agreement between observed and predicted Vss values was suggested for a limited test set of 6 compounds. The number of O-atoms, connectivity indices and lipophilicity parameters were identified as important for Vss, however their effect on Vss was not clarified

Lombardo et al. presented three approaches for prediction of Vss. The first one was based on a preliminary prediction of the free fraction of drugs in tissues (fu,t) by regression analysis of 64 neutral and basic drugs [29]. logD7.4, the fraction ionized at pH 7.4 (fi) and the free fraction in plasma (fu) were used as molecular descriptors. The Vss values were further calculated via Oie – Tozer equation [30]. The model demonstrated comparable accuracy to the conventional methods based on animal in vivo data. The results were confirmed in a further study on 120 neutral and basic drugs which demonstrated predictive accuracy of 72% for the training set and 56% for the external test set [31]. An extended model using 31 descriptors including lipophilicity and ionization parameters, molecular volume and several molecular fragments was developed on a larger training set of 384 diverse molecules. The performance was essentially identical to the animal based approaches [32].

Gleeson et al. reported the first entirely in silico approach for prediction of human and rat Vss applying three statistical methodologies: Bayesian neural networks (BNN), classification and regression tree (CART) and PLS [33] The human model demonstrated good predictivity on an independent external test set with r2

pred = 0.60.

Ghafourian et al. published two studies on QSPkR modeling of Vd. The first report considered 70 molecules of all ionization types, divided equally into training and test set [34]. Step wise regression (SWR) was applied for modeling logVd or logVd,u. (Vd,u = Vd/fu and represents the apparent volume of distribution, corrected for PPB). The models for Vd,u showed better goodness of fit, but lower predictivity in terms of MFEP. Several QSPkR models were developed: for the total training set (MFEP 2.03, accuracy 63%), for acidic drugs set (MFEP 1.79, accuracy 62%) and for basic drugs set (MFEP 1.84, accuracy 50%). All models outlined lipophilicity (expressed as logP or logD7.4 as

50

the most important factor with positive effect on Vd. The Vd of acidic drugs was further favored by molecule density (MW/V) and disfavored by the lowest atomic charge in the molecule Q-. Descriptors with negative contribution to Vd were log D1 (in the model for all molecules), the dipole moment (in the models for all and for basic molecules), and log(fu/pKa) (in the model for bases).

An improved QSPkR model was built on a dataset of 129 molecules [35]. Genetic algorithm (GA) and SWR were used for variable selection and model development, and the predictivity of the model was assessed by leave-many-out cross-validation (LMO-CV) and external validation. Significant and robust models were developed for the whole dataset, for acids and bases with MFEP in the range 1.79 – 2.17 and accuracy 62 – 71%. According to the model for the whole dataset, Vd was favored by several lipophilicity descriptors – (logP)2, Lipole and the sum of electrotopological states (E-state) indices for all >CH- groups in the molecule, as well as by the fraction ionized as a base at pH 7.4 (fi,B) and sum of E-state indices for all F-atoms in the molecule. Negative effect on Vd had the fraction ionized as an acid at pH 7.4 (fi,A) and the absolute most negative electrostatic potential on the solvent accessible surface of the molecule ESP-. The Vd of acids appeared to be positively affected by logD7.4, the number of six-membered aliphatic rings (N6-rings) and the 7th order chain connectivity index 7χch, while fi,A and the sum of the E-state indices of all >N- atoms in the molecule disfavored Vd. The model for bases did not reveal any traditional lipophilicity parameter, however the authors supposed that the effect of the lipophilicity was reflected by two other descriptors – ABSQon (the um of the atomic charges of N- and O-atoms in the molecule) and 1κa (the 1st order kappa alpha index). Vd of bases was disfavored by log(fu/pK) (in accordance with the previous report), the number of OH-groups (NOH) and ESP-. However, in contrast to the previous model, a positive effect of the dipole moment was found.

Berelini et al. presented linear and non-linear models for Vss using PLS and

random forest (RF) [36]. The models were built on a dataset of 669 drugs and drug-like substances compiled by Obach et al. [37]. Leave-class-out cross validation showed different predictivity on the various therapeutic classes with mean geometric fold error of prediction (MGFEP) 1.8 and 2.0 and accuracy 67% and 69% for PLS and RF, respectively. The models were tested on an external set of 29 molecules and showed very similar MGFEP (1.9 and 2.0) and accuracy (66% and 69%). According to the best PLS model, the most important molecular features for Vss were several lipophilicity descriptors: logD7.5, logD8, logD9 and logD10 with the largest positive effect, and logD5

51

and logD6 – with the largest negative impact. Two pharmacophoric descriptors – with high positive (DRDRDO) and high negative (DRDRAC) contribution to Vss were also defined. They represent triplets between two hydrophobic and one hydrogen bond donor (DRDRDO) or acceptor (DRDRAC). DRDRDO has higher values for protonated bases, and DRDRDO – for ionized acids. Another descriptor with detrimental effect on Vss was FLEX – the maximum flexibility of the molecule which can be interpreted as reduced ability for trans-membrane diffusion.

Demir-Kavuk et al. proposed a predictive modeling approach which combined feature generation, feature selection, model building and control of the over-fitting into a single platform, DemQSAR (available on the webpage of DemPRED) [38]. DemQSAR was able to generate models for Vss with very few descriptors while maintaining high predictive power. 584 compounds from Obach’s dataset were used. Their structures were described by two groups of descriptors: 3,642 comersial deascriptors and 3,843 – from Chemistry development kit (CDK) library. The best model with 27 descriptors demonstrated very good performance on an external test set: GMFE = 1.81 and r2

pred = 0.53. Among the commercial features strong positive correlation with human Vss were found for the cationic fraction at pH7: “pKaMB and VS-Descriptor LgD9/10 (distribution coefficients at pH 9 and 10). Strong negative correlations were observed for anionic fraction at pH7 and a number of hardly interpretable fingerprints. With respect to CDK features, strong positive correlations were observed for various substructures: carboxylic ester, primary aliphatic amine, quaternary aliphatic ammonium. Negative correlations were found with TopoPSA (topological polar surface area based on fragment contribution) and Khs.DO (the number of double bonded O-atoms, i. e. =O).

Del Amo et al. applied linear PLS regression and non-linear recursive partitioning (RP) classification for parallel prediction of Vss and fraction of unbound drug in plasma (fu) on a part of the Obach’s dataset [23]. The best PLS models showed moderate predictivity on an external test set with r2

pred 0.50 – 0.52, MFEP 2.60 – 2.64 and accuracy 58 – 59%. The RP classification method allowed reasonable successful classification of compounds with low Vss < 0.3L/kg (71% true) and high Vss > 1L/kg (81% true), while only 32% of the compounds with moderate Vss were correctly classified. The most important features for both Vss and fu were lipophilicity and hydrophobicity encoded in several descriptors. Solubility profiling coefficients (L3LgS), partition coefficient between n-octanol and water (logPn-oct), distribution coefficients at pH 9 and pH10 (LgD9 and LgD10) and the percent ionized at pH10 (%FU10) were found to have largest positive contribution to Vss, while the solubility at pH 10 was the

52

most detrimental feature, followed by hydrophilicity volume (W1), solubility (logS), intrinsic solubility (SOLY), etc. It was also noticed, that lipophilicity descriptors had high positive impact on Vss and high negative effect on fu. Reversely, the charge and solubility descriptors affected negatively Vss and positively – fu. However, a high value of fu does not inevitably lead to a high value of Vss as the relationship between Vss is rather complex.

Freitas et al. reported several decision tree (DT) regression methods for prediction of Vss of 604 compounds, part of Obach’s database [39]. An attempt to develop a more accurate QSPkR model was made combining theoretical molecular descriptors with predicted tissue:plasma partition coefficients (Kt:p) in 12 tissues. The involvement of logKt:p is reasonable as it enables a clear mechanistic interpretation of the model. However, it didn’t lead to a considerable improvement of the accuracy of prediction as GMFEP of the model built with and without logKt:p were 2.29 and 2.33, respectively. The most relevant descriptor for predicting Vss, with a large positive contribution, was logKskin:plasma. Another positive descriptor was PEOE_VSA+0 – the sum of van der Waals surface area of atoms with close to neutral partial atomic charge. Two lipophilicity descriptors had detrimental effect on Vss: SlogP_VSA7 and SLogP_VSA4 – the sum of accessible van der Waals surface area for atoms with contribution to logP (0.25, 0.30) and (0.1, 0.15). The first had high values for drugs with large resonance area (more than one aromatic ring) containing acidic groups as –COOH; the second – for larger molecules with many heteroatoms, aliphatic carbons and –COOH groups. The negative effect of the presence of heteroatoms is encoded also in descriptors ICM and a_ACM. Another negative contributors are fi,a – the fraction ionized as an acid at physiological pH, and ASA- – the surface area of negatively partially charged atoms, a parameter similar to PSA, which is known to have a negative effect on membrane permeability [40]. According to the model, drugs with high Kslin:plasma values, with fewer heteroatoms and prevailing neutral, non-polar atoms, and without strong acidic groups in the molecule, were expected to have high Vss values.

Despite of the huge number of descriptors used in QSPkR model development, most models for Vss contain mainly parameters, characterizing drug lipophilicity (logP, logD at different pH), solubility (logS) and ionization state of the molecules (pKa, fA, fB). There is an agreement, that lipophilicity affects positively Vss, while solubility has detrimental effect. A positive correlation between Vss and logD7.4 was reported for neutral and basic drugs [41] as well as between Vss and logP for bases [37]. However lipophilicity parameters seems to be very crude predictor as there are many examples

53

with comparable Vss values and 4-5 fold difference in logP or logD. The fraction ionized as a base at pH 7.4 also contributes positively to Vss, while the fraction ionized as an acid decreases it. Most of the descriptors in the published models discriminate between acidic and basic drugs, and to not provide any information about the structural features governing drug distribution. According to the models, acidic drugs should have low Vss, and basic drugs – high Vss, which is in agreement with the observations. Vss reflects the ability of the drug to cross membranes and to bind in tissues. The bases have high affinity to membrane phospholipids due to interactions between the drug cationic center(s) and the phospholipide acidic groups. They bind to alpha-1-acid glycoprotein and albumin and also accumulate by ion trapping into lysosomes – all these contributing to their large Vss [42]. Acids bind with high affinity to albumin; they are in general less lipophilic and ionized at pH 7.4, which disfavors their distribution in tissues and localize them in plasma.

Obviously, acidic and basic drugs follow different distribution patterns and it is reasonable to construct separate QSPkR models with respect to the ionization state. Such models may distinguish the main structural features governing distribution and Vss of acids and bases, and may provide better predictivity performance. The only reports on separate QSPkR modeling of Vss [34, 35] clearly demonstrated that Vss of bases and acids depends on different molecular features. The separate models had lower predictability as compared with the models for the whole dataset which may be due to the limited size of the datasets, inconsistent separation of the classes and unspecified measure of Vd as an end-point variable.

Here we summarize the results from QSPkR modeling of Vss on larger datasets of acidic and basic drugs, based on data for Vss after iv administration in human [43, 44]. The models are generated in accordance with the best practices in QSAR modeling. A special attention is paid to the interpretation of the descriptors in the models and analysis of the outliers. Interpretation of the descriptors reveals the major molecular features governing drug Vss. Analysis of the outliers may provide helpful information about the mechanisms underlying Vss as well as about the applicability domain of the model. It may also be used for curation of the database.

QSPkR modeling of Vss of acidic and basic drugs

The models were generated in a common workflow involving five steps: 1. Preparation of the dataset, 2. Computation of descriptors and variable selection, 3. Development of

54

QSPkR models, 4. Validation of the QSPkR models, and 5. Evaluation of the predictive ability of developed models.

Preparation of the datasets

The QSPkR models were developed on datasets, extracted from Obach’s database [37].The drugs were classified according to the ionization type on the basis of the fraction ionized as an acid (fA) and as a base (fB) at physiological pH 7.4:

( )4.7pKA a1011

f−+

=

( )apK4.7B 1011

f−+

=

The mol-files of the drugs were drawn and verified using public databases: Drug Bank [45], Chemical Book [46], Japan Chemical Substance Dictionary [47] and ChEBI [48]. The pKa values were calculated by ACD/LogD software version 9.08 software (Advanced Chemistry Development, Inc., Toronto, Ontario, Canada). In the case of more than one acidic or basic center, the pKa of the strongest one was taken into account. A drug was considered as an acid, if it met one of the following conditions: 1. fA > 0.1 while fB ≈ 0, and 2. fA >> fB. Drugs with fB≥0.02 and fA = 0 were defined as bases. Experimental values of Vss were logarithmically transformed before analysis in order to ensure close to normal distribution – hence, logVss was used as a dependent variable.

Computation of descriptors and variable selection

The chemical structures of the drugs were described by 178/179 molecular descriptors calculated by ACD/LogD version 9.08 and MDL QSAR version 2.2 (MDL Information systems, Inc., San Leonardo, California). They included descriptive properties (number of atoms, groups or bonds of given type, rings, circles, hydrogen bond donors, and acceptors, etc.), electrotopological (E-state) and molecular connectivity indices, encoding significant topological features of the whole molecule [49, 50], global physicochemical 2D properties (MW, logP, logD7.4, PSA), and 3D features (dipole moment, polarizability, volume, surface, etc.)

The most significant descriptors were selected via genetic algorithm (GA), implemented in MDL QSAR package. GA is a heuristic search algorithm for function

55

minimization inspired by the mechanism of evolution of species [51]. Default values for the size of initial population (32), choice of parents (tournament selection), type of crossover (uniform) and mutation (one-point), and fitness function (Friedman’s lack-of-fit scoring function with two parameters) were used.

Development of QSPkR models

The QSPkR models were derived by a stepwise multiple linear regression. It was applied in a forward mode using Fisher test with defined F-to-enter (4.00) and F-to-remove (3.99). The goodness of fit was assessed by the explained variance (r2) and standard error of estimate (SEE) according to the equations:

( )

( )∑

∑

=

=

−

−

−= n

1i

2mean,obs,ssi,obs,ss

n

1i

2i,calc,ssi,obs,ss

2

VlogVlog

VlogVlog1r

( )

1pn

VlogVlogSEE

n

1i

2i,calc,ssi,obs,ss

−−

−

=∑

=

where Vss,obs,i and Vss,calc,i are the observed and calculated by the model values of Vss for

the ith drug, Vss,obs,mean is the average value of Vss, n – the number of drugs in the training

set, and p – the number of molecular descriptors in the model. Fisher statistic (F) for the

models was also calculated. Drugs which residuals did not follow normal distribution

were considered as outliers and were removed from the training set before building of

the final model.

Validation of QSPkR models

The models were validated by internal leave-one-out cross-validation (LOO-CV) and leave-many-out cross-validation (LMO-CV). In LOO-CV one drug was excluded from the training set, a QSPkR model was derived on the remaining (n-1) molecules and was used for prediction of logVss of the excluded drug. The procedure was repeated for all drugs in the training set. In the LMO-CV a subset of molecules (usually 20%) was successively excluded from the training set, a model was built on the remaining

56

molecules, and tested on the excluded subset. The predictivity was assessed by the cross-validated coefficients q2

LOO-CV and q2LMO-CV, calculated according to the equation:

( )

( )∑

∑

=

=

−

−

−= n

1i

2

trainmean,obs,ssi,obs,ss

n

1i

2

traini,pred,ssi,obs,ss2

VlogVlog

VlogVlog1q

where Vss,pred,i is the predicted by the model value of Vss for the ith drug. In LMO-CV the average value of all CV runs was given as q2

LMO-CV.

A few additional metrics such as fold error of prediction (FEP), mean fold error of prediction (MFEP) and geometric mean fold error of prediction (GMFEP) were computed according to the equations:

i,pred,ssi,obs,ss VlogVlog10FEP

−=

n

10MFEP

n

1i

VlogVlog pred,ssobs,ss∑=

−

=

n

VlogVlogn

1ii,pred,ssi,obs,ss

10GMFEP

∑

==

−

The accuracy of prediction was given as a percent of drugs from the test set which Vss values were predicted within the 2-fold or 3-fold error of the observed values.

Evaluation of the predictive ability of QSPkR models

Predictive ability of the models was evaluated using an external test set of compounds not involved in any step of model development or validation, as recommended by Tropsha [52]. The goodness of prediction was assessed by the predictive coefficient of determination r2

pred:

( )

( )∑

∑

=

=

−

−

−= n

1i

2

testmean,obs,ssi,obs,ss

n

1i

2

testi,pred,ssi,obs,ss2pred

VlogVlog

VlogVlog1r

57

The values of FEP, MFEP, GMFEP and Accuracy were estimated for the external test set as already described.

QSPkR model for Vss of acidic drugs

The model was built on a training set of 123 molecules, covering a wide range of structures, therapeutic action, physicochemical properties and PK properties. The MW varied between 126 and 1297 g/mol, with an average value of 372 g/mol. Two drugs (micafungin and suramin) had MW higher than 1200 g/mol. The values of the lipophilicity parameters logP and logD7.4 also varied considerably. LogP ranged between -7.48 (micafungin) and 8.39 (hipericin); the same drugs had extreme values of logD7.4. The values of Vss ranged from 0.04 (suprofen) to 15 L/kg (artesunate), with an average value of 0.541 L/kg and median value of 0.22. LogVss displayed closer to normal distribution. It ranged between -1.40 and 1.176, with an average value -0.576 and median value -0.658. The histogram of Vss is shown in Figure 1. Most of the acidic drugs have low Vss values: 6% with Vss < 0.1 L/kg, 37% with Vss between 0.1 and 0.2 L/kg and 28% with Vss in the range 0.2 - 0.3 L/kg. For only 10.6% Vss > 0.7 L/kg (the value, considered as total body water volume [37], and only 3.8% show Vss > 2 L/kg.

Figure 1. Histogram of Vss for acidic and basic drugs

Several QSPkR models were generated using different combinations of descriptors. The best one in terms of goodness of fit and cross-validation is given below:

( ) ( ) ( )( ) ( ) ( )( ) ( ) 38.0knotpv*03.017.0H*07.041.0

acnt_SdssS*16.081.0acnt_SssS*03.012.0G*01.005.0

SHsSH*10.0*35.0acnt_SdsssP*06.053.09xch*82.301.28Vlog

max

min

ss

+±−±−

−±−±−±+

+±+±+±=

58

where xch9 is the 9th order simple connectivity chain index, SdsssP_acnt, SssS_acnt and SdssS_acnt – the number of all dsssP-, ssS-, and dssS-atoms in the molecule, respectively; SHsSH – the sum of the electrotopological state (E-state) indices of H-atoms in SH-groups; Hmax – the maximum hydrogen E-state in the molecule; Gmin – the minimum E-state atom in the molecule; knotpv – the difference between connectivity valence cluster 3 (xvc3) and valence path/cluster 4 (xvpc4) indices.

The QSPkR model for Vss of acidic drugs is robust and significant with r2 = 0.661, SEE = 0.194 and F = 28.32. Seven drugs are identified as outliers.

For LMO-CV the training set was divided randomly into training and test sets (4:1). A model was derived on the training set, and tested on the test set. The average value of 30 runs was given as q2

LMO-CV. The model demonstrated good predictive power: q2

LOO-CV = 0.581; q2LMO-CV = 0.540, MFEP = 1.58, accuracy (at 2-fold error level) 83%.

The predictivity of the model was further assessed with an external test set of the ten acidic drugs from the external test set of Berelini et al. [36] The model demonstrated good performance with MFEP 2.02, GMFEP 1.78 and accuracy 50% (at two-fold-error level) and 80% (at three-fold error level). The plot of the predicted versus observed logVss values is shown in Figure 2.

Figure 2. Predicted vs. observed logVss values for the training and external test set of

acidic drugs.

Analysis of the descriptors in the model allows identifying the main structural features affecting Vss of acidic drugs. Descriptors xch9, SdsssP_acnt, SHsSH and Gmin have positive correlation with Vss. xch9 represents the 9th order simple connectivity

59

chain index and encodes the presence and characteristics (type and position of substituents) of a nine-member cycle. The same descriptor was found to have positive contribution to the biliary clearance [53]. Extensive secretion in bile can lead to high Vss values. SdsssP_acnt represents the number of all atoms of the type =P≤ in the molecule. Phosphonate groups are rather polar and ionized at physiological pH which disfavors trans-membrane permeability. Their positive contribution to Vss should be related to specific binding to various tissues. For example, the bisphosphonates pamidronate (Vss = 1.8 L/kg), risedronate (Vss = 6.3 L/kg) and zoledronate (Vss = 0.63 L/kg) are effectively used for preventing osteoporosis and bone metastases due their high affinity to bone [54]. Fludarabin (Vss = 2.2 L/kg) binds extensively in kidney, liver, spleen and other tissues [55]. SHsSH counts the sum of the electrotopological state (E-state) indices of H-atoms in SH-groups. Sulfhydryl groups are capable of forming hydrogen and disulfide bonds with tissue proteins. For example, high binding of the unchanged acetylcystein through disulfide bridges has been reported in lung (47%, vs. 14% in plasma) [56]. Gmin represents the minimum E-state value in the molecule and is a quantitative measure for the most electrophilic atom. The E-state value provides information about the electron accessibility to the atom. This descriptor has low, usually negative values for C-atoms, connected with large number of strong electronegative atoms (like -CF3, –C(SO2NH2), -C-SO3H, etc.). In addition, the drugs with the most negative value of Gmin (micafungin, suramin, rizedronate, pamidronate, zoledronate) are highly hydrophilic with logD7.4 < -6.

According to the model, descriptors SssS_acnt, SdssS_acnt, Hmax and knotpv have negative correlation with logVss. SssS_acnt and SdssS_acnt represent the number of all -S- and =S< atoms in the molecule. The latter are usually represented as >S=O groups, which are often involved in the binding of drugs to HSA [57]. Descriptor Hmax corresponds to the maximum hydrogen atom E-state in the molecule and reflects its ionizability. It has low values for weak acids and highest values for strong acids with pKa < 4. The detrimental effect of Hmax is in agreement with the general belief that ionized acids have low Vss. Descriptor knotpv represents the difference between two connectivity indices: xvc3 and xvpc4. Molecular connectivity indices can be interpreted in terms of intermolecular accessibility [58]. Low value of knotpv corresponds to high electron accessibility. For small compact molecules the values of knotpv is close to zero. For large molecules with a high degree of branching, distal substituents or bridged rings the value of knotpv is negative, with the lowest value in the dataset (-3.46). Such large molecules face difficulties in crossing membranes, however they are rich in hetero-atoms and HB donors/acceptors which are potential participants in drug-tissue proteins

60

interactions. The negative contribution of knotpv to logVss reflects the positive effect of intermolecular accessibility to Vss. The QSPkR model does not involve any lipophilicity parameter. Instead, hydrophilicity affects negatively Vss (through descriptors Gmin and Hmax). Plasma protein binding also contributres negatively to Vss (through SssS_acnt and SdssS_acnt), while specific affinity to tissues (encoded by SdsssP_acnt, SHsSH and xch9) and intermolecular accessibility (encoded by knotpv) are favorable for Vss.

Seven drugs were identified as outliers from the model. Three of them are extensively distributed drugs, and their Vss values are highly under-predicted. Artesunate is the drug with the highest Vss in the dataset. A median value of 15L/kg was reported, ranging between 2.2 and 39 L/kg [59]. The predicted by the model Vss value of 2.49 L/kg is within the range of the observed values. The large inter-individual variability of artesunate PK parameters is due to its highly variable metabolism. Aditionally, artesunate undergoes biliary excretion and enterohepatic circulation [60] which may also contribute to the larger observed Vss values. The value of Vss for ifetroban in Obach’s database is 4.4 L/kg while the predicted Vss is 0.42 L/kg. The drug is extensively distributed throughout the body with concentration in most tissues exceeding that in plasma, and considerable enterohepatic circulation [61]. Rosuvastatin (observed and predicted Vss values 1.7 L/kg and 0.17 L/kg, respectively) is eliminated primarily through hepato-billiary system (about 70%) with a high hepatic extraction coefficient [62] and considerable biliary excretion [63]. The complex elimination pathway may be the reason for the large difference between observed and predicted Vss values for artesunate, ifetroban and rosuvastatin.

According to the Obach’s database, suprofen (Vss 0.04 L/kg, predicted 0.22 L/kg), indomethacin (Vss 0.096 L/kg, predicted 0.62 L/kg) and glyburide (Vss 0.08 L/kg, predicted 0.32 L/kg) have low Vss values, hence they are over-predicted by the model. It is known that non-steroidal anti-inflammatory drugs have low Vss values restricted by their high PPB [36]. Plasma volume is close to the volume of distribution of albumin, or about 0.1 L/kg [37] and this value could be considered as a lower limit of Vss. Controversial data for Vss of these drugs have been published: for suprofen – between 0.1 and 0.3 L/kg [64]; for indomethacine – 0.93 L/kg [65] and 0.95 L/kg [35]; for glyburide – 47 L (0.47 L/kg) and 56.8 L (0.78 L/kg) in obese and non-obese individuals, respectively [66]. The predicted Vss values are within the range of the reported values, or at least within the 2-fold error. Zoledronate was also over-predicted by the model (Vss 0.6 L/kg, predicted 2.18 L/kg). Zoledronate has similar structure and physicochemical properties to the other two biphosphonate drugs in the database – pamidronate (Vss 1.8

61

L/kg) and risedronate (Vss 6.3 L/kg). They all bind rapidly and extensively to bones from where they are gradually released for a long time period [67]. A Vss value of 0.6 L/kg seems unrealistic as it is close the total body water volume and is in contradiction to the common distribution patterns of this class of drugs [68]. Therefore, the disagreement between the observed and predicted Vss values of the four drugs may arise from incorrectness of the reported PK data.

QSPkR model for Vss of basic drugs

The dataset consisted of 216 basic drugs from Obach’s database [37] covering wide chemical and PK response space. The MW varied between 129 and 1431 g/mol, with an average value of 360 g/mol. Four drugs (cetrorelox, valspodar, leuprolide and cyclosporine) had MW exceeding 1200 g/mol. The values of the lipophilicity parameters logP and logD7.4 also varied considerably. LogP ranged between -5.82 (desmopressin) and 8.89 (amiodaron); the same drugs had extreme values of logD7.4. The value of fB at pH 7.4 was between 0.02 and 1.00 with 60% of the drugs almost completely ionized (fB > 0.95). The values of Vss varied between 0.073 (netilmicin) and 140 L/kg (chloroquine), with an average value of 6.08 L/kg and median value 2.5 L/kg. LogVss showed close to normal distribution ranging between -1.14 and 2.15 (mean value of -0.57 and median value of -0.658). The histogram of Vss values for basic drugs is also shown in Fig. 1. Obviously, acidic and basic drugs differ considerably in their distribution patterns. Very few basic drugs have low Vss < 0.3 L/kg; for the majority (87.5%) Vss exceeds the threshold of 0.7 L/kg; for 56% Vss > 2L/kg and 15.3% manifest extreme high Vss > 10L/kg.

The dataset was divided into a modeling set of 180 drugs and external test set of 36 drugs. In turn, the modeling set was separated into five training and five test sets for LGO-CV. Several QSPkR models were generated on the five training set using different combinations of descriptors. The best models demonstrated good fit, robustness and predictivity: r2 0.635 – 0.700 (mean 0.666), q2

LOO-CV 0.578 – 0.664 (mean 0.604), MFEP 1.93 – 2.41 (mean 2.19) and accuracy at 2-fold error level 50 – 71% (mean 60%). The models derived for the five training sets within the modeling set were very similar in terms of selected variables, outliers and statistics.

The 27 most frequently emerging descriptors were used for development of the Consensus QSPkR model for logVss of basic drugs on the whole modeling set of 180 molecules, shown below:

62

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( ) 366.0SdssC*032.0071.0Dipole*011.0040.0acnt_SaaaC*031.0079.0

G*029.0155.0acnt_SsF*046.0254.0acnt_SsCl*065.0*333.0

f*090.0565.0ncirc*006.0026.0Plog*013.0124.0Vlog

min

Bss

−±−±−±+

+±+±+±+

+±+±+±=

where log P is a lipophilicity parameter, ncirc – the number of cycles, fB – the fraction ionized as a base at pH 7.4, SsCl_acnt, SsF_acnt and SaaaC_acnt – the number of all Cl-, F-, or aromatic C-atoms in fused rings in the molecule, respectively, Gmin – the minimum E-state atom in the molecule, Dipole – the dipole moment of the molecule, and SdssC – the sum of E-state values of all dssC – atoms in the molecule.

Consensus model for Vss of basic drugs is significant and robust with r2 = 0.663 and F = 35.43. It demonstrates very good performance in both modeling set (SEE = 0.298, q2

LOO-CV = 0.606 and q2LGO-CV = 0.538) and external test set (r2

pred = 0.593, MFEP = 2.25, GMFEP = 1.98 and accuracy 50% at 2-fold-error level and 88% at 3-fold error level). The plot of the predicted versus observed logVss values is shown in Figure 3.

Figure 3. Predicted vs. observed logVss values for the modeling and external test set of

basic drugs.

According to Consensus model, logP, fB, ncirc, SsCl_acnt, SsF_acnt, SaaaC_acnt and Gmin correlate positively with logVss. Lipophilicity descriptor logP appears to be the most significant determinant of Vss, responsible for about 50% of the explained variance. There are 20 drugs with negative logP value, and 17 of them have relatively small Vss < 0.7 L/kg. In the other extreme, 10 of the 14 most lipophilic drugs (with logP > 5) have large Vss > 2 L/kg. The positive effect of logP on Vss has long been known [7], and it is quite reasonable as lipophilicity is a prerequisite for many processes involved in drug distribution: trans-membrane permeation, binding to tissue components, accumulation in

63

adipose cells, etc. The value of fB reflects the basicity of the drug. It is suggested that the presence of a strong basic center enables the ion-pair interactions with charged acidic head groups of membrane phospholipids, the binding to phosphatidylserine in cell membranes in several tissues and ion-trapping in lysosomes [42]. Molecular descriptor ncirc corresponds to the total number of cycles in the molecular graph. Molecules with bridged rings have higher values of ncirc. At equal molecular weight, molecules with higher ncirc have lower volume and surface. Hence, ncirc reflects the compactness of the molecule. It is reasonable that more compact molecules have higher Vss as compactness is favorable for trans-membrane permeability. The presence of Cl- and F- atoms (encoded by SsCl_acnt and SsF_acnt) contributes to lipophilicity, and the presence of aromatic C-atoms in fused rings (descriptor SaaaC_acnt) – to the compactness of the molecule. Gmin represents the lowest E-state value in the molecule and corresponds to the most electrophile atom. This descriptor has lowest (negative) values for C-atoms, connected with large number of strong electronegative atoms (for example, -CF3, –C(SO2NH2), etc.). High positive values of Gmin are observed for terminal C-atoms, connected with N-atoms, which are relatively strong basic centers. The highest values for Gmin areobserved for amitriptyline (pKa 9.18), imipramine (pKa

9.49), desipramine (pKa 10.40), etc. Hence, positive value of Gmin implies high basicity, favorable for extensive distribution.

Two descriptors have negative effect on Vss of basic drugs: Dipole and SdssC. Dipole represents the dipole moment of the molecule and quantifies its polarity. It is a measure of unequal distribution of the electron density and has highest values for large molecules with M > 1000 g/mol and extended molecules with a strong electrophile located peripherally. Polarity should not be confused with hydrophilicity as there are many drugs with high Dipole and high logP value. The negative effect of polarity may be due to steric hindrances by crossing biological membranes, increased PPB or reduced ability for hydrophobic interactions with tissue constituents. Descriptor SdssC encodes the presence and E-state of a carbon atom type >C= (represented in the dataset as >C=O, >C=C< or >C=N-). It has negative values for molecules with many electronegative atoms, and positive – for molecules with prevalence of aromatic and aliphatic moieties. According to the model, a negative SdssC is favorable for Vss.

Twelve drugs are identified as outliers from the model – eight from the modeling set and four – from the external test set. The search in the literature showed that netilmicin and pyrimethamine are not real outliers. The Vss of netilmicin in Obach’s database is 0.073 L/kg. The value in the original reference is 0.2 – 0.3 L/kg [69], and

64

even higher value of 0.68 L/kg was reported [70]. These values are close to predicted by our model Vss of 0.42 L/kg. Similarly, the Vss of pyrimethamine in Obach’s dataset is 0.43 L/kg, while others reported Vss in the range 2.12 - 3.06 L/kg [71]. Our predicted value of 2.67 L/kg falls within this range.

The Vss values of four drugs are over-predicted by the model: cetrorelix (Vss 0.39 L/kg, predicted 19.2 L/kg), ziprasidone (Vss 1.0 L/kg, predicted 15.5 L/kg), procyclidine (Vss 0.74 L/kg, predicted 4.26 L/kg), and mibefradil (Vss 3.1 L/kg, predicted 16.5 L/kg). The high predicted value for Vss of cetrorelix and ziprasidone results from the prevailing positive features in their molecules. However, the extremely large molecular size of cetrorelix (M = 1431 g/mol) disfavors trans-membrane permeability and localizes the drug in plasma, where it is highly bound to plasma proteins (fu 0.14). Ziprasidone is extensively metabolized in liver [72] and almost completely bound to plasma proteins (fu 0.0012), which locates it mainly in the central compartment. The disagreement between observed and predicted values for procyclidine (logP 3.93) and mibefradil (logP 6.29) may be due to their high lipophilicity.

The value of Vss is under-predicted for six drugs: sildenafil (Vss 1.4 L/kg, predicted 0.2 L/kg), triamterene (Vss 13 L/kg, predicted 0.64 L/kg), azithromycin (Vss 33 L/kg, predicted 2.14 L/kg), pentamidine (Vss 53 L/kg, predicted 4.51 L/kg), topixantrone (Vss 57 L/kg, predicted 4.51 L/kg), and chloroquine (Vss 140 L/kg, predicted 9.5 L/kg). The lower predicted value of sildenafil results from the high negative value of Gmin and very low fB of 0.04. The last five outliers have extremely large Vss > 10 L/kg, implying considerable tissue accumulation. The low predicted Vss of triamterene is due to its low lipophilicity (logP 0.18). However the drug is extensively bound in tissues in the central compartment [73] and cleared by hepatic metabolism and biliary excretion [74]. The main factor for the extremely high Vss of azythromycine and pentamidine is the presence of a two strong basic centers in the molecule [42]. Extensive uptake and slow release from tissues have been suggested for the long drug half-life of azithromycine [75]. Pentamidine is a substrate of the organic cation transporters facilitating extensive distribution in kidneys, liver and bile [76]. Topixantrone also has two strong basic centers, and a prominent affinity of DNA to the drug has been suggested [77]. Chloroquine is the drug with the largest Vss in the dataset. It distributes widely in numerous tissues [78] and accumulates in skin and eye with a slow release from the pigmented tissues [79]. Ion trapping was suggested as the main reason for chloroquine tissue accumulation [80]. Obviously, unique distribution patterns of these high-Vss drugs, not captured by the model, are the main reason for their under-prediction.

65

References:

1. Touten PL, Bousquet-Melou A. Volumes of distribution. J Vet Pharmacol Ther 2004; 27:441-453.

2. Rowland M, Tozer TN, Multiple-dose regimens, in Rowland M; Tozer TN (eds), Clinical pharmacokinetics and pharmacodynamics: concepts and applications, 4th ed, Baltimore Maryland, Lippincott Williams & Wilkins, pp 293-329, 2010.

3. Grover A, Benet LZ. Effects on drug transporters on volume of distribution. AAPS J 2009; 11:250-261.

4. Shugarts S, Benet LZ. The role of transporters in the pharmacokinetics of orally administrated drugs. Pharm Res 2009; 26:2039-2054.

5. Obach RS, Baxter JG, Liston TE, Zilber BM, Jones BC, McIntyre F, Rance DJ, Wastall P. The prediction of human pharmacokinetiv parameters from preclinical and in vitro metabolism data, J Pharmacol Exp Ther 1997; 283(1):46-58.

6. Mager DE. Quantitative structure – pharmacokinetic/pharmacodynamics relationships. Adv Drug Deliv Rev 2006; 58:1326-1356.

7. Fagerholm U. Prediction of human pharmacokinetics – Evaluation of methods for prediction of volume of distribution. J Pharm Pharmacol 2007; 59:1181-1190.

8. Obach RS. Prediction of human volume of distribution using in vitro, in vivo and in silico approaches. Annu Rep Med Chem 2007; 42:469–488.

9. Sui X, Sun J, Wu X, Li H, Liu J, He Z. Predicting the volume of distribution of drugs in humans. Curr DrugMetab 2008; 9:574–580.

10. Berry LM, Li C, Zhao Z. Species differences in distribution and prediction of human V(ss) from preclinical data. Drug Metab Dispos 2011; 39(11):2103-2116.

11. Zou P, Zheng N, Yang Y, Yu LX, Sub D. Prediction of volume of distribution at steady state in humans: comparison of different approaches. Expert Opin Drug Metab Toxicol 2012; 8(7):855-872.

12. Di L, Feng B, Goosen TC, Lai Y, Steyn S, Varma MV, Obach RS. A perspective on the prediction of drug pharmacokinetics and disposition in drug research and development. Drug metab Dispos 2013; 41:1975-1993.

13. Lombardo F, Waters NJ, Argitar UA, Dennehy MK, Zhan J, Gunduz M, Harriman SP, Berellini G, Rajlic IL, Obach RS. Comprehensive assessment of human pharmacokinetic prediction based on in vivo animal pharmacokinetic data, part1: volume of distribution at steady state. J Clin Pharmacol 2013; 53(2):167-177.

14. Boxenbaum H. Interspecies scaling, allometry, physiological time, and the ground plan of pharmacokinetics. J Pharmacokinet Biopharm 1982; 10:201-227.

66

15. Boxenbaum H. Interspecies pharmacokinetic scaling and the evolutionary-comparative paradigm. Drug Metab rev 1984; 15:1071-1121.

16. Caldwell GW, Masucci JA, Yan Z, Hagemar W, Allometric scaling of pharmacokinetic parameters in drug design: can human CL, Vss and t1/2 be predicted from in vivo rat data? Eur J Drug Metab Pharmacokinet 2004; 29:133-143.

17. Ward KW, Smith BR. A comprehensive quantitative and qualitative evaluation of extrapolation of intravenous pharmacokinetic parameters from rat, dog and monkey to human. II. Volume of distribution and MRT. Drug metab Dispos 2004; 32:612-619.

18. Sewada Y, Hanano M, Sigiyama Y, Iga T. Prediction of the disposition of nine weakly acidic and six weakly basic drugs in human from pharmacokinetic parameters in rats. J Pharmacokinet Biopharm 1985; 13:477-492.

19. Bjorkman S. Prediction of drug disposition in infants and children by means of physiologically based pharmacokinetic modeling: theophylline and misazolam as model drugs. Br J Clin Pharmacol 2004; 59:691-704.

20. Poulin P, Theil F-P. Prediction of phatmacokinetic prior to in vivo studies. Mechanism based prediction of the volume of distribution. J Pharm Sci 2002; 91:129-156.

21. Rodgers T, Leahy D, Rowland M. Physiologically based pharmacokinetic modeling 1: Predicting the tissue distribution of moderate-to-strong bases. J Pharm Sci 2005; 94:1259-1276.

22. Rodgers T, Rowland M. Physiologically based pharmacokinetic modeling 2: Predicting the tissue distribution of acids, very weak bases, neutral and zwiterions. J Pharm Sci 2006; 95:1238-1257.

23. Del Amo EM, Ghemtio L, Xhaard H, Yliperttula M, Urtti A, Kidron H. Applying linear and non-linear methods for parallel prediction of volume of distribution and fraction of unbound drug. PLOS one 2013; 8(10):e74758. Doi:10.1371/journal.pone.0074758.

24. Turner JV, Maddalena DJ, Cutler DJ, Agatonovic-Kustrin S. Multiple pharmacokinetic parameter prediction for a series of cephalosporins. J Pharm Sci 2003; 92(3):552-559.

25. Karalis V, Tsantili-Kakoulidou A, Macheras P. Quantitative structure – pharmacokinetic relationships for disposition parameters of cephalosporins. Eur J Pharm Sci 2003; 20:115-123.

67

26. Ng C, Xiao Y, Putnam W, Lum B, Tropsha A. Quantitative structure – pharmacokinetic pparameter relationships (QSPKR) analysis of antimicrobial agents in humans using simulated annealing k-nearest-neighbor and partial least – square analysis methods. J Pharm Sci 2004; 93(10):2535-2544.

27. Wajima W, Fukumura K, Yano Y, Oguma T. Prediction of human phatmacokinetics from animal data and molecular structural parameters using multivariate analysis: volume of distribution at steady state. J Pharm Pharmacol 2003; 55:939-949.

28. Turner JV, Maddalena DJ, Cutler DJ. Pharmacokinetic parameter prediction from drug structure using artificial neural networks. Int J Pharmac 2004; 270:209-219.

29. Lombardo F, Obach RS, Shalaeva MY, Gao F. Prediction of volume of distribution values for neutral and basic drugs using physicochemical measurements and plasma protein binding data. J Med Chem 2002; 45(13):2867-2876.

30. Oie S, Tozer TN. Effect of altered plasma protein binding on apparent volume of diatribution. J Pharm Sci 1979; 68:1203-1205.

31. Lombardo F, Obach RS, Shalaeva MY, Gao F. Prediction of volume of distribution values for neutral and basic drugs. 2Extended data set and leave-class-out statistics. J Med Chem 2004; 47(5):1242-1250.

32. Lombardo F, Obach RS, DiCapua FM, Bakken GA, Lu J, Potter DM, Gao F, Miller MD, Zhang Y. Hibrid mixture discriminant analysis – random forest computational model for the prediction of volume of distribution of drugs in human. J med Chem 2006; 49(7):2262-2267.

33. Gleeson MP, Waters NJ, Paine SW, Davis AM. In silico human and rat Vss structure – activity relationships models. J Med Chem 2006; 49(6):1953-1963.

34. Ghafourian T, Barzetar-Jalali M, Hakimiha N, Cronin MTD. Quantitative structure-pharmacokinetic relationship modeling: apparent volume of distribution. J Pharm Pharmacol 2004; 56:339-350.

35. Ghafourian T, Barzetar-Jalali M, Dastmalchi S, Khavari-Khorasani T, Hakimiha N, Nokhodchi A. QSPR model for the prediction of apparent volume of distribution. Int J Pharmac 2006; 319:82-97.

36. Berellini G, Springer C, Waters NJ, Lombardo F. In silico prediction of distribution on human using linear and nonlinear models on 669 compounds data set. J Mol Chem 2009; 52:4488-4495.

68

37. Obach RS, Lombardo F, Waters NJ. Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 670 drug compounds. Drug Metab Dispos 2008; 36:1385-1405.

38. Demir-Kavuk O, Benzien J, Muegge I, Knapp EW. DemQSAR: predictive human volume of distribution and clearance of drugs. J Comp Aid Mol Des 2011; 25:1121-1133.

39. Freitas AA, Limbu K, Ghafourian T. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients. J Cheminform 2015; 7:6.

40. Clark DE. Rapid calculation of polar molecular surface area and its application for prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 1999; 88:807-814.

41. Smith DA, Jones BC, Walker PK. Design of drugs involving the concepts and theories of drug metabolism and pharmacokinetics. Mol Res Rev 1996; 16:243-266.

42. Smith DA, Allerton C, Calgutkar AS, Waterbeemd H, Walker DK. Distribution, in Smith DA; Allerton C; Calgutkar AS; Waterbeemd H; Walker DK (eds), Pharmacokinetics and metabolism in drug design, 3rd ed, Weinheim, Wiley – VCH Verlag GMBH&Co. KGaA 2012, pp. 61-79.

43. Zhivkova Z, Doytchinova I. Prediction of steady-state volume of distribution of acidic drugs by quantitative structure – pharmacokinetics relationships. J Pharm Sci 2012; 101(3):1253-`1266.

44. Zhivkova Z, Doytchinova I. Quantitative structure – pharmacokinetics relationships analysis of basic drugs: volume of distribution. J Pharm Pharm Sci 2015; 18(3):515-527.

45. http://www.drugbank.ca 46. http://chemicalbook.com 47. http://kikkajiweb.jst.go.jp 48. http://www.ebi.ac.uk 49. Hall LH, Kier LB. Electrotopological state. Structure modeling for QSAR and

database analysis, in Devillers J; Balaban A (eds), Topological indices and related descriptors in QSAR and QSOR, London, Gordon and Breach, 1999, pp. 491-562.

50. Hall LH, Kier LB. Molecular connectivity chi indices for database analysis and structure – property modeling, in Devillers J; Balaban A (eds), Topological indices and related descriptors in QSAR and QSOR, London, Gordon and Breach, 1999, pp. 307-360.

69

51. Leardi R. Genetic algorithm in chemistry. J Chromatogr A 2007; 1158:226-233. 52. Tropsha A. Best practices for QSAR model development, validation, and

exploitation. Mol Inf 2011; 29:476-488. 53. Yang X, Gandhi YA, Duignan DB, Morris ME. Prediction of biliary excretion in

rats and humans using molecular weight and quantitative structure – pharmacokinetic relationships. AAPS J 2009; 11:511-525.

54. Van Beek E, Lowik C, Van der Pluijm G, Papapoulos S. The role of geranylgenarylation in bone resorption and its suppression by bisphosphonates in fetal bone explants in vitro: A clue to the mechanism of action of nitrogen containing bisphosphonates. J Bone Miner Res 1999; 14:722-729.

55. Keating MJ, Talpaz KM, Redman JM, Koller C, Barlogie B, Velasquez W, Plunkett W, Freireich EJ, McCredie KB. Fludarabine: A new agent with major activity against chronic lymphocyte leukemia. Blood 1989; 74:19-25.

56. Rodenstain D, DeCoster A, Gazzaniga A. Pharmacokinetics of oral acetylcysteine: absorption, bindung and metabolism in patients with respiratory disorders. Clin Pharmacokinet 1978; 3(3):247-254.

57. Kragh-Hansen U, Riisom T. Complexes of aliphatic sulphates and human serum albumin studied by 13C nuclear-magnetic-resonance spectroscopy. Eur J Biochem 1976; 70:15-23.

58. Kier LB, Hall LH. Intermolecular accessibility: the meaning of molecular connectivity. J Chem Inf Comput Sci 2000;40:792-795.

59. Newton PN, Barnes KI, Smith PJ, Evans AC, Chierakul W, Ruangveerayuth R, White NJ. The pharmacokinetics of intravenous artesunate in adults with severe falciparum malaria. Eur J Clin Pharmacol 2006; 62:1003-1009.

60. Li Q, Xie LH, Haeberle A, Zhang J, Weina P. The evaluation of radiolabeled artesunate on tissue distribution in rats and protein binding in humans. Am J Trop Med Hyg 2006; 75:817-826.

61. Dockens RC, Santone KS, Mitroka JG, Morrison RA, Jemal M, Green DS, Barbhaiya RH. Disposition of radiolabeled ifetroban in rats, dogs, monkeys and humans. Drug Metab Dispos 2000; 28:973-980.

62. Nezasa K, Higaki K, Yukawa T, Inazawa K, Hasegawa H, Nakano M. Liver specific distribution of rosuvastatin in rats: Comparison with pravastatin and simvastatin. Drug Metab Dispos 2002; 30:1158-1163.

63. Bergman E, Forsel P, Tevell A, Persson EM, Hedeland M, Bondesson U, Knutson I, Lennernas H. Biliary secretion of rosuvastatin and bile acids in humans during the absorption phase. Eur J Pharm Sci 2006; 29:205-214.

70

64. Davies NM, Skjodt NM. Chosing the right non-steroidal anti-inflammatory drug for the right patient: A pharmacokinetic approach. Clin Pharmacokinet 2000; 38:377-392.

65. Baber N, Halliday L, Sibeon R, Littler T, Orme ML. The interaction between indomethacin and probenecid. A clinical and pharmacokinetic study. Clin Pharmacol Ther 1978; 24:298-307.

66. Jaber LA, Antal EJ, Slaughter RL, Welshman IR. The pharmacokinetics and pharmacodynamics of 12 weeks og glyburide therapy in obese diabetics. Eur J Clin Pharmacol 1993; 45:459-463.

67. Weiss HM, Pfaar U, Schweitzer A, Wiegand H, Skerjanec A, Schran H. Biodistribution and plasma protein binding of zoledronic acid. Drug Metab Dispos 2008; 36:2043-2049.

68. Li EC, Davis LE. Zoledronic acid: A new parenteral bisphosphonate. Clin Ther 2003; 25:2669-2708.

69. Welling PG, Baumueller A, Lau CC, Madsen PO. Netilmicin pharmacokinetics after single intravenous doses to elderly male patients. Antimicr Agents Chemother 1977; 12:328-334.

70. Wenk M, Spring P, Vozeh S, Follath F. Multicompartment pharmacokinetics of netilmicin. Eur J Clin Pharmacol 1979; 16(5):331-334.

71. Corvaisier S, Charpiat B, Mounier B, Wallon M, Leboucher G, Kurdi MA, Chailet J-F, Payron F. Population pharmacokinetics of pyrimethamine and sulfadoxine in children treated for congenital toxoplasmosis. Antimicrob Agents Chemother 2004; 48(10):3794-3800.

72. Beedham C, Miceli JJ, Obach RS. Ziprasidone metabolism, aldehyde oxidase, and clinical implications. J Clin Pdychopharmacol 2003; 23(3):229-232.

73. Kau ST, Sastry BD. Distribution and pharmacokinetics of triamterene in rats. J Pharm Sci 1977; 66(1):53-56.

74. Mutschler E, Gilfrich HJ, Knauf H, Moerke W, Voelker KD. Pharmacokinetics of triamterene. Clin Exp Hypertens A 1983; 5(2):249-269.

75. Luke DR, Foulds G, Cohen SF, Levy B. Safety, toleration, and pharmacokinetics of intravenous azithromycin. Antimicr Agents Chemother 1996; 40:2577-2581.

76. Ming X, Ju W. Wu H, Tidwell RR, Hall JE, Thakker DR. Transport of dicationic drugs pentamidine and furamidine by hOCTs. Drug Metab Dispos 2008; 37(2):424-430.

71

77. Sissi C, Moro S, Richter S, Gatto B, Menta E, Spinelli S, Kra[cho AP, Zunino F, Palumbo M. DNA-interactive aza-antrapyrazoles: biophysical and biochemical studies relevant to the mechanism of action. Mol Pharmacol 2001; 59(1):96-103.

78. Adelusi SA. Tissue and blood concentration of chloroquine following chronic administration in the rat. J Pharm Pharmacol 1982; 11:733-735.

79. Sams WM, Epstein JH. The affinity of melanin for chloroquine. J Invest Dermatol 1965; 45:482-488.

80. Daniel WA, Bickel MH, Honegger UE. The contribution of lysosomal trapping in the uptake of desipramine and chloroquine by different tissues. Pharmacol Toxicol 1995; 77(6):402-406.

72

Chapter 3

QSPkR modeling of drug clearance

Clearance

CL is a key PK parameter quantifying the ability of the living organism to eliminate a drug. Both blood and plasma clearance can be defined, and their values are the same in the case of equal drug concentration in blood and plasma. As plasma is the most appropriate matrix for drug concentration assay, plasma clearance is most frequently reported. It is also referred to as body clearance, total clearance or systemic clearance.

Plasma CL represents a proportionality coefficient between the rate of elimination (velimin) and drug concentration in plasma (C):

C.CLvelimin =

In the case of linear elimination, CL is constant and corresponds to the volume of plasma completely cleared of the drug per unit time. By non-linear elimination, CL depends on C and the dose (D) and increases with their decline.

CL is decisive for the whole PK behavior. It controls bioavailability (F) of orally administered drugs, and, together with the volume of distribution (Vd), determines drug half-life (t1/2):

CLV.2ln

t d2/1 =

CL largely controls the drug exposure in the body according to the equation:

AUCD.F

CL =

where AUC is the area under the C/t curve.

In multiple drug administration CL is a major determinant of the dosage rate required to maintain the average steady-state plasma concentration (Css

av):

FC.CLD ss

av=τ

where the dosage rate is expressed as a ratio between the dose D and dosing interval τ.

73

CL can be defined from physiological point of view as a loss of drug while passing through any clearing organ [1]. Based on a simple model and matherial balance of the drug, CL is related to the blood flow (Q) and the extraction ratio (E):

E.QCL =

E reflects the capacity of the organ to clear the drug and varies between 0 and 1. In terms of E values, drugs can be classified as low (E < 0.3), medium (0.3 < E < 0.7) and high (E > 0.7) extraction ratio drugs. The value of CL varies between 0 and Q, i. e. the upper limit of drug clearance in a particular organ equals the blood flow through the same organ.

Total CL is an additive variable, sum of the clearances by all clearing organs (renal, hepatic, biliary, pulmonary, etc.), usually expressed as a sum of renal (CLR) and non-renal (CLNR) clearances:

NRR CLCLCL +=

Generally CLNR is equated to the hepatic CL (CLH) as the clearances in other organs in most cases are negligible.

Considering the body as a whole system, Q corresponds to the cardiac output, and E – to the whole body extraction ratio. For typical human with body weight of 70 kg, the cardiac output is about 80 mL/min/kg, and this is the upper limit of drug total clearance [2]. Taking into account, that the renal blood flow (QR) is 15.7 mL/min/kg, and hepatic blood flow (QH) is 21 mL/min/kg [3], the upper limit of the CL of drug subjected to renal and hepatic elimination is about 40 mL/min/kg.

The major drug clearance pathways

Although multiple organs may be involved in the elimination process, the liver and kidney contribute to the majority of the total CL. Approximately 70% of the top 200 prescribed drugs in the United States in 2002 were shown to be cleared by metabolism, 25% – by renal excretion, and 5% – by biliary excretion [4].

Drug elimination from the liver consists of several processes: hepatic uptake, metabolism and/or biliary excretion. Metabolism is the biochemical transformation of a drug or xenobiotic, traditionally consisting of Phase I and Phase II processes. Phase I metabolism leads to involvement of more reactive functional groups in the molecule by

74

oxidation, reduction, or hydrolysis. Phase II metabolism comprises conjugation reactions, where a highly hydrophilic endogenic moiety is attached to make the molecule more hydrophilic and to facilitate its excretion in urine or bile [5]. The majority of Phase I metabolic reactions are catalyzed by enzymes from cytochrome P450 (CYP) superfamily. Of the 57 known human isoforms of CYPs, just six seem to play a significant role in drug metabolism. In particular, CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4 account for the oxidative metabolism of more than 90% of marketed drugs [6]. The key Phase II enzymes include uridine diphosphate-dependent glucuronosyl transferase (UGT), sulfotransferases and glutathione-S-transferase. Glucuronidation of small lipophilic molecules by UGTs is the most important Phase II process [7]. CYP and UGT isoforms are responsible for the elimination of more than 90% of drugs cleared by the liver [8].

Various carrier systems on the sinusoidal face of hepatocytes facilitate drug uptake in the liver (uptake transporters). They provide access to the interior of the hepatocyte for drugs with diverse physicochemical properties, ranging from ionized molecules with low trans-membrane permeability to fairly lipophilic molecules. Also, different families of transporters are located on the canalicular face, aimed to transport drugs or their metabolites into bile (efflux transporters). For highly permeable drugs (small, lipophilic, non-ionized molecules) neither uptake, nor efflux are rate limiting – these drugs are almost exclusively cleared by metabolism. For higher molecular weight compounds (M > 500 g/mol), which traditionally have high polar surface area (PSA) and moderate to high logP, hepatic uptake followed by biliary secretion can become the major clearance process [9].

Obviously, hepatic CL is a result of three factors: hepatic uptake, metabolic clearance, and secretion in bile. Each of them can be rate-determining. Hepatic uptake could be a rate-determining step in hepatic clearance not only for the compounds that are metabolically stable such as rosuvastatin [10] and pravastatin [11 – 13], but also for compounds that are extensively metabolized, e.g. repaglinide [14, 15] and atorvastatin [16]. The role of transporters in drug hepatic CL, including the interplay between enzymes and transporters is discussed in several reviews [17 – 22].

Renal excretion is a net result of glomerular filtration, tubular secretion and tubular reabsorption [23]. Glomerular filtration is a passive process by which about 10% of total renal blood flow are filtered through glomerulus. It is restricted by molecular size and plasma protein binding (PPB). Tubular secretion is an active process, mediated

75

by the coordinated action of uptake and efflux transporters, located at the basolateral and apical membranes of proximal tubular cells. Tubular reabsorption is the process of return of lipophilic, non-ionized molecules from urine into blood circulation, mainly by passive diffusion. While glomerular filtration and tubular secretion facilitate renal excretion, tubular reabsorption disfavors it. Depending on the dominating mechanism, a drug can be classified as: 1. drug, excreted solely by glomerular filtration (CLR ≈ fu.GFR), 2. Drug with net renal secretion (CLR > fu.GFR), and 3. drug with net reabsorption (CLR< fu.GFR), where fu is the free fraction of drug in plasma, and GFR – glomerular filtration rate [23, 24]. Obviously, CLR of drugs excreted solely by glomerular filtration is low, limited by GFR (~ 120 ml/min). Involvement of various specific and multi-specific transporters and their role in renal drug clearance has been widely discussed [24 – 26].

The predominant route of drug clearance is largely governed by physicochemical properties of drug and PPB. The most important physicochemical properties are lipophilicity, measured by the partition coefficient (logP) or distribution coefficient (logD), and polar surface area (PSA). It is considered that hydrophilic drugs (with logD7.4 < 0) are eliminated by renal excretion. However, for drugs with logD7.4 > 0, CLR decreases with the increase of lipophilicity. In contrast, the increase in logD7.4 results in increased metabolic CL [27, 28]. This profile is consistent with the role of lipophilicity in the major drug handling processes in the body. Neutral drugs tend to show low CLR, while hydrophilic and ionized compounds with hydrogen bonding ability are likely to be subjected to tubular secretion as they are not prone to tubular reabsorption, but are preferably substrates of active transport systems. Hydrophilic, high-PSA compounds tend to be less membrane permeable and thus removed by drug transport into excretory fluids (urine, bile). Hydrophobic, low-PSA compounds require biotransformation to more hydrophilic, membrane-impermeable metabolites that can then be transported into excretory fluids. This is a fundamental theme of drug disposition, and has been summarized in the development of the Biopharmaceutical Drug Disposition Classification System (BDDCS) [29]. In general, with respect to their physicochemistry and permeability the drugs can be arbitrary divided into three classes [30]:

- Highly permeable compounds have low PSA/logP ratios and are cleared by metabolism.

- Low permeable drugs have high PSA/logP ratios and are cleared renally, frequently with the aid of transporters.

76

- Moderate to poorly permeable compounds with high PSA and moderate to high logP are usually subjected to multiple processes combining hepatobilliar transport and metabolism.

PPB may be restrictive factor for drug CL, as it is assumed, that only the free

fraction of drugs is available for elimination. Highly bound drugs cannot be filtered through glomerulus and have low CLR (unless they are substrates of active tubular secretion). Hence, the potential for high tubular reabsorption is not the only reason for the low CLR of lipophilic compounds, but such drug are also more likely to be extensively bound to plasma proteins [28, 31]. PPB is not restrictive a factor for renal excretion of drugs, substrates of active tubular secretion. For example, penicillins are eliminated almost completely unchanged in urine [32], although many of them are highly bound (up to 90%) with plasma proteins. Similarly, only the free drug is available for biotransformation in hepatocytes, and PPB controls hepatic clearance of drugs with low extraction ratio. According to the “well stirred model” [33 – 35]:

int,uuH

int,uuHHHH CL.fQ

CL.f.QE.QCL

+==

where CLu,int is the hepatic unbound intrinsic clearance. Intrinsic clearance CLint is a measure of the enzyme activity toward a drug, defined at very low substrate concentration as:

m

maxint K

VCL =

where Vmax and Km are the maximum rate and Michaelis constant of the enzyme system. For drugs with low CLint hepatic CL is low, restricted by PPB (CLH = fu.CLint). For drugs with high CLint, CLH is independent of PPB and approaches QH [1].

Methods for prediction of human clearance

The great clinical importance of drug CL inspired an extensive research targeted to development of methods for CL prediction of new drug candidates. A wide suite of approaches has been published in the literature, which could be classified into three groups: allometric scaling, physiologically based methods and in silico methods.

The earliest and still widely used technique for prediction of human PK parameters is allometric scaling based on in vivo pharmacokinetic data from preclinical species and extrapolation to humans [36 – 38]. The success of the allometric predictions

77

depends crucially on how much the clearance mechanism is conserved across the species used. In most cases there are inter-species differences in PPB, metabolic patterns, substrate specificities and expression levels of drug-metabolizing enzymes and membrane transporters, etc., which result in inadequate prediction of human CL [22, 31, 39, 40]. Many different approaches have been proposed over the years to improve the predictive performance of allometric methods [30]. A comprehensive analysis on the performance of 37 methods for CL prediction based on iv PK data from rat, dog, and monkey for approximately 400 compounds with and without consideration of simple physiological parameters was published recently [41]. The superior methods were based on extrapolation of monkey CL values to humans, and methods incorporating data for differences in PPB between rat and human. They allowed prediction of about 60% of the drugs within the two-fold error of experimental CL values. However, there is an agreement that allometry may be successfully used for CL prediction of drugs, which are cleared exclusively by glomerular filtration, or if CLint is so high that body CL is limited by organ blood flow, which is allometrically scalable [22].

Physiologically based (PB) methods overcome most of the problems of allometric scaling. They rely on integrated information including both drug-dependent and system-dependent (physiological) parameters which may vary considerably across species and individuals. PB models can be generated using in vivo/in vitro animal data and in vitro human data. The biological and mechanistic bases of PB models allow more adequate extrapolation from animal species to human. The complexity of PB models ranges from simple static models, using only few parameters for predictions of single PK parameters, to dynamic physiologically based pharmacokinetic (PBPK) models allowing prediction of plasma and tissue concentration-time curves. The methodology and application of PBPK modeling are discussed in numerous reviews and books [3, 42 – 45].

A good number of static PB methods for prediction of human metabolic clearance have been reported, usually based on data for in vitro intrinsic CL (CLint). Prediction of human PK parameters from in vitro experiments has obvious advantages over animal scaling being less time consuming, labor intensive and ethical sensitive. Different in

vitro systems such as human microsomes, hepatocytes, and recombinant human expression systems have been used for CLint estimation, and the advantages and limitations of each in vitro approach are well documented [46 – 51]. Once the human CLint has been estimated, CLH can be calculated using the “well-stirred model” equation [33 – 35]. Obach et al. proposed three groups of methods based on different experimental techniques for estimation of CLint [39]. The best model (with MFEP 1.67 –

78

1.95 and accuracy 64 – 88%) were slightly superior to the allometric scaling methods. However, the conclusions cannot be generalized as the models were generated on limited datasets of 7 to 14 drugs. A new approach for prediction of in vivo metabolic CL from in vitro metabolic data was proposed by Iwatsubo et al. with fairly good performance for some drugs, but up to 5-fold error for others [52]. Factors like metabolism by other organs, slow establishment of drug equilibrium between blood and hepatocytes, presence of active transport and interindividual variability were considered as the main causes of the observed differences. Several reports have been published focused on prediction of transporter-mediated clearance [15, 53 – 56]. The methodologies for predicting human CL for transporter substrates are an area of extensive development.

In the last two decades in silico approach has established itself as the most effective tool for prediction of human PK parameters at very early stages of drug discovery. It allows predictions to be made on the basis of computed, theoretical descriptors, thus reducing time and labor costs and enabling the choice of structures with acceptable PK behavior from large drug-candidates libraries.

Most of the studies are focused on the prediction of various aspects of drug metabolism, including in vivo hepatic CL [57, 58], in vitro CLint [59, 60], in vitro metabolic stability [61 – 64], identification of enzyme inhibitors and substrates [65 – 81], Cytochrome 450 (CYP) isoform specificity [82 – 84], etc. The vast majority of reports consider enzymes of the CYP superfamily. Several comprehensive reviews on in

silico models for CYP mediated metabolism were published recently [8, 81, 84, 85]. Fewer studies concern prediction of UGT metabolism [7, 8, 86, 87].

Both structure-based and ligand-based approaches have been applied for clearance prediction. Structure-based methods require information about the 3D structure of the studied enzyme, available from X-ray analysis, homology or pharmacophore modeling. These approaches include 3D molecular modeling of the complex between the ligand and CYPs [88 – 90], quantum mechanical (QM) methods [91 – 94], pharmacophore modeling [86, 90, 95 – 97], molecular docking [98 – 102]. These studies could provide knowledge about the architecture of the enzyme binding sites.

Ligand-based approach encompasses various descriptor-based methods: classification, structural similarity, quantitative structure activity relationships (QSAR), three-dimensional QSAR (3D-QSAR), quantum mechanical (QM) methods, etc. They rally on the assumption that activity of a compound is exclusively a consequence of its

79

chemical structure and physicochemical properties, i. e. drugs with similar structure should have similar activity. The performance of these methods depends crucially on the availability of high quality data for sufficient large number of compounds. The inconsistency of activity data may lead to poor predictivity of the model. There is an agreement that when constructing QSAR model for drug metabolism the metabolic parameters should be measured by the same standard procedure, or collected from the literature in the same laboratory so that the data on different compounds can be reliable compared with each other [94].

One of the most frequently applied ligand-based techniques is QSAR – a process whereby the chemical structure is quantitatively correlated with activity. Quantitative structure – pharmacokinetics relationships (QSPkR) represents an extension of QSAR for prediction of PK parameters, and this is the preferred technique for early prediction of human total CL.

QSPkR methods for prediction of human body CL – an overview

Of all primary PK parameters drug CL is the most difficult to predict due to the involvement of multiple specific and complex mechanisms in drug elimination. Most drugs are cleared via several pathways and their clearance is determined by the rate and the extent of numerous processes such as drug uptake in liver, kidney and bile, metabolism in various organs, glomerular filtration, active secretion, passive and active reabsorption in kidney – each one with different structural requirements. The problem is further complicated by the binding of drugs to plasma proteins which can have a restrictive effect on drug clearance for drugs with low extraction ratio. This may explain the relatively small number of successive QSPkR models for prediction of human total CL.

The earliest QSPkR models were built on small series of cephalosporines [103, 104]. Despite the good predictive power, the utility of these models was restricted to the studied series. A few studies concerned larger datasets of drugs belonging to different structural and therapeutic classes [105 – 110]. The models were based on different combinations of descriptors: physicochemical, constitutional, geometrical, topological, electrotopological, and structural fragments. A wide suite of statistical and machine learning methods was used for model development: multiple linear and non-linear regression analysis [103, 110], partial least squares (PLS) [103, 106, 107, 109, 111], artificial neural networks (ANN) [104, 105], k-nearest neighbor (kNN) [106 – 108],

80

general regression network (GRNN [107], support vector regression (SVR) [107, 110], random forest (RF) [109, 111], classification and regression trees (CART) [111].

Turner et al. developed an ANN based QSPkR model on a dataset of 58 structurally diverse drugs with r2

pred = 0.73 for an external test set of six drugs [105]. The model involved seven descriptors (connectivity indices, lipophilicity parameters, presence of C-, N- and O- atoms of given type), however their physical meaning and their effect on CL have not been discussed.

Ng et al. applied two approaches (kNN and PLS) to a set of 44 antibiotics from different structural and therapeutic classes [106]. The kNN model showed better performance with q2 = 0.77 for the training set (n = 38) and r2

pred = 0.94 for the external test set of six drugs. Although the interpretation of most of the descriptors in the model was difficult because of their interdependence within the framework of the model, molecular charges, hydrogen bonding atoms and lipophilicity were identified as the most significant predictors of CL. The PLS – based method was unsatisfactory in terms of statistics (q2 = 0.09 and r2

pred = 0.35), however it allowed CL prediction for 61% of the drugs within the 2-fold error of the actual values with geometric mean fold error of prediction (GMFEP) as low as 1.28.

Yap and al. employed four diverse techniques (GRNN, SVR, KNN and PLS) for modeling of plasma CL, using a large dataset of 503 compounds [107]. Highly predictive models with GMFEP in the range 1.78 (SVR) and 2.03 (PLS) and accuracy at 2-fold level error 60 – 74.3% were reported. Although most of the descriptors in the models were difficult to interpret, a few structural features important for drug CL were identified. Sphericity, lipophilicity and hydrogen bonding capability correlated positively with drug CL, while branching and size had a negative effect. Compounds with low and high flexibility were found to have higher CL than those with moderate flexibility.

Yu presented a kNN model built on a dataset of 462 drugs classified as acids, bases, zwiterions and quaternary ammonium/pyridinium ions [108]. The model was based on common physicochemical descriptors and showed a good predictability on an external test set, however it was applicable only to ionized molecules.

Berellini et al. reported a novel linear PLS model built on a large dataset of 754 diverse compounds using combination of physicochemical descriptors and structural fragments [109]. The model showed good predictivity with GMFEP of 2.1 and accuracy

81

59% (at 2-fold error level). It performed better for ionized compounds, especially anionic and zwiterionic, and worse – for neutral drugs. The model involved 63 physicochemical descriptors and structural fragments, revealing several molecular features essential for drug CL. Most of them characterized the ionization state and hydrophilicity/lipophilicity of the molecule. The most important negative contributors were the fraction of the drug ionized at pH 7, the fraction unionized at pH 4 (both identifying acid molecules), polar surface area and farmacophore – triplet, including one hydrophobic and two hydrogen bond (HB) acceptor points (presented in big and hydrophilic molecules). The main positive factors were the distribution coefficient logD10 and the fraction ionized at pH 10 (identifying basic molecules). The effect of the structural fragments was lower and it could be related to the biotransformation pathways. Sulfone, sulfoxide or sulphonamide moieties, α-hydroxy- carboxylates, Cl- substituents in phenyl rings, and a double bond between C- and N- atoms contributed negatively to CL, while the presence of phenol groups, benzylic moieties, ester groups, aliphatic 5-member rings had a negative effect.

Gombar et al. generated QSAR models for prediction of systemic CL using SVR and MLR [110]. The method showed moderate accuracy: 75% of the external test set compounds were predicted within 2.5-fold of the value observed, and 90% – within 5-fold error of the value observed. To demonstrate the practical applicability of the model modification of the structure of vildagliptine (CL = 12.9 ml/min/kg) was made by replacing –OH group with –OCH3, leading to a structure with lower predicted CL value of 5.4 mL/min/kg.

Lombardo et al. proposed a new approach to CL prediction considering the main elimination pathways using an exhaustive dataset of 1003 compounds [112]. At first, a classification model based on principal component analysis (PCA) was generated capable of recognizing the primary path of elimination with 95% accuracy. Thereafter, PLS was applied for development of two models on 469 compounds eliminated primarily by metabolism or renal excretion. Renal models demonstrated better performance over metabolic models with GMFEP of 1.6 and 2.4, respectively. The average value of GMFEP was 2.1 and the number of compounds predicted within the two- and three-fold of experimental value was 62% and 81%, respectively.

A few attempts were made for QSPkR modeling of the renal clearance (CLR). Manga et al. developed a classification model for discrimination of drugs with low and high renal excretion using data for the free fraction of the drug excreted in urine (fe) for

82

160 drugs [113]. A threshold of 0.38 was postulated: drugs with fe < 0.38 were deemed to be poorly renally excreted, i. e. preferably metabolically cleared. The most significant descriptors in the model were logD6.5, the number of hydrogen bond (HB) donors, ionization potential, total and electronic energy, counts of -OH and -COOH groups and total net charge. Low lipophilicity, high number of HB donors and presence of COOH contributed to higher value of CLR. Another classification model based on data for the percent of drug excreted unchained in urine set a cutoff of 20% for differentiation between low- and high-cleared drugs [114]. The percent of renal excretion was found to be inversely proportional to logP and directly proportional to PSA and HB related descriptors.

Paine et al. applied PLS and RF for prediction of CLR using a dataset of 349 drugs [111]. Both global and local models (on subsets according to the ionization state) were developed. The RF models showed a good predictive ability with r2

pred on an external test set between 0.43 (for neutral molecules) and 0.79 (for acids and zwiterions). The statistics of the PLS models was inferior with r2

pred in the range 0.28 (for all ionized drugs) – 0.60 (for neutrals). The best PLS and RF models were able to predict the CLR of ionized drugs with an average fold error of 3.3 and 4 respectively. Two groups of descriptors were identified as major contributors to CLR: related to lipophilicity (with negative impact on CLR) and related to the charge or polarizability of the molecule (with positive effect on ClR). Lipophilicity parameters were relevant for all charge types while the average negative charge and the moment of inertia around the first and second plane presented in the models for acids and zwiterions, and the molecular weight and solvent accessible non-polar surface area contributed substantially to CLR of bases. CART was applied for generation of a classification tree leading to a simple set of renal clearance rules. The rules were based on the lipophilicity and ionization state and enabled a correct prediction for 60% of an independent test set.

QSPkR modeling of human CL with consideration of the ionization type

The published so far QSPkR models for drug CL are built either on limited congeneric series, or on larger datasets involving molecules of all ionization types. Most of the models demonstrated satisfactory predictive performance, however some of them were not properly validated, and other were poor interpretable. Most of the descriptors in the models differentiated between drugs of different ionization type, and it is not surprising that some models showed different predictive power on acids, bases, neutral and zwiterions [111]. Therefore, development of separate QSPkR models according to the

83

ionization type of the drugs seems reasonable as it could reveal the most significant structural features governing CL of drugs of different classes.

It is well established that acidic and basic drugs follow different PK patterns. Acids have considerably lower Vss than bases. Their high PPB (mainly to albumin in plasma) and the ionization at physiological pH 7.4 hinder their extensive distribution in tissues. Bases bind primarily to alpha-1-acid glycoprotein, display high affinity to membrane phospholipids and can be accumulated by ion-trapping in lysosomes [115]. The CL of acids is also lower than those of bases. Analysis of a dataset of 754 compounds, published recently, showed that 78% of the anionic drugs and 80% of zwitterions had low CL (below 4 mL/min/kg), with high CL values for only 1 – 2%. In contrast, most of the basic drugs had moderate or high CL, and only 29% – low CL [109]. Acidic drugs seem to be more often subjected to renal or biliary excretion, while basic drugs are cleared primarily by metabolism. Different membrane transporters facilitate drug uptake into liver, kidney and bile – specific organic anion transporters for acids, and organic cation transporters – for bases [25]. Different enzymes are also involved in metabolism. Considering CytP450 oxidation, anionic drugs are preferred substrates of CYP2C9, while most of the basic drugs are metabolized either by CYP2C19 or CYP3A4 [116].

Therefore, development of separate QSPkR models according to the ionization type of the drugs seems reasonable as it could reveal the most significant structural features governing CL of drugs of different classes. Here we discuss briefly our personal results from QSPkR modeling of plasma CL of acidic and basic drugs [117, 118]. It is emphasized on the interpretation of the descriptors in the models which give insight to the major molecular features governing drug CL. Special attention is paid on the outliers from the models as they provide information about the applicability domain of the model.

QSPkR model for unbound plasma CL (CLu) of acidic drugs

The dataset used for QSPkR modeling was extracted from Obach’s database [119], the largest available source of data for the major PK parameters of 670 compounds after iv administration. A drug was classified as an acid if the fraction ionized as an acid (fA) exceeded 10% and was considerably higher than the fraction ionized as a base (fB) as previously described [117]. 131 drugs with available data for fu were identified as acids, and their unbound CL was calculated as follows:

84

uu f

CLCL =

The values of CLu span five orders of magnitude (between 0.13 and 2,100 mL/min/kg), therefore they were logarithmically transformed before analysis in order to achieve close to normal distribution.

The structures of the drugs were presented in mol-file format drawn from public databases [120, 121]. They were described by 178 descriptors computed using ACD/LogD version 9.08 and MDL QSAR version 2.2. Genetic algorithm (GA) and stepwise multiple linear regression (MLR) were applied for variable selection and model development. The goodness of fit was assessed by the explained variance (r2) according to the equation:

( )

( )∑

∑

=

=

−

−

−= n

1i

2mean,obs,ui,obs,u

n

1i

2i,calc,ui,obs,u

2

CLlogCLlog

CLlogCLlog1r

where CLu,obs,i and CLu,calc,i are the observed and calculated by the model values of CLu

for the ith drug, CLu,obs,mean is the average value of CLu, and n – the number of drugs in

the training set. Drugs which residuals did not follow normal distribution were

considered as outliers and were removed from the training set before building of the

final model.

The models were validated by internal leave-one-out cross-validation (LOO-CV) and leave-many-out cross-validation (LMO-CV). For LMO-CV the drugs in the dataset were divided to six subsets. To this end they were arranged in a descending order of CLu and every one of six drugs was allocated in different subset. Each subset was used once as a test set for the model derived on the rest five subsets.

Predictivity of the models was assessed by the cross-validated coefficients q2LOO-

CV and q2LMO-CV, calculated according to the equation:

( )

( )∑

∑

=

=

−

−

−= n

1i

2

trainmean,obs,ui,obs,u

n

1i

2

traini,pred,ui,obs,u2

CLlogCLlog

CLlogCLlog1q

85

where CLu,pred,i is the predicted by the model value of CLu for the ith drug. In LMO-CV the average value of all CV runs was given as q2

LMO-CV. Fold error of prediction (FEP), mean fold error of prediction (MFEP) and geometric mean fold error of prediction (GMFEP) have also been computed according to the equations:

i,pred,ui,obs,u CLlogCLlog10FEP

−=

n

10MFEP

n

1i

CLlogCLlog pred,uobs,u∑=

−

=

n

CLlogCLlogn

1ii,pred,ui,obs,u

10GMFEP

∑

==

−

Accuracy was assessed by the percent of drugs predicted within the 2-fold or 3-fold error of the experimental values.

A number of significant QSPkR models were developed using different combination of descriptors. The best model in terms of statistics is given below:

( ) ( ) ( )( ) ( ) ( )( ) ( ) 032.08xvch*285.7430.41acnt_SssssC*085.0399.0

acnt_SddssS*082.0632.0StN*019.0065.0SHother*008.0028.0

H*012.0446.0acnt_SaasC*024.0125.0Dlog*018.0135.0CLlog max4.7u

+±−±−

−±−±+±+

+±+±+±=

The model is robust and predictive as proved by the statistical metrics: r2 0.720, q2

LOO-CV 0.68, q2LMO-CV 0.73, MFEP 2.84, and GMFEP 2.21. It was able to predict the

values of CLu for 54% and 70% of the drugs within the 2- and 3-fold error of experimental values, respectively. Five drugs were identified as outliers. The plot of predicted values of logCLu vs. observed values is given in Figure 1.

86

Figure 1. LogCLu, pred vs. logCLu,obs for the dataset of 131 acidic drugs.

Descriptors in the model reveal several molecular features important for the unbound clearance of acidic drugs. The main positive factor for CLu is lipophilicity, expressed as logD7.4. This is in agreement with most of the published QSPkR models for drug CL [105 – 107, 109, 113] and with the common belief that lipophilic drug have high CL. In addition, a number of structural features with positive contribution on CLu were identified. SaasC_acnt represents the number of substituted aromatic C-atoms in the molecule. 2/3 of the high clearance drugs (CLu ≥ 100 mL/min/kg) contain at least 4 substituted aasC-atoms in the molecule, in contrast to only 3 extremely hydrophilic drugs (11%) of the low clearance group (CLu ≤ 3 mL/min/kg). SHother equals the sum of hydrogen atom electrotopological (E-state) indices for all non-polar H-atoms in the molecule and correlates highly (r2 0.88) with the sum of C-atoms connected with non-polar H-atom (atom types aaCH, sCH3, ssCH2, sdCH, and dCH2). Most probably, this descriptor contributes to hydrophobicity. Hmax is equal to the E-state index of the most polar H-atom and reflects the ionizability of the molecule. It is well known that ionization of the molecules facilitates renal excretion by suppressing tubular reabsorption [28]. StN represents the sum of E-state indices of atoms of the type N≡ and encodes the presence of a CN-group in the molecule. Nitriles often play a key role as HB acceptors [122], and a high HB ability is considered as favorable for CLR [24]. Descriptor xvch8 (the 8th order connectivity valence change index) encodes the presence and substituents of 8-member ring system and affects negatively CLu. It presents in the molecules of cephalosporine antibiotics – fairly hydrophilic drugs with low CL in the range 1- 4 mL/min/kg. The presence of sulphonyl groups (encoded by descriptor

87

SddssS) and quaternary C-atoms (descriptor SssssC) also disfavor CLu. Sulfhydryl groups are highly polar and contribute to highly negative values of logD7.4, and the negative impact of quaternary C-atoms may be due to steric hindrances to the uptake in clearing organs.

Analysis of the outliers from the model can give additional insight for the structural determinants of CLu. Amidotrizoate (CLu,obs = 1.70 mL/min/kg, predicted 50 mL/min/kg) and its isomer iotalamic acid (CLu,obs = 2.45 mL/min/kg, predicted 38 mL/min/kg) are the only members in the database containing three I-atoms, attached to aromatic ring. Most probably, the presence of this highly polarizable I-atoms (not captured by the model) contributes negatively to CLu. Phenobarbital (CLu,obs = 0.137 mL/min/kg, predicted 12.5 mL/min/kg) is the drug with the lowest CLu in the dataset. This low CL value is very strange as the drug is sufficiently lipophilic (logD7.4 = 1.47) and the only negative feature according to the model is the presence of a quaternary C-atom. The drug is moderately bound to plasma proteins and is eliminated through numerous metabolic pathways and renal excretion [123]. The discrepancy between the observed and predicted CLu value of phenobarbital may be due to specific structural features in the molecule unforeseen by the model. The CLu values of valsartan (CLu,obs = 12.25 mL/min/k, predicted 254 mL/min/kg) and warfarin (CLu,obs = 3.67 mL/min/kg, predicted 141 mL/min/kg) are also over-estimated. Both drugs are highly bound with plasma proteins (fu < 0.05). Valsartan is eliminated primarily unchanged in bile with the aid of various organic anion transporters [124]. Warfarin is metabolized via numerous diverse metabolic pathways [125]. The complex elimination pathways with different structural requirements might be the reason for the large differences between the observed and predicted CLu values. Sulbactam (CLu,obs 8.233 mL/min/kg, predicted 0.47 mL/min/kg) is highly under-estimated by the model mainly due to its extremely low logD7.4 = - 5.11. Compared to the structurally similar penicillins, sulbactam is a much smaller molecule and does not meet the positive structural criteria set by the model. In addition, the elimination of sulbactam is dominated by tubular secretion [126], which may require additional structural features.

Separate QSPkR models were derived for drugs, eliminated primarily through renal excretion (n = 47), metabolism (n = 50) and biliary excretion (n = 16). Information about the major route of elimination was derived from literature. A drug was considered as being eliminated via given route if more than 60% of the dose was eliminated by this route. The QSPkR models and their statistics are given in Table 1:

88

Table 1. QSPkR models for logCLu of drugs with respect of the primary clearance path.

Primary clearance

path

QSPkR model Statistics r2 q2

LOO-

CV

MFEP Accuracy %

Renal excretion

( ) ( )( ) 02.1acnt_SaaO*16.077.0

SdsN*01.009.0Dlog*02.008.0CLlog 4.7u

+±+

+±−±=

0.62 0.51 1.69 70

Metabolism ( ) ( )( ) ( ) 26.04xc*02.060.1acnt_SddssS*21.081.0

SdsCH*05.016.0acnt_SaasC*03.024.0CLlog u

+±−±−

−±+±=

0.67 0.60 3.21 42

Biliary excretion

( ) ( )( ) 81.1acnt_SddssS*11.0.039.0

acnt_SdssC*03.019.0acnt_SaasC*03.020.0CLlog u

+±−

−±−±= 0.89 0.82 1.81 63

The drugs cleared primarily by renal excretion are highly hydrophilic compounds with logD7.4 (except milrinone with logD7.4 = 0.02), moderate molecular weight (M = 392±177 g/mol) and low-to-moderate PPB (fu = 0.54±0.3). They have relatively low values of plasma CL (0.0057 – 9.2 mL/min/kg). For most of them CL exceeds glomerular filtration rate (GFR) which implies considerable contribution of tubular secretion. Tubular reabsorption is negligible as most of the drugs are strong acids completely ionized at pH 7.4 and the negative logD7.4

value prevents tubular reabsorption [31]. The low PPB renders also relatively low CLu values (1.25 – 177 mL/min/kg), with only two drugs with CLu exceeding 100 mL/min/kg. According to the model, logD7.4 affects positively CLu. Indeed, it was suggested that drugs with logD7.4 between -2 and 0 have maximum CL values as they are sufficiently lipophilic to interact with transport proteins, but not to be subjected to tubular reabsorption [28]. CLu is further favored by the presence of aromatic O-atoms (descriptor SaaO_acnt) and disfavored by the presence of N-atoms of the type –N= (descriptor SdsN).

The drugs cleared primarily by metabolism are predominantly lipophilic (logD7.4 = 0.04±2.19), moderate size (M = 392±177 g/mol) and high PPB (fu = 0.11±0.19). They have relatively high CL values (0.03 – 26 mL/min/kg), which, together with the high PPB, results in high CLu values (0.13 – 2,042 mL/min/kg). CLu exceeds 100 mL/min/kg for 49% of the drugs. Descriptors SaasC_acnt (the number of substituted aromatic C-atoms in the molecule) and SdsCH (the sum of E-state indices of all =CH– atoms) correlated positively with CLu, most probably contributing to the lipophilicity of the molecule. The presence of sulphonyl groups (descriptor SddssS_acnt) and star-type structures (atoms, connected with four other atoms or groups, encoded by descriptor xc4) have negative impact on CLu.

89

The drugs cleared primarily by biliary secretion are characterized by highly variable lipophilicity (logD7.4 = -1.54±3.78, larger size (M = 519 ± 225 g/mol) and high PPB (fu = 0.11±0.18). They have relatively low values of plasma CL (0.15 – 9.7 mL/min/kg) and high CLu values (2.8 – 2,090 mL/min/kg). The main factor contributing positively to CLu of biliary excreted drugs was the presence of aromatic substituted C-atoms, encoded in SaasC_acnt – descriptor already emerged in the QSPkR model for the whole dataset and in the model for drugs primarily cleared by metabolism. Most probably, this descriptor accounts for hepatic uptake of the drug. Descriptors SdssC_acnt (number of C-atoms of the type =C<) and SaaNH (sun of the E-state indices of all aromatic NH-groups in the molecule) have negative contribution to CLu.

QSPkR models for plasma CL of basic drugs

A dataset of 263 basic drugs extracted from Obach’s database [119] was used in the study. A drug was identified as a base if the fraction ionized as base (fB) at physiological pH 7.4 exceeded 3% and was considerably higher than the fraction ionized as a base (fA) as previously described [118]. The drugs were classified according to their primary clearance pathway on the basis of literature search. Drugs were considered as renally cleared if at least 60% of iv dose were excreted unchanged in urine and no data for significant non-renal clearance were reported. The threshold for metabolically cleared drugs was 70%. 40 drugs were considered as eliminated primarily by renal excretion, 180 – by metabolism, 6 – by biliary excretion, and 20 followed mixed clearance mechanism with equal contribution of the other two or three classes. For 17 drugs no data regarding clearance pathways were found. The chemical structures were described by 154 molecular descriptors. GA and stepwise MLR were used for variable selection and model development.

Several QSPkR models for the CL of basic drugs were developed by consideration of the primary elimination pathway and plasma protein binding.

QSPkR model for the CL of drugs eliminated primarily by renal excretion.

The dataset consisted of 40 drugs with CL values in the range 0.55 – 13 mL/min/kg. The best QSPkR models built on a training set of 30 molecules is shown below:

( ) ( ) ( ) 626.0SssO*003.0026.0acnt_SsssN*063.0157.0SdsN*022.0082.0CLlog +±−±+±=

The model is highly significant and predictive with r2 = 0.890 and q2LOO-CV = 0.853. It is

able to predict the CL of the drugs in the external test set (n = 10) with r2pred = 0.733,

90

GMFEP = 1.57 and accuracy 80% (at 2-fold error level) and 100% (at 3-fold error level). According to the model, the presence of N-atoms of the types >N- and =N- (encoded by descriptors SdsN and SsssN_acnt) contribute positively to drug CL, while the presence of ether O-atoms (descriptor SssO) has a negative effect. Analysis of the dataset allowed defining a simple empiric rule for distinguishing between drugs with low, moderate and high CL (Table 2). It enabled classifying the drugs in the dataset with accuracy between 57 and 100%.

Table 2. Empiric rule for classification of drugs according to their renal CL.

Clearance, mL/min/kg Number of ether

O-atoms

Number of >N- or =N-

atoms

Predictive accuracy, %

Low < 1.7 > 2 0 100 Moderate 1.7 – 6.0 < 2 0 57

High > 6.0 < 2 > 1 86

QSPkR model for the CL of drugs eliminated primarily by metabolism

Statistically significant QSPkR models for the CL of drugs metabolically cleared drugs were derived only when PPB was taken into account. The negative logarithm of the free fraction of drug in plasma (pfu = -logfu) was used as a measure for PPB. It allows better understanding of the effect of PPB as high pfu values mean low fu, and high PPB, respectively. The dataset consisted of 152 molecules with available data for fu, with CL values in the range 0.14 – 290 mL/min/kg. The best QSPkR model, built on a training set of 122 molecules, is given below:

( ) ( ) ( )( ) ( ) ( )( ) ( ) 368.06xch*416.0801.0SssO*005.0026.0

SsCl*011.0051.0SssNH*020.0073.0Dlog*022.0074.0

H*055.0204.0H*217.0120.0pf*053.0273.0CLlog

4.7

maxminu

+±−±−

−±−±−±+

+±+±+−= m

Six drugs were identified as outliers and were removed before the development of the final model. The model is significant and predictive with r2 = 0.62 and q2

LOO-CV = 0.53. It isable to predict the CL of the drugs in the external test set (n = 30) with r2

pred = 0.36, GMFEP = 1.75 and accuracy 63% (at 2-fold error level) and 83% (at 3-fold error level).

91

According to the model, the CL of drugs cleared by metabolism is favored by logD7.4, Hmax and Hmin. The positive effect of the logD7.4 is in agreement with the generally accepted belief that lipophilic drugs are cleared primarily by metabolism [127]. Hmin and Hmax represent the minimum and maximum hydrogen E-state value in the molecule, respectively. Hmax encodes the most polar H-atom and has maximum values for molecules containing OH-groups. The presence of OH-groups is a prerequisite for Phase II metabolism. Glucuronic acid conjugation is the major metabolic pathway for many high CL drugs in the database: labetalol [128], morphine [129], naloxone [130], etc. Fenoterol [131] and R-apomorphine [132] are subjected to both glucuronidation and sulfation. All mentioned drugs contain at least two OH-groups, and have high CL values in the range 23 – 40 mL/min/kg. Hmin corresponds to the less polar H-atom in the molecule and has maximum values for molecules with aromatic rings. The presence of aromatic ring system is a premise for oxidative metabolism – one of the commonest Phase 1 metabolism reactions. The highest value of Hmin is observed for triamterene – a drug with extremely high CL = 63 mL/min/kg, which extensive oxidative metabolism to p-OH-triamterene [133].

According to the model, PPB (expressed as pfu) is restrictive factor for the CL of drugs subjected to metabolism. About 55% of the low CL drugs (with CL < 6 mL/min/kg, or 30% of the hepatic blood flow QH) have high PPB exceeding 90%. In contrast, 74% of the high CL drugs (CL > 13 mL/min/kg, or 70% of QH) have low-to-moderate PPB. Some structural features with negative effect on drug CL were identified: the presence of NH-groups ether O-atoms, Cl-atoms, and a six-member ring system (encoded in descriptors SHssNH, SssO, SsCl and xch6, respectively). Ether O-atoms and secondary amines are potential HB acceptors. HB ability contributes negatively to lipophilicity, and the negative contribution of HB acceptors to drug CL has been recently reported [109]. The presence of Cl-atoms increases metabolic stability by preventing aromatic hydroxylation and glucuronidation of phenols [134]. The crowding of a large number of 6-member rings seems to be unfavorable as molecules with more than 3 rings have low CL values (< 5.1 mL/min/kg). Most of them are extended or bulk molecules with high M which might be an obstacle for drug uptake in the clearing organs.

Six drugs are identified as outliers from the model. Oxycodone (CLobs = 6.1 mL/min/kg, predicted 19.77 mL/min/kg) appears not to be real outlier. CL values of 11.14 mL/min/kg [135] and 11.86 mL/min/kg [136] have been reported which are within the 2-fold error of our predictions. Esmolol (CLobs =290 mL/min/kg, predicted 9.07

92

mL/min/kg) and itraconazole (CLobs = 5.1 mL/min/kg, predicted 0.34 mL/min/kg) are highly under-estimated. Esmolol is the drug with the highest CL in the dataset. It undergoes rapid hydrolysis by red blood cell esterases [137]. Itraconazole is used as a mixture of four cis- and four trans-isomers which are subjected to stereoselective metabolism to different products [138]. These unique metabolic patterns may be the reason for the inconsistency with the model. Amlodipine (CLobs = 7.0 mL/min/kg, predicted 1.43 mL/min/kg) is under-estimated, as well. The extremely high PPB (fu = 0.005) may be a restrictive factor for drug uptake in liver. Quinine (CLobs = 1.9 mL/min/kg, predicted 17.09 mL/min/kg) and terodoline (CLobs = 1.1 mL/min/kg, predicted 6.2 mL/min/kg) are over-estimated. They are low CL drugs, and most probably their molecules contain any specific negative feature, not considered by the model.

QSPkR model for the CL of drugs, eliminated solely by CYP3A4 metabolism.

According to the available literature data, 27 drugs were considered as eliminated solely by CYP3A4 catalyzed metabolism. About 67% of them are low CL drugs (CL < 6 mL/min/kg), and 62% of them have high PPB (fu < 0.1PPB seems to be a restrictive factor for the clearance of CYP3A4 substrates. There are few drugs with rather high CL (> 13 mL/min/kg) despite of the extensive PPB, e. g. buprenorphine, ergotamine, saquinavir, propoxiphene. Most probably, their intrinsic CL is high enough to overcome the unfavorable effect of PPB.

Significant QSPkR model (r2 = 0.67) was built on the dataset of 27 drugs, metabolized solely by CYP3A4:

( ) ( )( ) ( ) 104.16xch*502.0077.2MaxQ*267.0008.1

acnt_SssO*029.0126.010xvch*85.962.44CLlog

p +±−±+

±−±=

According to the model, descriptors xvch10 and MaxQp correlate positively with logCL. Xvch10 represents the 10th order chain valence connectivity index and encodes information about the type and substituents in a 10-member ring system. It presents in 11 molecules as two fused hexagonal rings. The value of xvch10 is lower for aromatic rings, especially these containing O- and N-atoms, and higher for aliphatic rings. The presence of a large hydrophobic center seems to be favorable for the CL which is in agreement with the proposed pharmacophore for CYP3A4 substrates [65]. MaxQp signifies the maximum partial positive charge in the molecule. It is usually located at alkyl group, connected with a strong electronegative atom (N- or O-). These moieties

93

are susceptible to O- or N-dealkylation – one of the commonest CYP3A4 catalized reactions. All high CL drugs have high MaxQp values, and N-dealkylation was identified as the major metabolic pathway for buprenorphine (CL = 19 mL/min/kg) [139] and propopoxiphene (CL = 15 mL/min/kg) [140]. The negative effect of the presence of large number six-member rings (descriptors xch6) and ether O-atoms (descriptor SssO_acnt), which is potential HB acceptor, most probably due to steric hindrances for the hepatic uptake.

The model showed good performance for the training set: q2LOO-CV = 0.52,

GMFEP = 1.39, and 85% accuracy at 2-fold error level. The external predictivity was tested on 60 molecules, substrates of CYP3A4 and other enzymes. Eight drugs with FE > 4 were considered as outliers. After their removal, GMFEP = 1.92 and accuracy of 58% (at 2-fold error level) and 83% (at 3-fold error level) were observed. The plot of the predicted by the model vs. observed logCL is presented in Figure 2.

Figure 2. logCLpred vs. logCLobs for CYP3A4 substrates. The straight line corresponds to the ideal fit (logCLpred = logCLobs)

Most of the drugs out of 2-fold error limits are over-predicted which is not surprising as CYP3A4 metabolism represents only a part of the total CL. The four highly under predicted outliers (cocaine, gefitinib, telithromycin and trazodone) are cleared by multiple mechanisms with a minor contribution of CYP3A4 [141-144]. Most of the over-predicted drugs, including the four outliers (aripiprazole, azimilide, tamsulosin and trazodone), have very high PPB (fu < 0.06).

94

QSPkR model for the CLu of basic drugs

The dataset consisted of 220 basic drugs with available data for fu. QSPkR models were built on a training set of 176 molecules. The best one is given below:

( ) ( )( ) ( ) 131.1ncirc*004.0019.010xvch*86.1186.51

acnt_SaaCH*014.0059.0Dlog*017.0137.0CLlog 4.7u

+±−±+

+±+±=

The model shows satisfactory goodness of fit and predictivity as proved by the statistic metrics: r2 = 0.55 and q2

LOO-CV = 0.52 (for the training set) and r2pred = 0.54,

GMFEP 2.0, accuracy 58% and 79% at 2- and 3-fold level error, respectively (for the external test set of 44 molecules).

According to the model logD7.4, SaaCH_acnt and xvch10 correlate positively with logCLu. Lipophilicity is a prerequisite for many processes involved in the clearance process – uptake in the clearing organs, interaction with the transporter/enzyme binding sites, etc.). Majority of low-CLu drugs have logD7.4 < 0, and half of them are cleared by renal excretion. In contrast, most of the high-CLu drugs have logD7.4 > 2, and are subjected to extensive metabolism. SaaCH_acnt represents the number of non-substituted aromatic C-atoms which are potential sites of aromatic hydroxylation – one of the most common paths of oxidative metabolism. Most of the high-CLu drugs contain at least five aromatic CH-groups. Xvch10 (the 10th order connectivity valence index) encodes the presence, number and substituents in a 10-member ring system (in the current database represented by two fused hexagonal rings, or fused 7- and 5- member rings). The same descriptor was found to contribute positively to the CL of CYP3A4 substrates, which is in agreement with the requirement for a large hydrophobic center in the molecule [65]. Descriptor ncirc (encoding the number of circles in the molecule) has a negative contribution to CLu. Most of the molecules with large number of circles are huge molecules, which may have restricted uptake in the clearing organs.

Twelve drugs were identified as outliers from the model. Most of them have enormous high-CLu values due to extensive PPB (fu in the range 0.0012 – 0.02), and are highly under-predicted. In general, although CLu is frequently used as end-point variable for QSPkR modeling, it is calculated from two experimental variables which increases the risk of uncertainty of the modeled parameters.

95

References

1. Rowland M, Tozer TN. Elimination. In: Clinical Pharmacokinetics: concepts and application, 3rd Ed; Williams & Wilkins: Baltimore, Philadelphia, USA, 1995, pp. 156-184.

2. Toutain PL, Bousquet – Melou A. Plasma clearance, J Vet Pharmacol Ther 2004; 27:415-425.

3. Peters SA. Physiologically-Based Pharmacokinetic (PBPK) Modeling and Simulations: Principles, Methods, and Applications in the Pharmaceutical Industry; John Wiley & Sons, Inc: Hoboken, New Jersey, 2012, pp. 407-421.

4. Williams JA, Hyland R, Jones BC, Smith DA, Hurst S, Goosen TC, Peterkin V, Koup JR, Ball SE. Drug-drug interactions for UDP-glucuronyltransferase substrates: a pharmacokinetic explanation for typically observed low exposure(AUCi/AUC) ratios. Drug Metab Dispos 2004; 32(11):1201-1208.

5. Kumar, GN, Surapaneni, S. Role of drug metabolism in drug discovery and development, Med Res Rev 2001; 21(5):397-411.

6. Guengerich, FP. Human Cytochrome P-450 enzymes. In: Cytochrome P450: structure, mechanism, and biochemistry, 4th Ed, Ortiz de Montellano PR, Ed.; Springer Int Publ: Schwitzerland 2015, pp. 523-785.

7. Miners JO, Smith PA, Sorich MJ, McKinnon RA, Mackenzie PI. Predicting human drug glucuronidation parameters: application of in vitro and in silico modeling approaches, Annu Rev Pharmacol Toxicol 2004; 44:1-25.

8. Chohan KK, Paine SW, Waters NJ. Advancements in predictive in silico models for ADME, Cur Chem Biol 2008; 2:215-228.

9. Smith DA, Allerton C, Kalgutkar A, Van de Waterbeemd H, Walker DK. Clerance. In: Pharmacokinetics and metabolism in drug design, 3rd Ed; Wiley – VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2012, pp. 81-102.

10. Bergman E, Forsell P, Tevell A, Persson EM, Hedeland M, Bondesson U, Knutson L, Lennernas H. Biliary secretion of rosuvastatin and bile acids in humans during the absorption phase, Eur J Pharm Sci 2006; 29:205-214.

11. Nakai D, Nakagomi R, Furuta Y, Tokui T, Abe T, Ikeda T, Nishimura K. Human liver-specific organic anion transporter, LST-1, mediates uptake of pravastatin by human hepatocytes, J Pharmacol Exp Ther 2001; 297:861-867.

12. Watanabe T, Kusuhara H, Maeda K, Shitara Y, Sugiyama Y. Physiologically based pharmacokinetic modeling to predict transporter mediated clearance and distribution of pravastatin in man, J Pharmacol Exp Ther 2009; 328:652-662.

96

13. Warma MV, Lai Y, Feng B, Litchfield J, Goosen TC, Bergman A. Physiologically based modeling of pravastatin transporter-mediated hepatobiliary disposition and drug-drug interactions, Pharm Res 2012; 29:2860-2893.

14. Kalliokoski A, Backman JT, Kurkinen KJ, Neuvonen PJ, Niemi M. Effects of gemfibrozil and atorvastatin on the pharmacokinetic of repaglinide in relation to SLCO1B1 polymorphism, Clin Pharmacol Ther 2008; 84:488-496.

15. Warma MV, Lai Y, Kimoto E, Goosen TC, El-Kattan AF, Kumar V. Mechanistic modeling to predict the transporter- and enzyme-mediated drug-drug interactions of repagliniude, Pharm Res 2013; 30:1188-1199.

16. Maeda K, Ikeda Y, Fujita T, Yoshida K, Azuma Y, Haruyama U, Yamane N, Kumagai Y, Sugijama Y. Identification of the rate-determining process in the hepatic clearance of atorvastatin in a clinical cassette microdosing study, Clin Pharmacol Ther 2011; 90:575-581.

17. Ho RH, Kim RB. Transporters and drug therapy: Implications for drug disposition and disease, Clin Pharmacol Therap 2005; 78(3):260-277.

18. Shitara Y, Horie T, Sugiyama Y. Transporters as a determinant of drug clearance and tissue distribution, Eur J Pharm Sci 2006; 27:425-446.

19. Kusuhara H, Sugiyama Y. In vitro–in vivo extrapolation of transporter-mediated clearance in the liver and kidney, 2009; 24:37–52.

20. Shugarts S, Benet LZ. The role of transporters in the pharmacokinetics of orally administered drugs, Pharm Res 2009; 26:2039-2054.

21. Giacomini KM, Huang SM, Tweedie DJ, Benet LZ, Brouwer KL, Chu X, Dahlin A, Evers R, Fischer V, Hillgren KM, Hoffmaster KA, Ishikawa T, Keppler D, Kim RB, Lee CA, Niemi M, Polli JW, Sugiyama Y, Swaan PW, Ware JA, Wright SH, Yee SW, Zamek-Gliszczynski MJ, Zhang L. Membrane transporters in drug development, Nat Rev Drug Discov 2010; 9(3):215-236.

22. Di L, Feng B, Goosen TC, Lai Y, Steyn S, Varma MV, Obach RS. A perspective on the prediction of drug pharmacokinetics and disposition in drug research and development, Drug Metab Dispos 2013; 41:1975-1993.

23. Tucker GT. Measurement of the renal clearance of drugs, Br J Clin Pharmacol 1981; 12: 761-770.

24. Feng B, La Perle JL, Chang G, Varma MVS. Renal clearance in drug discovery and development: molecular descriptors, drug transporters and disease state, Exp Opin Drug Metab Tox 2010; 6(8):939-952.

25. Pelis RM, Wright SH. Renal transport of organic anions and cations, Compr Physiol 2011; 1:1795-1835.

97

26. Motohashi H, Inui K. Organic cation transporters OCTs (SLC22) and MATEs (SLC47) in the human kidney, AAPS J 2013; 15(2):581-588.

27. Hinderling PH, Schmidlin O, Seydel JK. Quantitative relationship between structure and pharmacokinetics of beta adrenoceptor blocking agents in man, J Pharmacokinet Biopharm 1984; 12:263-287.

28. Smith DA, Allerton C, Kalgutkar A, Van de Waterbeemd H, Walker DK. Renal clerance. In: Pharmacokinetics and metabolism in drug design, 3rd Ed; Wiley – VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2012, pp. 103-110.

29. Wu CY, Benet LZ. Predicting drug disposition via application of BCS: transport/ absorption/ elimination interplay and development of a biopharmaceutics drug disposition classification system, Pharm Res 2005; 22:11-23.

30. Smith DA, Allerton C, Kalgutkar A, Van de Waterbeemd H, Walker DK. Predicting human pharmacokinetics. In: Pharmacokinetics and metabolism in drug design, 3rd Ed; Wiley – VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2012, pp. 209-227.

31. Fagerholm U. Prediction of human pharmacokinetics – renal metabolic and excretion clearance, J Pharm Pharmacol 2007; 59:1463-1471.

32. Barza M, Weinstein L. Pharmacokinetics of the penicillins in man, Clin Pharmacokinet 1976; 1(4):297-308.

33. Gillette JR. Factors affecting drug metabolism, Ann NY Acad Sci 1971; 179:43-66.

34. Rowland M, Benet LZ, Graham GG. Clearance concepts in pharmacokinetics, J Pharmacokinet Biopharm 1973; 1:123-126.

35. Wilkinson GR, Shand DG. A physiological approach to hepatic drug clearance, Clin Pharmacol Ther 1975; 18:377-390.

36. Boxembaum H, DiLea C. First-time-in-human dose selection: allometric thoughts and perspectives, J Clin Pharmacol 1995; 35:957-966.

37. Mahmood I, Balian JD. Interspecies scaling: a comparative study for the prediction of clearance and volume using two or more than two species, Life Sci 1995; 59:579-585.

38. Mahmood I, Balian JD. Interspecies scaling: predicting clearance of drugs in humans. Three different approaches, Xenobiotica 1996; 26:887-895.

39. Obach RS, Baxter JG, Liston TE, Silber BM, Jones BC, McIntyre F, Rance DJ, Wastall P. The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data, J Pharmacol Exp Therap 1997; 283(1):46-58.

98

40. Fagerholm U. Prediction of human pharmacokinetics – evaluation of methods for prediction of hepatic metabolic clearance. J Pharm Pharmacol 2007; 59:803-828.

41. Lombardo F, Waters NJ, Argikar UA, Dennehy MK, Zhan J, Gunduz MMA, Harriman SP, Berellini G, Rajlic IL, Obach RS. Comprehensive assessment of human pharmacokinetic prediction based on in vivo animal pharmacokinetic data, Part 2: Clearance, J Clin Pharmacol 2013, 53(2): 178-191.

42. Espie P, Tytgat D, Sargentini-Maier ML, Poggesi I, Watelet JB. Physiologically based pharmacokinetics (PBPK), Drug Metab Rev 2009; 41(3):391-407.

43. Jones HM, Rowland-Yeo K. Basic Concepts in physiologically based pharmacokinetic modeling in drug discovery and development, CPT Pharmacometrics Syst Pharmacol 2013; 2, e63.

44. Sager E, Yu J, Ragueneau-Majlessi I, Isoherranen N. Physiologically based pharmacokinetic (PBPK) modeling and simulation approaches: a systematic review of published models, applications, and model verification, Drug Metab Dispos 2015; 43:1823–1837.

45. Zhuang X , Lu C. PBPK modeling and simulation in drug research and development, Acta Pharmac Sinica B 2016; 6(5):430-440

46. Niro R, Byers JP, Fournier RL, Bachmann K. Application of a convective-dispersion model to predict in vivo hepatic clearance from in vitro measurements utilizing cryopreserved human hepatocytes, Curr Drug Metab 2003; 4(5):357–369.

47. Chiba M, Ishii Y, Sugiyama Y. Prediction of hepatic clearance in human from in vitro data for successful drug development, AAPS J 2009; 11(2):262–276.

48. Nikolic K, Agababa D. Prediction of hepatic microsomal intrinsic clearance and human clearance values for drugs, J Mol Graph Model 2009; 28(3):245-52.

49. Stringer RA, Strain-Damerell C, Nicklin P, Houston JB. Evaluation of recombinant Cytochrome P450 enzymes as an in vitro system for metabolic clearance predictions, Drug Metab Dispos 2009; 37(5):1025–1034.

50. Chao P, Uss AS, Cheng K. Use of intrinsic clearance for prediction of human hepatic clearance, Exp Opin Drug Metab Toxicol 2010; 6(2):189–198.

51. Wang L, Chiang CW, Liang H, H, Feng W, Quinney SK, Li J, Li L. How to choose in vitro systems to predict in vivo drug clearance: a system pharmacology perspective, BioMed Res Int 2015, Article ID 857327.

52. Iwatsubo T, Hirota N, Ooie T, Suzuki H, Shimada N, Chiba K, Ishizaki T, Green CE, Tyson CA, Sugiyama Y. Prediction of in vivo drug metabolism in the human liver from in vitro metabolism data, Pharmacol Ther 1997; 73(2):147-71.

99

53. Liu L, Pang KS. The roles of transporters and enzymes in hepatic drug processing, Drug Metab Dispos 2005; 33:1-9.

54. Bi YA, Kazolias D, Duignan DB. Use of cryopreserved human hepatocytes in sandwich culture to measure hepatobiliary transport. Drug Metab Dispos 2006; 34(9):1658-1665.

55. Watanabe T, Kusuhara H, Watanabe T, Debori Y, Maeda K, Kondo T, Nakayama H, Horita S, Ogilvie BW, Parkinson A, Hu Z, Sugiyama Y. Prediction of the overall renal tubular secretion and hepatic clearance of anionic drugs and a renal drug-drug interaction involving organic anion transporter 3 in humans by in vitro uptake experiments, Drug Metab Dispos 2011; 39(6):1031-1038.

56. Barton HA, Lai Y, Goosen TC, Jones HM, El-Kattan AF, Gosset JR, Lin J, Varma MV. Model-based approaches to predict drug-drug interactions associated with hepatic uptake transporters: preclinical, clinical and beyond. Expert Opin Drug Metab Toxicol 2013; 9:459–472.

57. Schneider G, Coassolo P, Lavé T. Combining in vitro and in vivo pharmacokinetic data for prediction of hepatic drug clearance in humans by artificial neural networks and multivariate statistical techniques, J Med Chem

1999; 42 (25): 5072–5076. 58. Zuegge J, Schneider G, Coassolo P, Lavé T. Prediction of Hepatic Metabolic

Clearance, Clin Pharmacokinet 2001; 40(7):553-563. 59. Ekins S, Obach RS. Three-dimensional quantitative structure activity relationship

computational approaches for prediction of human in vitro intrinsic clearance, J Pharmacol Exp Ther 2000; 295(2):463-573.

60. Gao H, Yao L, Mathieu HW, Zhang Y, Maurer TS, Troutman MD, Scott DO, Ruggeri RB, Lin J. In silico modeling of nonspecific binding to human liver microsomes, Drug Metab Disp 2008; 36(10):2130-2135.

61. Ekins S. In silico approaches to predicting drug metabolism, toxicology and beyond, Biochem Soc Transact 2003; 31(3):611-614.

62. Lee PH, Cucurull-Sanchez L, Lu J, Du YJ. Development of in silico models for human liver microsomal stability, J Comp Aided Mol Des 2007; 21(12): 665–673.

63. Sakiyama Y, Yuki H, Moriya T, Hattori K, Suzuki M, Shimada K, Honma T. Predicting human liver microsomal stability with machine learning techniques, J Mol Graph Model 2007; 26(6):907-915.

64. Liu R, Schyman P, Wallqvist A. Critically assessing the predictive power of QSAR models for human liver microsomal stability, J. Chem Inf Model 2015; 55 (8):1566–1575.

100

65. Ekins S, Bravi G, Wikel JH, Wrighton SA. Three-dimentional quantitative structure – activity relationship analysis of Cytochrome P-450 3A4 substrates, J Pharmacol Exp Ther 1999; 29(1):424-433.

66. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring BJ, Wikel JH, Wrighton SA. Three- and four dimensional-quantitative structure activity relationship (3D/4D-QSAR) analyses of CYP2C9 inhibitors, Drug Metab Dispos. 2000; 28:994–1002.

67. Molnar L, Keseru GM. A neural network based virtual screening of cytochrome P450 3A4 inhibitors, Bioorg Med Chem Let 2002; 12(3):419-421.

68. Lewis DFV, Modi S, Dickins M. Structure – activity relationship for human cytochrome P450 substrates and inhibitors, Drug Metab Rev 2002; 34:69-82.

69. Lewis DFV. Quantitative structure-activity relationships (QSARs) within the Cytochrome P450 system: QSARs describing substrate binding, inhibition and induction of P450s, Inflammopharmacol 2003; 11(1):43-73.

70. Long A, Walker JD. QSARs for predicting metabolism and modeling Cytochrome P450 enzyme activities, Environ Toxicol Chem 2003; 22(8):1894-1899.

71. Ekins S, Berbaum J, Harrison RK. Generation and validation of rapid computational filters for CYP2D6 and CYP3A4, Drug Metab Dispos 2003; 31 (9):1077-1080.

72. Hansch C, Mekapati SB, Kurup A, Verma RP, QSAR of cytochrome P450, Drug Metab Rev 2004; 36:105-156.

73. Arimoto R, Prasad M-A, Gifford E M. Development of CYP3A4 inhibition models: comparisons of machine-learning techniques and molecular descriptors, J Biomol Screening 2005; 10(3):197-205.

74. Mao B, Gozalbes R, Barbosa F, Migeon J, Merrick S, Kamm K, Wong E, Costales C, Shi W, Wu C, Froloff N. QSAR Modeling of in vitro inhibition of Cytochrome P450 3A4, J Chem Inf Model 2006; 46 (5):2125-2134.

75. Lewis DF, Lake BG, Dickins M. Quantitative structure-activity relationships (QSARs) in inhibitors of various Cytochromes P450: The importance of compound lipophilicity, J. Enzyme Inhib Med Chem 2007; 22:1-6.

76. Gleeson M.P, Davis AM, Chohan KK, Paine SW, Boyer S, Gavaghan CL, Arnby CH, Kankkonen C, Albertson N. Generation of in-silico cytochrome P450 1A2, 2C9, 2C19, 2D6, and 3A4 inhibition QSAR models, J Comput - Aided Mol Des 2007; 21(10-11):559-573.

77. Zhou D, Liu R, Otmani SA, Grimm SW, Zauhar RJ, Zamora I. Rapid Classification of CYP3A4 Inhibition Potential Using Support Vector Machine Approach, Lett Drug Des Discov 2007; 4(3):192-200.

101

78. Roy K, Roy PP. Exploring QSAR and QAAR for inhibitors of cytochrome P450 2A6 and 2A5 enzymes using genetic function approximation GFA and genetic partial least square G/PLS techniques, Eur J Med Chem 2009; 44(5):1941-1951.

79. Roy K, Roy PP. QSAR of cytochrome inhibitors, Expert Opin Drug Metab Toxicol 2009; 5:1245– 1266.

80. Freitas RF, Bauab RL, Montanari CA. Novel application of 2D and 3D-similarity searches to identify substrates among Cytochrome P450 2C9, 2D6, and 3A4, J Chem Inf Model 2010; 50(1):97-109.

81. Sridhar J, Liu J, Foroozesh1 M, Stevens CLK. Insights on Cytochrome P450 enzymes and inhibitors obtained through QSAR studies, Molecules 2013;17(8): 9283–9305.

82. Terfloth L, Bienfait B, Gasteiger J. Ligand-based models for the isoform specificity of cytochrome P450 3A4, 2D6, and 2C9 substrates. J Chem Inf Model 2007; 47(4): 1688-1701.

83. Mishra NK, Agarwal S, Raghava GPS. Prediction of CytP450 isoform responsible for metabolizing a drug molecule, BMC Pharmacol 2010; 10:8.

84. Nembri S, Grisoni F, Consonni V, Todeschini R. In silico prediction of Cytochrome P450-drug interaction: QSARs for CYP3A4 and CYP2C9. Int J Mol Sci 2016;17:914.

85. Li H, Sun J, Fan X, Sui X, Zhang L, Wang Y, He Z. Considerations and recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction, J Comp Aided Mol Des 2008; 22:843-855.

86. Smith PA, Sorich MJ, McKinnon RA, Miners JO. Pharmacophore and quantitative structure-activity relationship modeling: complementary approaches for the rationalization and prediction of UDP-glucuronosyltransferase 1A4 substrate selectivity, J Med Chem 2003; 46(9):1617-1626.

87. Smith PA, Sorich MJ, Low LSC, McKinnon RA, Miners JO. Towards integrated ADME prediction: past, present and future directions for modeling metabolism by UDP-glucuronyltransferases. J Mol Graph Model 2004; 22:507-517.

88. Wang, JF, Chou KC. Molecular modeling of cytochrome P450 and drug metabolism, Curr Drug Metab 2010; 11(4):342-346.

89. Sun H, Scott DO. Structure-based drug metabolism predictions for drug design. Chem Biol Drug Des 2010: 75:3-17.

90. Roy PP, Roy K. Pharmacophore mapping, molecular docking and QSAR studies of structurally diverse compounds as CYP2B6 inhibitors, Mol Simul 2010; 36:(11):887-905.

102

91. Ridder L, Mulholland AJ, Rietjens IMCM, Vervoort J. A quantum mechanical/molecular mechanical study of the hydroxylation of phenol and halogenated derivatives by phenol hydroxylase, J Am Chem Soc 2000; 122(36):8728-8738.

92. Bathelt CM, Zurek J, Mulholland AJ, Harvey JN. Electronic structure of compound I in human isoforms of cytochrome P450 from QM/MM modeling, J Am Chem Soc 2005; 127(37):12900-12908.

93. Ranaghan KE, Mulholland AJ. Investigations of enzyme catalyzed reactions with combined quantum mechanics/molecular mechanics (QM/MM) methods, Int Rev Phys Chem 2010; 29 (1): 65-133.

94. Braga RC, Andrade CH. QSAR and QM/MM approaches applied to drug metabolism prediction, Mini Rev Med Chem 2012; 12:573-582.

95. Ekins S, de Groot MJ, Jones JP. Pharmacophore and three-dimensional quantitative structure activity relationship methods for modeling cytochrome p450 active sites, Drug Metab Dispos 2001; 29(7):936–944.

96. de Groot M, Ekins S. Pharmacophore modeling of cytochromes P450, Adv Drug Deliv Rev 2002; 54:367-383.

97. Madden JC, Cronin MT. Structure-based methods for the prediction of drug metabolism, Expert Opin Drug Metab Toxicol 2006; 2(4):545-557.

98. Shimada T, Oda Y, Gillam EM, Guengerich FP, Inoue K. Metabolic activation of polycyclic aromatic hydrocarbons and other procarcinogens by cytochromes P450 1A1 and P450 1B1 allelic variants and other human cytochromes P450 in salmonella typhimurium NM2009, Drug Metab Dispos 2001; 29:1176-1182.

99. Van Damme S, Bultinck P. Conceptual DFT properties-based 3D QSAR: Analysis of inhibitors of the nicotine metabolizing CYP2A6 enzyme, J Comput Chem 2009; 30:1749-1757.

100. Rahnasto MK, Raunio HA, Wittekindt C, Salminen KA, Leppanen J, Juvonen RO, Poso A, Lahtela-Kakkonen MK. Identification of novel CYP2A6 inhibitors by virtual screening, Bioorg Med Chem 2011; 19:7186-7193.

101. Gonzalez J, Marchand-Geneste N, Giraudel JL, Shimada T. Docking and QSAR comparative studies of polycyclic aromatic hydrocarbons and other procarcinogen interactions with cytochromes P450 1A1 and 1B1. SAR QSAR Environ Res 2012; 23:87-109.

102. Gharaghani S, Khayamian T, Keshavarz F. Docking, molecular dynamics simulation studies, and structure-based QSAR model on cytochrome P450 2A6 inhibitors. Struct Chem 2012; 23:341-350.

103

103. Karalis V, Tsantali-Kakoulidou A, Macheras P. Quantutative structure – pharmacokinetic relationships for disposition parameters of cephalosporins, Eur J Pharm Sci 2003; 20:115-23.

104. Turner JV, Maddalena DJ, Cutler DJ, Agatonovich-Kustrin S. Multiple pharmacokinetic parameter prediction for a series of cephalosporines, J Pharm Sci 2003; 92:552-559.

105. Turner JV, Maddalena DJ, Cutler DJ. Pharmacokinetic parameter prediction from drug structure using artificial neural networks, Int J Pharmac 2004; 270:209-219.

106. Ng C, Xiao Y, Putnam W, Lum B, Tropsha A. Quantitative structure – pharmacokinetic parameters relationships (QSPkR) analysis of antimicrobial agents in human using simulated annealing k-nearest neighbor and partial least square analysis methods, J Pharm Sci 2004; 93(10):2535-2544.

107. Yap CW, Li ZR, Chen YZ. Quantitative structure – pharmacokinetic relationships for drug clearance by using statistical learning methods, J Mol Graph Model 2006; 24:383-395.

108. Yu MJ. Predicting total clearance in humans from chemical structure. J Chem Inf Model 2010; 50(7):1284-1295.

109. Berellini G, Waters NJ, Lombardo F. In silico prediction of total human plasma clearance, J Chem Inf Model 2012; 52:2069-2078.

110. Gombar VK, Hall SD. Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution, J Chem Inf Model 2013; 53(4):948-957.

111. Paine WS, Barton P, Bird J, Denton R, Menochet K, Smith A, Tomkinson NP, Chohan KK. A rapid computational filter for predicting the rate of human renal clearance, J Mol Graph Model 2010; 29:529-537.

112. Lombardo F, Obach RS, Varma MV, Stringer R, Berellini G. Clearance mechanism assignment and total clearance prediction in human based upon in silico models, J Med Chem 2014; 57:4392-4405.

113. Manga NDJ, Duffy JC, Rowe PH, Cronin MTD. A hierarchical QSAR model for urinary excretion of drugs in humans as a predictive tool for biotransformation, QSAR Comb Sci 2003; 22:263-73.

114. Doddareddy MR, Cho YS, Koh HY, Kim DH, Pae AN. In silico renal clearance model using classical Volsurf approach, J Chem Inf Model 2006; 46:1312-1320.

115. Smith DA, Allerton C, Kalgutkar AS, Waterbeemd H, Walker DK. Distribution, In: Pharmacokinetics and metabolism in drug design, 3rd Ed; Wiley – VCH Verlag GmbH&Co. KgaA:Weinheim, Germany, 2012, pp. 61-79.

104

116. Kusama M, Toshimoto K, Maeda K, Hirai Y, Imai S, Chiba K, Akiyama Y, Sugiyama Y. In silico classification of major clearance pathways of drugs with their physicochemical parameters, Drug Metab Dispos 2010; 38(8):1362-1370.

117. Zhivkova Z, Doytchinova I. Quantitative structure – clearance relationships of acidic drugs, Mol Pharmac 2013; 10:3758-3768.

118. Zhivkova Z, Doytchinova I. Quantitative structure – pharmacokinetic relationships for plasma clearance of basic drugs with consideration of the major elimination pathway, J Pharm Pharm Sci 2017; 20:135-147.

119. Obach RS, Lombardo F, Waters NJ. Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 670 drug compounds, Drug Metab Dispos 2008; 36(7):1385-1405.

120. https://www.drugbank.ca/ 121. http://www.chemicalbook.com/ 122. Fleming FF, Yao L, Ravikumar PC, Funk L, Shook BC. Nitrile-Containing

Pharmaceuticals: Efficacious roles of the nitrile pharmacophore, J Med Chem 2019; 53(22):7902-7919.

123. Anderson GD. Phenobarbital and other barbiturates. Chemistry, biotransformation, and pharmacokinetics. In: Antiepileptic drugs; Levy R, Mattson RH, Meldrum BS, Perucca E, Eds; Lippincott Williams & Wilkins: Philadelphia, 2002; pp. 496-503.

124. Yamashita W, Maeda K, Hirouchi M, Adavhi Y, Hu Z, Sugiyama Y. Involvment of the transporters in the hepatic uptake and biliary excretion of valsartan, a selective antagonist of the angiotensin II AT1-receptor, in humans, Drug Metab Dispos 2006; 32:1247-1254.

125. Heines ST, Dowling T, Harvey Rd. Warfarin. In: Applied pharmacokinetics and pharmacodynamics, 4th Ed; Burton ME, Shaw LM, Schentag JJ, Evans WE, Eds; Lippincott Williams&Wilkins: Philadelphia, 2006; pp.715-751.

126. Foulds G, Stankevish JP, Marshall DC, O’Brian MM, Hayes SL, Weidler DJ, McMahon FG. Pharmacokinetics of sulbactam in humans, Antimicrob Agents Chemother 1983; 23:692-699.

127. Benet LZ, Broccateli F, Oprea TI. BDDCS applied to over 900 drugs, APPS J 2011; 12(4):511-547.

128. McNeil JJ, Louis WJ. Clinical pharmacokinetics of labetalol, Clin Pharmacokinet 1984; 9(2):157-167.

129. Glare PA, Walsh TD. Clinical pharmacokinetics of morphine, Ther Drug Monit 1991; 13(1):1-23.

105

130. Wahlstrom A, Persson K, Rane A. Metabolic interaction between morphine and naloxone in human liver. A common pathway of glucuronidation? Drug Metab Dispos 1989; 17(2): 218-220.

131. Hildebrandt R, Wagner B, Preiss-Nowzohour K, Gundert- Remy U. Fenoterol metabolism in man: sulphation versus glucuronidation, Xenobiotica 1994; 24(1):71-77.

132. Van der Geest R, Kruger P, Gubbens-Stibbe JM, Van Laar T, Bodde HE, Danhof M. Assay of R-apomorphine, S-apomorphine, apocodeine, isoapocodeine and their glucuronide and sulphate conjugates in plasma and urine of patients with Parkinson’s disease, J Chrom B: Biomed Sci Appl 1997; 702(1-2):131-141.

133. Gilfich HJ, Kremer G, Moerke W, Mutschler E, Voelger KD. Pharmacokinetics of triamterene after iv administration to man: determination of bioavailability, Eur J Clin Pharmacol 1983; 25(2):237-241.

134. Kerns EH, Di L. Drug-like properties: concepts, structure design and methods. 2nd Ed; Elsevier: Amsterdam, Boston, Heidelberg, etc, 2016, pp. 161-197.

135. Pouhia R, Seppala T, Olkkola KT, Kaslo E. The pharmacokinetics and metabolism of oxycodone after intramuscular and oral administration in healthy subjects, Br J Clin Pharmacol 1992; 33:617-621.

136. Takala A, Kaasalainen V, Seppala T, Kaslo E, Olkkola KT. Pharmacokinetic comparison of intravenous and intranasal administration of oxycodone, Acta Anaesthesiol Scand 1997; 41(20):309-312.

137. Reynolds RD, Goczynsky RJ, Quon CY. Pharmacology and pharmacokinetics of esmolol, J Clin Pharmacol 1986; 26, Suppl. A:A3-A14.

138. Peng C-C, Shi W, Lutz JD, Kunze KL, Liu JO, Nelson WL, Isoherranen N. Stereoselective metabolism of itraconazole by CYP3A4: dioxolane ring scission of azole antifungals, Drug Metab Dispos 2012; 40(3):426-435.

139. Elkader A, Sproule B. Buprenirphine: clinical pharmacokinetics in the treatment of opioid dependence, Clin Pharmacokinet, 2005; 44(7):661-680.

140. Due SL, Sullivan HR, McMahon RE. Propoxyphene: pathways of metabolism in man and laboratory animals, Biomed Mass Spectrom 1976; 3(5):217-225.

141. Shi J, Montay G, Bhargava VO. Clinical pharmacokinetics of telithromycin, the first ketolide antibacterial, Clin Pharmacokine 2005; 44(9):915-934.

142. Kolbrich EA, Barnes AJ, Gorelick DA, Boyd SJ, Cone EJ, Hiesfis MA. Major and minor metabolites of cocaine in human plasma following controlled subcutaneous cocaine administration, J Anal Toxicol, 2006; 30:501-510.

106

143. Golden RN, Dawkins K, Nichilas L. Trazodone and nefazodone. In: The American psychiatric publishing textbook of psychopharmacology, 4th Ed; Scatzberg AF, Nemeroff CB, Eds;; American Psychiatric Publ Inc: Aelingtun, VA, 2009, pp.403-414.

144. Alfieri R, Galetti M, Tramonti S, Andreoli P, Mozzoni P, Cavazzoni A, Bonelli M, Fumarola C, La Monica S, de Palma G, Mutti A, Mor M, Tiseo M, Mari E, Ardizzoni A, Petronini PG. Metabolism of the EGFR thyrosin kinase inhibitor gefitinib by cytochrome P450 1A1 enzyme in EGFR-wild type nonsmall cell lung cancer cell lines, Mol Cancer 2011; 10:143.

107

Chapter 4

QSPkR modeling of plasma protein binding (PPB)

Majority of drugs bind reversibly to various proteins in plasma, where they present in two forms – free and bound, existing in equilibrium. Plasma protein binding (PPB) has two important physiological functions – transport and depot. Plasma proteins act as delivery systems for lipophilic drugs otherwise insoluble in blood, thus facilitating their distribution and allowing them to reach their site of action. Drug-protein complex serves as a reservoir from which the drug is gradually released so that a proper free drug concentration is maintained for a long time period.

Major drug – binding proteins in plasma

Two plasma proteins are primarily responsible for PPB of most of the drugs: human serum albumin (HSA) and alpha-1-acid glycoprotein (AGP).

Human serum albumin

HSA presents in high concentration (~ 600 µM) and accounts for about 50-60% of all plasma proteins. It plays important role in the regulation of colloidal osmotic pressure and the transport of numerous endogenous compounds (fatty acids, hormones, bilirubin, bile acids, thyroxin, metals, etc. [1 – 3]. In addition, HSA binds a huge number of drugs and metabolites with variable affinity and capacity. It is generally accepted, that HSA is specialized in binding of lipophilic compounds with anionic or electronegative features, however many reports suggest considerable affinity for cationic structures [3 – 7].

HSA represents a single polypeptide chain consisting of 585 amino acid residuals, roughly equal number of hydrophobic and polar ones, most of them with acidic or basic function [8]. It has long been supposed that binding of drugs to HSA occurs at a small number of distinct binding sites with different selectivity [1, 9]. A large number of reports have been published aiming clearing of the nature of drug binding regions. Sudlow et al. identified two distinct binding sites on HSA for anionic drugs (Site 1 and Site2) and proposed some structural requirements for specific binding [10]. Large heterocyclic molecules with a negative charge localized at the center of a largely non-polar region were classified as Site 1 drugs. Site 2 showed preference toward aromatic carboxylic acids with an extended shape and a negative charge at one end of the molecule away from a hydrophobic center. The existence of Sudlow’s Site 1 and Site 2 was confirmed later by X-ray crystallographic studies [9, 11 – 13].

108

The high-resolution X-ray crystallographic analysis revealed that HSA is a heart-shaped protein composed of three homologous helical domains (I – III), each of which consisting of two sub-domains (A and B). Seven binding sites for fatty acids (FAs) and two specific drug binding sites were identified [9, 12, 14 – 16]. The FAs binding sites are distributed asymmetrically throughout the whole polypeptide chain, while the drug binding sites are located in the hydrophobic cavities in sub-domains IIA and IIIA (He) [12].

X-ray analysis of complexes of HSA with various ligands allowed identification of the architecture of drug binding sites, the mode of binding and amino-acid residuals involved. Crystallographic results are in a good agreement with the predicted location of the binding sites, based on competitive inhibition and spectroscopic studies [10, 17, 18]. Therefore, it was proved that the binding pocket in sub-domain IIA corresponds to Sudlow’s Site I, and the binding cavity in sub-domain IIIA – to Sudlow’s Site II [9].

The most detailed information regarding the two primary drug binding sites on HSA was gained from the X-ray analysis of complexes of HSA with 12 drugs and toxins – in the absence, and in the presence of FAs [19]. This study describes the geometry of drug Site 1 and Site 2, identifies the key determinants of binding specificity, reveals the capacity of both pockets for flexible accommodation, identifies numerous secondary binding sites, and explains the effect of FAs on the binding affinity for Site 1 drugs. Despite the topological similarity, Site 1 and Site 2 differ considerable in their shape, size and polarity which determine their binding selectivity. Drug Site 1 (in sub-domain IIA) is larger and consists of a central zone and three extended sub-chambers. The interior of the pocket is predominantly hydrophobic. Two clusters of polar residues are identified: one of them – towards the bottom of the pocket (Tyr150, His 242, Arg 257) and the other – at the pocket entrance (Lys195, Lys199, Arg218, Arg222). Warfarin, phenylbutazone, oxyphenbutazone and 3-carboxy-4-methyl-5-propyl-2-furanpropanoic acid are found to cluster in the center of the pocket. The complexation is governed by both hydrophobic contacts and specific interactions (hydrogen bonding or salt bridges) with polar residuals (Tyr150, His242, Lys199, Arg222 and Arg257). It is supposed that Site 1 has preference for molecules with two anionic or electronegative features on the opposite sides of the molecule that can simultaneously interact with the two polar patches [19]. Drug Site 2 (sub-domain IIIA) is smaller than Site 1 and consists of one sub-chamber corresponding to the right sub-chamber of Site 1. It was observed, that this site was accessed only following ligand-induced side chain movement [19]. The entrance of the pocket is open, well exposed to the solvent. There is a single polar

109

cluster, located close to one side of the entrance of the pocket including Arg410, Tyr411, Lys414 and Ser489. The presence of only one basic polar patch, located unilaterally in the hydrophobic pocket, determines the specificity of Site 2 for drugs with peripherally located electronegative group. All of the studied Site 2 drugs (diazepam, diflunisal, ibuprofen and indoxyl sulphate) are located in the center of the pocket and interact with Tyr411. It is assumed that Arg410 and Ser489 are also involved in salt bridges and hydrogen bonding.

Although there are many examples for binding of cationic drugs to HSA, the nature of the binding sites and the modes of binding are insufficiently explored. Crystallographic analysis of HSA complexed with lidocaine revealed that the drug binds to one superficially placed site distinct of those previously reported for FAs and anionic drugs. It is formed by residuals from sub-domain IB and faces the central, inter-domain crevice [20]. The aromatic part of lidocaine is involved in cation-π interactions with Arg114, and the carbonyl group interacts with Lys190. The open nature of the binding site and the absence of hydrophobic interactions are considered as the main reason for the low affinity of HSA to lidocaine (Hein) [20]. In contrast, cationic aliphatic detergents were found to bind with the aliphatic chain inserted into the hydrophobic cavity of HSA, and polar heads located at the surface. The binding is governed by hydrophobic forces and supported by polar interactions [21].

Alpha-1-acid glycoprotein

AGP is an acute phase protein in blood. It shows preference for basic and neutral endogenous and exogenous compounds, but also binds some acidic drugs such as warfarin, phenobarbital, retinoic acid, etc. [22 – 25].Concentration of AGP may vary considerably – in healthy subjects it ranges between 10and 30 µM, however in different physiological and pathological states it can increase up to 5-fold [26, 27]. Despite of the lower concentration and binding capacity, complexation with AGP may contribute considerably to the total PPB especially in diseases in which the concentration of AGP increases, or/and that of HSA decreases.

Human AGP represents a single polypeptide chain composed of 183 amino acid residuals and five N-linked glycans, accounting for about the half of the total 40-44 kDa mass of the protein. In most individuals AGP exists as a mixture of three genetic variants – A, F1 and S. The F1 and S variants differ in a single amino acid residual, and are collectively referred to as F1*S variant. In human plasma F1*S:A ratio is about 3:1, however it may vary among individuals and during inflammation or disease [27]. F1*S

110

and A variants differ by approximately 20 residuals [28], and bind drugs with different selectivity. Herve et al. proved, that the preferred substrates of variant A shared marked structural similarities, most of them containing a basic amino group linked by a short carbon chain to two aromatic rings that are either bridged to form a tricyclic structure (e. g. amitryptiline, nortryptiline, imipramine, desipramine, maprotiline, promethazine) or unbridged (e. g. disopyramide, methadone, diphenhydramine) [29]. The presence of Cl atom as a substituent in the aromatic ring, as well as the presence of piperidine ring resulted in decreased ligand selectivity to A variant. A CoMFA model identified the moieties that are favorable for binding to A variant: a basic N-atom, a ring binding in a hydrophobic pocket, and another ring interacting with a hydrophobic area. In contrast, no clear analogies were found for the selective ligands of the F1*S variants (e. g. dipyridamole, warfarin, binedaline, prazosin), except for a relatively large hydrophobic ring system.

There is an agreement that AGP has one wide and flexible drug-binding area where both acidic and basic drugs bind [30]. According to Matsumoto et al. the binding site of human AGP consist of at least three partially overlapping sub-sites: a basic ligand binding site, an acidic ligand binding site, and a steroid hormone binding site [31]. It has been supposed that the ligand binding site of F1*S variant represents a relatively large hydrophobic pocket able to accommodate a variety of chemical structures, whereas the A variant binding site is smaller and of greater ligand specificity (Herve 1998, Kuroda) [29, 32]. The architecture of AGP binding site and the modes of ligand binding were finally cleared by X-ray crystallographic analysis [33, 34]. The polypeptide chain of AGP comprises eight β-strands, connected with four loops, arranged into a β-barrel, flanked by α-helices. The overall surface of AGP is negatively charged mainly due to the exposure of a large number of Asp and Glu side chains. The ligand binding pocket is located in the center of the β-barrel and is mainly hydrophobic. The binding cavity of variant F1*S is wide and consists of three distinct lobes. The largest and deepest lobe I appears to serve as main hydrophobic drug-binding chamber [33]. There are smaller lobes on its sides (II and III), both negatively charged. The drug binding pocket of A variant is similar to that found in F1*S variant, however it is narrower comprising only lobes I and II [34]. In addition, the binding sites of the two variants differ in few amino acid residuals.

Identification of the crystallographic structure of AGP enabled further studies on the mode of binding of various ligands to AGP. Molecular docking of diazepam and progesterone to F1*S variant predicted that the planar phenyl ring of diazepam fits into

111

the hydrophobic lobe I. This ligand orientation allows the polar ring to fit into the negatively charged lobe II, resulting in two hydrogen bonds of the carbonyl O-atom to the side chains of Glu64 and Gln66, as well as in polar contacts of the two ring N-atoms with Arg90 and Tyr127. Progesterone fits nicely into lobe I, and both Tyr127 and Ser40 are considered as important for its binding.

Crystal structures of a mutant of the A variant complexed with disopyramide (DSP), amitriptyline (AMT) and chlorpromazine (CPZ) were published recently and it was suggested that the mutant has undistinguishable binding properties from those of the wild type [34]. DSP and AMP are known as ligands which bind at variant A with a high degree of selectivity. They bind in essentially the same manner to lobe I making contacts with the same amino acid residuals. The binding involves edge-face contacts between the two aromatic rings of the drugs and the aromatic side chains of Phe112 and Phe49 at the bottom of the binding pocket. In addition, Ser114 is involved in a water-mediated hydrogen bonding with the amide group of DSP. The difference in the amino acid residuals between variants A and F1*S appears to be crucial for the high selectivity of the A variant for DSP, AMT and other A variant specific drugs that contain two aromatic rings with similar configuration. In the F1*S variant Phe112 is substituted by Leu112, and Ser114 – by Phe 114. These differences may contribute to the reduced binding affinity of F1A*S variant for DSM and AMT [34]. In contrast, CBZ binds non-selectively to the central cavity, lobe I, with equal affinity to both F1*S and A variants [29]. The fused aromatic ring system of CPZ is involved in stacking interactions with Phe112 side chain in A variant. It is additionally extended in length by the Cl-atom, which dictates the same position in F1*S variant.

Titration calorimetric method (ITC) was used for thermodynamic characterization of the binding of basic drugs to AGP) [27]. It was found that the complexation of the studied basic compounds (trycyclic antidepressants, mianserine, chlorpromazine, disopyramide and cimetidine) is an exothermal process governed by the interplay of enthalpy and entropy driven interactions – hydrophobic, van der Waals, electrostatic and hydrogen bonding. Combination of ITC and molecular docking contributes to better understanding of the processes involved in ligand binding to AGP.

Lipoproteins

Lipoproteins are several kinds of nanoparticles composed by phospholipids, cholesterol, apoproteins and enzymes. They are essential for the lipid exchange as they facilitate the transport of cholesterol and other lipids throughout the body. Lipoproteins may

112

contribute to PPB especially for hydrophobic drugs [35]. Examples for basic drugs, bound with lipoproteins, are imipramine and nortryptiline [36], cyclosporine A, amiodarone, halofantrine, amphotheracin B, eritoran [25], etc. The binding seems to be non-specific, simple dissolution of hydrophobic drugs within the lipid layers of lipoproteins. Binding to lipoproteins cannot be expected to contribute considerably to overall PPB because of the low lipoprotein concentration in plasma [22].

Pharmacodynamic, pharmacokinetic and clinical consequences of PPB

PPB has significant pharmacodynamic (PD) and pharmacokinetic (PK) consequences. Only the free drug can be distributed in tissues and reach the site of action. Therefore, the pharmacological activity and toxic effects depend on the free drug concentration rather than on its total concentration [1, 3, 25]. Highly bound drugs may require higher doses to achieve the effective concentration in vivo. PPB may be restrictive factor for drug elimination as both hepatic uptake and glomerular filtration depend on the free drug concentration. For drugs with poor solubility and high therapeutic plasma concentration low to moderate PPB is desired in order to have successful therapeutic consequences. Drugs that are bound with higher affinity to plasma proteins than with receptor targeted are unsuitable as therapeutic agents [25].

PPB may affect considerably key PK parameters such as apparent volume of distribution Vd, clearance CL, bioavailability F and the half-life t1/2. Vd is a measure of drug distribution throughout the body. It is frequently claimed drugs highly bound in plasma have low Vd because they are confined to the vascular space, while drugs that are largely free are readily distributed in tissues. However, there are many examples for drugs with both high PPB and high Vd. In fact, Vd depends on the relative binding affinity of the drug in plasma and tissues, as seen from the equation:

t,u

p,utpd f

fVVV +=

where Vp and Vt are the volumes of the plasma and the tissues, and fu,p and fu,t are the unbound fractions of the drug in plasma and tissues, respectively [37]. For drugs which are poorly distributed in tissues Vd is close to Vp (~ 3L), i. e. it is independent of PPB. For extensively distributed drugs Vd is proportional to fu,p.

CL is a quantitative measure for the ability of the body to eliminate a drug. The total CL is sum of renal (CLR) and non-renal clearance (CLNR). The latter is usually

113

equated to the hepatic CL (CLH) as the clearances in other organs in most cases are negligible. According to the “well-stirred model” [38 – 40], hepatic clearance CLH is given by the equation:

intp,uH

intp,uHHHH CLfQ

CLfQEQCL

+==

where QH is liver blood flow, EH – hepatic extraction ratio, and CLint – intrinsic clearance of the free drug. PPB may have either permissive or restrictive effect on drug CL depending on the value of CLint. Drugs with high CLint have high EH > 0.7 and are eliminated with high CLH, close to QH, independent of PPB. In contrast, drugs with low CLint have low EH < 0.7 and low CLH, proportional to fu,p. The same applies to renal excretion. Drugs that are excreted solely by glomerular filtration have low CLR, proportional to fu,p, while drugs, substrates of active secretion transporters, are eliminated with high CLR, independent of PPB.

PPB may also affect drug bioavailability. The fraction escaped elimination by the liver (FH) is the upper limit of the po bioavailability [37] and is given by the equation:

intp,uH

HH CLfQ

QF

+=

For drugs with negligible first-pass metabolism FH is close to 1 and is independent of PPB. However, for highly extracted drugs FH is inversely proportional to fu,p.

The effect of PPB on t1/2 reflects the changes in Vd and CL as the three parameters are related according the equation:

CLV693.0

t d2/1 =

The extent of PPB depends on the concentration of the binding proteins and their affinity for the drug. Any change in the concentration or binding affinity may have considerable effect on PPB. Protein concentration varies with the age and at several physiological and disease states [41 – 45]. The binding affinity can be changed due to competition with endogenous ligands such as FAs, or with co-administered drugs [46, 47]. Changes of PPB may provoke changes in the PK parameters. The clinical significance of the change in PPB depends on the magnitude of the respective PK parameters and the route of administration. Majority of drugs are given by multiply dose

114

regimen, wherein a steady state is reached at which the rate of drug input is equal to the rate of elimination. The average steady state concentration ss

avC is given by the equation:

τ=

CL

FDCss

av

where F is the bioavailability, D – the dose, and τ – the dosing interval [37]. The steady state concentration of the free drug ss

u,avC is equal to:

τ=

CL

FDfC p,uss

u,av

For low-clearance drugs ssu,avC remains unaffected, because the change in fu,p results

in a proportional change in the CL. However, for high clearance drugs the effect of changes in PPB depends on the route of administration. Following iv administration (F = 1) the change in fu,p will alter ss

u,avC because the CL is constant, close to the blood flow.

For po administered drugs the change in fu,p results in a inversely proportional change in F, so that ss

u,avC remains almost unchanged.

The clinical significance of drug – drug interactions at PPB level has been widely discussed in literature. It is true, that the displacement of a drug by a co-binding agent increases the free drug concentration in plasma. For years displacement have been considered as the main reason for the observed side and toxic effects by drugs co-administration. Classical examples are the competition between warfarin and phenylbutazone leading to a marked increase in the anticoagulant action of warfarin [48], and the displacement of tolbutamide by sulphonamides, resulting in hypoglycaemia [49]. Consequently it was suggested, that other mechanisms are responsible for the increase of the free drug concentration, namely impaired metabolism or renal excretion [50, 51]. There is an agreement that displacement phenomena are frequently overestimated and are of low clinical relevance for the majority of the drugs [52 – 54]. Nevertheless, the possibility of displacement interactions should be considered, and it may be particularly important for highly bound drugs (>99%) with narrow therapeutic window, high hepatic extraction ratio (if administered iv), high renal clearance, or after po administration of drugs with a fast onset of action [25, 53, 54]. PD, PK and clinical relevance of PPB on PK are discussed in several reviews and books [25, 45, 55 – 59].

QSPkR models for prediction of PPB – and overview

Giving the importance of PPB for the overall PD and PK behavior, extensive research have been made targeted to prediction of drug binding to plasma proteins. Before

115

defining the crystallographic structures of the major binding proteins, the studies were based on in vitro experiments. Numerous analytical techniques have been developed (equilibrium dialysis, ultrafiltration, fluorescence and circular dichroism, high performance liquid affinity chromatography (HPLAC), etc. The current state of the methodology is discussed in several reviews [60 – 63].

Determination of the X-ray crystallographic structures of the main binding plasma proteins enabled application of structure based in silico approaches such as molecular dynamics, molecular docking, etc. for more-detailed studying of protein-drug interactions [27, 33, 64 – 69] At the same time, quantitative stricture – pharmacokinetics relationship (QSPkR) modeling emerged as an efficient tool for PK prediction. QSPkR is an empiric approach aiming development of relationships between a PK property of interest and a number of descriptors encoding different molecular features. It is especially helpful at the very early stages of drug development as it can identify drug candidates with acceptable PK characteristics even before their synthesis, thus reducing the expense of time and resources.

Numerous QSPkR models for PPB have been published in the last two decades. Such models are of both theoretical and practical importance. They give insights into the chemical nature on drug-protein interaction and assist in the design of new compounds with desired PPB. Recently, this strategy was successfully applied for prediction of diflunisal analogues with reduced affinity to human serum albumin [70].

Most of the studies are focused on binding of drugs to HSA considered as the most important for the overall PPB. There are a few reports for congeneric series: β-lactam antibiotics [71, 72], COX-2 inhibitors [73], etc. They revealed that HSA binding is favored by lipophilicity [71] and several topological indices such as electron accessibility of the aromatic ring, presence of Cl or F atoms, methylene groups, etc. while the presence of amine groups or carbonyl O-atom has a negative effect [72]. A number of QSPkR models were developed on one and the same dataset comprising data for 94 diverse drugs and drug like compounds [74]. The binding affinity was measured by the chromatographic capacity factor, determined by HPLAC with HSA immobilized stationary phase. The studies used different descriptor sets, as well as different statistical methods for model development: multiple linear regression (MLR)[74 – 78], artificial neural networks (ANN) [79, 80], support vector machine (CVM) [77, 78], etc. Hydrophobicity (measured by logP [74, 76, 79], and molecular surface area [77]) were identified as the most significant determinants of the binding affinity of HSA. In

116

addition, several structural features appear to be important: type, number and substituents in the rings [74, 75, 77], presence of aliphatic CH groups [75, 76], halogens (F and Cl) [75], hydrogen bonding ability [77], etc. A few QSPkR models for HSA binding were proposed based on topological sub-structural molecular descriptors [81, 82]. A group contribution model for ligand binding to domain III (Site 2) of HSA was developed using data for the experimentally obtained dissociation constants for 889 diverse molecules [81]. It was suggested, that the presence of cyclic or acyclic amines (positively charged at physiological pH) contributed negatively to Site 2 binding, while anionic groups (carboxylic or sulphonic acids) had a positive impact. With some exceptions, hydrophobicity of the fragments was also favorable for PPB. The predominant contribution of hydrophobic moieties, and the significance of polar groups for specific binding of drugs to Site 1 and 2 on HSA were further confirmed through integration of fragment – based QSPkRs and molecular modeling [82]. Several QSPkR models are based on pharmacophoric similarity principle. Similarity concept coupled with partial linear regression (PLS) was used to generate a model for prediction of the primary association constant of the drugs to HSA. The presence of cationic nitrogen was considered as criterion for low binding, while the presence of two hydrogen bond acceptors, separated by a hydrophobic fragment of 5 to 16 bond unit was common pharmacophore in the high binders [83]. A pharmacophore for specific binding of acidic drugs was also proposed, consisting of two of the three units: hydrophobic center, aromatic center and hydrogen bond donor, located within a distance of 4-5 Å [84]. A combination between cluster analysis and 4D fingerprint molecular similarity analysis was used for prediction of the primary association constant of 115 drugs. The key role of the spatial relation between a non-polar region and hydrogen bond donor was suggested showing that both hydrophobic interactions and hydrogen bonding are essential for drug – HSA complexation [85].

Although the free fraction of drug in plasma (fu,p) is considered as the most reliable parameter characterizing the overall PPB in vivo, only few studies are focused on PPB to all proteins in plasma, without explicit consideration of any particular protein [86 – 88]. Votano et al. applied four modeling techniques (MLR, ANN, k nearest neighbors and SVM) on a large dataset for 1008 drugs collected from literature to predict the % PPB [86]. The models showed good predictive performance with r2

pred in between 0.59 (SVM) and 0.70 (kNN). Lipophilicity was considered as the major determinant of PPB. Two descriptors encoding the presence of –COOH, and four descriptors related to aromaticity were found to affect positively PPB, while six

117

descriptors signifying bases had a negative effect implying that acidic function contributes positively to PPB, while basic function disfavors PPB. A highly predictive hologram QSPkR was derived on a dataset of 312 structurally diverse drugs with r2

pred 0.86, however no information about the structural features governing PPB was drawn [87]. Ghafourian et al. used stepwise MLR and a few non-linear regression tree techniques to generate QSPkR models on a dataset of for 794 compounds [88]. The models yielded r2

pred in the range 0.491 – 0.646. Descriptors related to lipophilicity, negatively charged Van der Waals surface, and presence of more than 9.5 aromatic rings affected positively PPB, while descriptors encoding hydrogen bond donors, ionization as a base, non-ionized fraction at pH 7.3 and positively charged Van der Waals surface disfavored PPB. In general, lower tendency for PPB of basic drugs compared to acidic was observed.

According to all QSPkR models, two factors appear to be essential for drug binding affinity: lipophilicity and ionization state of the molecule. In general, lipophilicity is favorable for PPB, although there are examples for drugs with equal lipophilicity and quite different extent of PPB, and vice versa. According to Kratochwil et al. the effect of lipophilicity on PPB is larger for acids than for bases [83]. Yamazaki proposed a non-linear relationship between the % PPB and logD7.4 for a dataset of 90 basic and neutral drugs, but not for acidic drugs and for the mixed dataset [84]. It is well known, that acidic drugs bind preferably to HSA, and basic drugs – to AGP. Therefore different structural requirements for binding to various plasma proteins could be expected, and separate QSPkR analysis is reasonable. Here we present briefly the results of QSPkR modeling of PPB for acidic and basic drugs, published recently [89, 90].

QSPkR modeling of PPB with consideration of the ionization type of the molecules

Separate QSPkR models were generated on datasets of 132 acidic and 220 basic drugs extracted from the database of Obach et al. [91].The datasets consisted of drugs with diverse chemical structures, therapeutic action and PK properties, and the values of the main molecular features and modeled PK parameter varied in similar limits. The fraction of unbound drug in plasma (fu) was used as a quantitative measure for plasma protein binding. It was transformed to pfu = -logfu in order to ensure close to normal distribution, and for better interpretability. Therefore, high pfu value means higher extent of PPB. Chemical structures were encoded by 178 and 176 descriptors, respectively, computed by ACD/LogD version 9.08 and MDL QSAR version 9.08 software. Genetic algorithm and stepwise multiple linear regression were used for variable selection and model

118

development. Predictive ability of the models was evaluated by internal cross-validation and external test sets.

QSPkR model for acids

Numerous significant QSPkR models have been developed. The consensus model built using the most frequently emerging descriptors is shown below:

( ) ( ) ( )( ) ( )( ) ( ) ( ) 419.0acnt_SsI*101.0498.0acnt_SssssC*073.0193.04xch*76.0104.3

acnt_5intSHB*017.0074.0acnt_4intSHB*013.0046.0

StsC*081.0276.0acnt_SaasC*019.0125.0Plog*020.0208.0pfu

+±−±−±−

−±+±+

+±+±+±=

The model shows high explained variance (r2 = 0.771) and good predictive ability (q2

LOO-CV = 0.737). It allows prediction of the fu values for 57% of the drugs in the dataset within the 2-fold error of experimental value, with MFEP = 2.45. Four drugs (verlukast, fu = 0.0004, bromfenac, fu = 0.0011, tesaglitazar, fu = 0.0011, and suramin, fu = 0.0011 are identified as outliers.

PPB is favored by lipophilicity (logP), number of aromatic non-substituted atoms (descriptor SaasC_acnt); presence of cyano groups (descriptor StsC), and high number hydrogen bond (HB) donor – HB acceptor pairs (descriptors SHBint4_acnt and SHBint5_acnt). Negative effect on PPB have the presence of a 4-member ring (descriptor xch4), quaternary C-atoms (SssssC_acnt), and I-atoms (SsI_acnt). Analysis of the dataset allowed defining criteria for distinguishing between high binders (> 95%) and the rest of the dataset:

- Positive criteria: log P ≥ 3; aromatic non-bridges rings ≥ 2; presence of CN group; HB donor – acceptor separated by 4 or 5 skeletal bonds ≥ 3;

- Negative criteria: presence of quaternary C-atom; presence of 4-member ring; presence of I- atom.

A simple empiric rule was proposed, according to which for high PPB (> 95%) the difference between the number of positive and the number of negative criteria should be at least 2. Applying this rule to the dataset, 57% of the high binders were correctly classified, and only 10% of the other drugs were erroneously classified as high binders.

The QSPkR model for acidic drugs is consistent with the structure of HSA drug binding sites. According to X-ray crystallographic studies the presence of planar hydrophobic moiety and two negative centers separated by few skeletal bonds is a prerequisite for binding of drugs to site 1, and Site 2 substrates differ in the presence of only one peripherally located electronegative group [19]. These structural features are

119

encoded in the QSPkR for acidic drugs by the descriptors SaasC_acnt, SHBint4_acnt, and SHBint5_acnt. Lipophilicity, expressed as logP, is a is important for both hydrophobic interactions at the predominantly hydrophobic binding sites on both HSA and AGP, and for non-specific dissolution in various plasma proteins, including lipoproteins. The positive contribution of –CN groups have been already reported by other researchers [81]. Nitriles often play a key role as HB acceptors (Fleming) [92], and may be involved in specific interactions with polar groups in the binding sites. Xch 4 presents in 43 molecules in the dataset, mainly β-lactam antibiotics, and reflects the low PPB of this therapeutic class. There are two drugs in the dataset, each containing three I-atoms, with fu 0.98 and 1. The negative effect of this descriptor may be due to the large volume of I-atoms and reduced ability to fit into the binding site. Also, the negative effect of quaternary C-atoms may be related to steric hindrances.

QSPkR model for bases

The best QSPkR model for PPB of basic drugs, built on 170 molecules, with explained variance r2 = 0.601, is shown below:

( ) ( ) ( )( ) ( ) ( ) 145.0acnt_SssssC*039.0117.0f*102.0428.0Vol*0004.00022.0

acnt_SaaCH*0013.0053.0acnt_SaaaC*028.0077.0Plog*021.0109.0pf

B

u

−±−±−±+

+±+±+±=

Six drugs are found to be outliers from the model: amsalog (fu = 0.0011), ziprasidone (fu = 0.0012), amlodipine (fu = 0.005), tamsulodin (fu = 0.01), caspofungin (fu = 0.035) and leuprolide (fu = 0.54). The good predictive ability of the model is assessed by internal validation (q2

LOO-CV 0.551; q2LGO-CV 0.519, GMFEP 2.02, Accuracy

58%) and external independent test set of 44 molecules (r2pred 0.532, GMFEP 1.94,

Accuracy 59%). Four drugs are considered as outliers from the external test set: oxybutinine (fu = 0.0004), amiodarone (fu = 0.0002), cetrorelix (fu = 0.014) and buprenorphine (fu = 0.04).

According to the model, lipophilicity (logP), number of aromatic C-atoms – either non-substituted (SaaCH_acnt), or in bridged rings (SaaaC_acnt) and molecular volume (Vol) affect positively PPB, while ionization at pH 7.4 (fB) and presence of quaternary C-atoms (SssssC_acnt) disfavor PPB. Descriptors have clear physical meaning and allow defining criteria for high PPB of basic drugs:

- Positive criteria: log P ≥ 3; atoms of the type aaCH ≥ 6; presence of atoms of the type aaaC; molar volume > 300;

120

- Negative feature: presence of ssssC- atoms.

An empiric rule is proposed for distinguishing between drugs with different extent of PPB based on the difference between the number of positive and the number of negative criteria. For very high binders (fu ≤ 0.01) the difference between the number of positive and the number of negative features should be ≥ 3; for high binders (0.01 < fu ≤ 0.1) it should be ≥ 2. A difference ≤ 1 is an indication for low PPB. Applying this empirical rule, 69% of the very high binders, 71% of the high binders and 91% of the low binders are correctly classified.

The QSPkR model is well interpretable with respect to the structure of the main plasma proteins and their binding sites. Lipophilicity is important for both hydrophobic interactions at the mainly hydrophobic binding sites on both HSA and AGP, and for non-speciffic dissolution in all plasma proteins. The requirement for six or more aromatic CH-groups implies the presence of at least two aromatic rings, which is a prerequisite for specific CH - π, π - π stacking and van der Waals interactions in the binding site of AGP. The large molecular volume may ensure a tight fit of the molecule in the binding cavity and closer contact to the suitable amino-acid residuals. The negative effect of basicity (encoded by fB) is well known. It could be related to the higher tendency of cationic drugs to interact with phospholipide membranes and to distribute extensively into tissues rather than to reside in plasma [93]. The negative effect of the quaternary C-atom has been already observed for PPB of acids, and is due probably to steric hindrances for fitting at the binding site.

Most of the outliers from both models are high or very high binders with fu 0.0002 – 0.01, i. e. 99 – 99.8% PPB. The free drug concentration depends crucially on the rate of dissociation of drug-protein complex. The release of tightly bound drugs from the complex could be very slow which would result in very low free drug concentration. This kinetic factor is not taken into account in QSPkR modeling which assumes rapid dissociation of the complex, and this is the most probable reason for the incorrect prediction of high binders. In addition, the low free concentration in plasma requires high sensitive analytical techniques and special conditions to preserve the equilibrium state, therefore the possibility for incorrect experimental values for fu cannot be dismissed. Few outliers could be considered as structural outliers: buprenorphine – due to the presence of 5 unfavorable quaternary C-atoms, and leuprolide, caspofungine and cetrorelix – due to their very high molecular weight of 1000 g/mol rendering a large PSA, improper for the binding at predominantly hydrophobic protein binding sites.

121

References:

1. Peters T. All about albumin: biochemistry, genetics and medical application. San Diego: Academic Press 1996

2. Fanali G, di Masi A, Trezza V, Marino M, Fasano M, Ascenzi P. Human serum albumin: from bench to bedside, Mol Aspects Med 2012; 33(3):209-290.

3. Kragh-Hansen U. Molecular aspects of ligand binding to serum albumin, Pharmacol Rev 1981;33(1):17-53.

4. Aki H, Yamamoto M. Thermodynamic of the binding of phenothiazines to human serum plasma, human serum albumin and alpha 1-acid glycoprotein: a calorimetric study, J Pharm Pharmacol 1989; 41(10):674-679.

5. Martinez-Gomez MA, Sagrado S, Vilanueva-Camanas RM, Medina-Hernandez MJ. Characterization of basic drug-human serum protein interactions by capillary electrophoresis, Electrophoresis 2006; 27(17):3410-3419.

6. Martinez-Gomez MA, Carril-Aviles MM, Sagrado S, Vilanueva-Camanas RM, Medina-Hernandez MJ. Characterization of antihistamine-human serum protein interactions by capillary electrophoresis, J Chromatogr A 2007; 1147(2):261-269.

7. Bai H-X, Liu X-H, Yang F, Yang X-R. Interactions of human serum albumin with phenotiazine drugs: insights from fluorescence spectroscopic studies, J Chin Chem Soc 2009; 56: 696-702.

8. Meloun B, Moravek L, Kostka V. Complete amino acid sequence of human serum albumin, FEBS Letters 1975; 58 (1):134-137.

9. Carter DC, Ho JX. Structure of human serum albumin, Adv Prot Chem 1994; 45:153-203.

10. Sudlow G, Birkett DJ, Wade DN. The characterization of two specific drug binding sites on human serum albumin, Mol Pharmacol 1975; 11: 824-832.

11. Carter CC, He X-M, Munson SH, Twigg PD, Gernert KM, Broom MB, Miller TY. Three-dimensional structure of human serum albumin, Science 1989; 244: 1195-1198.

12. He XM, Carter DC. Atomic structure and chemistry of human serum albumin, Nature 1992; 358:209-215.

13. Sugio S, Kashima A, Mochizuki S, Noda M, Kobayashi K. Crystal structure of human serum albumin at 2.5Å resolution, Prot Engin 1999; 12 (6); 439-446.

14. Curry S, Mandelkow H, Brick P, Franks N. Crystal structure of human serum albumin with fatty acid reveals an asymmetric distribution of binding sites, Nat Struct Biol 1998; 5(9): 827-835.

122

15. Curry S, Brick P, Franks NP. Fatty acid binding to human serum albumin: new insights from crystallographic studies, Biochem Biophys Acta 1999; 1441:131-140.

16. Bhattacharya A, Gruene T, Curry S. Crystallographic analysis reveals common modes of binding of medium and long chain fatty acids to human serum albumin, J Mol Biol 2000; 303(5):721-732.

17. Swaney JB, Klotz IM. Amino acid sequence adjoining the lone tryptophan of human serum albumin. A binding site of protein, Biochem 1970; 9:2570-2574.

18. Fehske KJ, Muller WE, Wollert U. The lone tryptophan residue of human serum albumin as part of the specific warfarin binding site, Mol Pharmacol 1979; 16:778-789.

19. Ghuman J, Zunszain A, Petipas I, Bhattacharya AA, Otagiri M, Curry S. Structural basis of drug-binding specificity of human serum albumin, J Mol Biol 2005; 353:38-52.

20. Hein KL, Kragh-Hansen U, Morth JP, Jeppesen MD, Otzen D, Moeller JV, Nissen P. Crystallographic analysis reveals a unique lidocaine binding site on human serum albumin, J Struct Biol 2010; 171:353-360.

21. Peyre V, Lair V, André V, le Maire G, Kragh-Hansen U, le Maire M, Moeller JV. Detergent binding as a sensor of hydrophobicity and polar interactions in the binding cavities of proteins, Langmuir 2005; 21(19):8865-8875.

22. Routledge PA. The plasma protein binding of basic drugs, Br J Clin Pharmacol 1986; 22:499-506.

23. Fournier T, Medjoubi NN, Porquet D. Alpha-1-acid glycoprotein, Biochem Biophys Acta 2000; 1482:157-171.

24. Izraili ZH, Dayton PG. Human alpha-1-glycoprotein and its interactions with drugs, Drug Metab Rev 2001; 33:161-235.

25. Ascenzi P, Fanali G, Fasano M, Pallottini V, Trezza V. Clinical relevance of drug binding to plasma proteins, J Mol Struct 2014; 1077:4-13.

26. Zsila F, Iwao Y. The drug binding site of human α1-acid glycoprotein: insight from induced circular dichroism and electronic absorption spectra, Biochem Viophys Acta 2007; 1770:797-809.

27. Huang JH, Cooper MA, Baker MA, Azad MAK, Nation RL, Li J, Velkov T. Drug-binding energetics of human α-1-acid glycoprotein assessed by isothermal titration colorimetry and molecular docking simulations, J Mol Recognit 2012; 25(12):642-656.

123

28. Schmid K. Human plasma alpha 1-acid glycoprotein - biochemical properties, the amino acid sequence and the structure of the carbohydrate moiety, variants and polymorphism. Prog Clin Biol Res 1989;300:7-22.

29. Herve F, Caron G, Duche J-C, Gaillard P, Rahman RA, Tsantili-Kakoulidou A, Carrupt P-A, DÁthis P, Tillement J-P, Testa B. Ligand specificity of the genetic variants of human a-1-acid glycoprotein: generation of a three-dimensional quantitative structure-activity relationship model for drug binding to the A variant, Mol Pharmacol 1998; 54:129-138.

30. Maruyama T, Otagiri M, Takedate A. Characterization of drug binding to sites on alpha-1-acid glycoprotein, Chem Pharm Bill 1990; 38:1688-1691.

31. Matsumoto K, Sukimoto K, Nishi K, Maruyama T, Suenaga A, Otagiri M. Characterization of ligand binding sites on the alpha-1-acid glycoprotein in humans, bovine and dogs, Drug Metab Pharmacokinet 2002; 17(4):300-306.

32. Kuroda Y, Matsumoto S, Shibukawa A, Nakagawa T. Capillary electrophoretic study on pH dependence of enantioselective disopyramide binding to genetic variants of human alpha1-acid glycoprotein, Analyst 2003;128(8):1023-1027.

33. Schoenfeld DL, Ravelli RBG, Mueller U, Skerra A. The 1.8A crystal structure of a-1-acid glycoprotein (orosomucoid) solved by UV RIP reveals the broad drug – binding activity of this human plasma lipocalin, J Mol Biol 2008; 384:393-405.

34. Nishi K, Ono T, Nakamura T, Fukunaga N, Izumi M, Watanabe H, Suenaga A, Maruyama T, Yamagata Y, Curry S, Otagiri M. Structural insights into differences in drug-binding selectivity between two forms of human alpha1-acid glycoprotein genetic variants, the A and F1*S forms. J Biol Chem 2011; 286(16):14427-14434.

35. Wasan KM, Brocks DR, Lee SD, Sachs-Barrable K, Thornton SJ. 2008. Impact of lipoproteins on the biological activity and disposition of hydrophobic drugs: implications for drug discovery, Nature Rev Drug Discov 7:84-99.

36. Pike E, Kierulf D, Skuterud B, Bredesen JE, Lunde PKM. Drug binding in sera deficient in lipoproteins, albumin or orosomucoid, Br J Clin Pharmacol 1983; 16:233-239.

37. Rowland M, Tozer TN. Clinical Pharmacokinetics: concepts and application, 3rd Ed; Williams & Wilkins: Baltimore, Philadelphia, USA, 1995.

38. Gillette JR. Factors affecting drug metabolism, Ann NY Acad Sci 1971; 179:43-66.

39. Rowland M, Benet LZ, Graham GG. Clearance concepts in pharmacokinetics, J Pharmacokinet Biopharm 1973; 1:123-126.

124

40. Wilkinson GR, Shand DG. A physiological approach to hepatic drug clearance, Clin Pharmacol Ther 1975; 18:377-390.

41. Jusko WJ, Gretch M. Plasma and tissue binding of drugs in pharmacokinetics, Drug Metab Rev 1976; 5(1):43-143.

42. Woo J. Age related changes in protein binding of drugs: implications for therapy, J Hong Kong Geriatr Soc 1999; 9(1):14-17.

43. Haller C. Hypoalbuminemia in renal failure: pathogenesis and therapeutic considerations, Kidney Blood Press Res 2005; 28:307-310.

44. Smith DA, Di L, Kerns EH. The effect of plasma protein binding on in vivo efficacy: misconceptions in drug discovery, Nat Rev Drug Discov 2010; 9:929-939.

45. Yamasaki K, Chuang VTG, Maruyama T, Otagiri M. Albumin – drug interaction and its clinical implication, Biochim Biophys Acta 2013; 1830:5435-5443.

46. Fasano M, Curry S, Terreno E, et al. The extraordinary ligand binding properties of human serum albumin, Life 2005; 57(12):787-796.

47. Ascenzi P, Bocedi A, Notari S, Fanali G, Fesce R, Fasano M. Allosteric modulation of drugb to human serum albumin, Mini Rev Med Chem 2006; 6:483-489.

48. Aggeler PM, O’Reilly RA, Leong L, Kowitz PE. Potentiation of anticoagulant effect of warfarin by phenylbutazone, New Eng J Med 1967; 276:496-501.

49. Christiansen LK, Hansen JM, Kristiansen M. Sulphaphenazole-induced hypoglycaemic attacks in tolbutamide treated diabetics, Lancet 1963; 282(7321):1298-1301.

50. Banfield C, O’Reilly RO, Chan E, Rowland M. Phenylbutazone – warfarin interaction in man: further stereochemical and metabolic considerations, Br J Clin Pharmac 1983; 16: 669-675.

51. Hansen JM, Christiansen LK. Drug interactions with oral sulphonylurea hypoglycaemic drugs, Drugs 1977; 13:24-34.

52. Birkett DJ. Pharmacokinetics made easy. 8. Drug protein binding, Austr Prescr 1992; 15(3): 56-57.

53. Rolan PE. Plasma protein binding displacement interactions – why are they still regarded as clinically important? Br J Clin Pharmac 1994; 37:125-128.

54. Benet LZ, Hoever B-A. Changes in plasma protein binding have little clinical relevance, Clin Pharmacol Therap 2002; 71(3):115-21.

55. Mehvar R. Role of protein binding in pharmacokinetics, Am J Pharm Educ 2005; 69(5):1-8.

125

56. Kerns EH, Di L. Drug – like properties. Concepts, structures, design and methods: from ADME to toxicity optimization. Elsevier: Amsterdam, Boston, Heidelberg, etc, 2008, pp.187-196.

57. Schmidt S, Gonzales D, Derendorf H. Significance of protein binding in pharmacokinetics and pharmacodynamics, J Pharm Sci 2010; 99(3):1107-1122.

58. Bohnert T, Gan L-S. Plasma protein binding: from discovery to development, J Pharm Sci 2013; 102(9): 2953-2994.

59. Roberts JA, Pea F, Lipman J. The clinical relevance of PPB changes, Clin Pharmacokinet 2013; 52(1):1-8.

60. Oravkova J, Bohs B, Lindner W. Drug-protein binding studies. New trends in analytical and experimental methodology, J Chromatogr B 1996; 677:1-28.

61. Vuignier K, Schappler J, Veuthney J-L, Carrupt P-A, Martel S. Drug – protein review of analytical tools, Anal Bioanal Chem 2010; 398(1):53-66.

62. Zhang F, Xue J, Shao J, Jia L. Compilation of 222 drug’s plasma protein binding data and guidance for study design, Drug Discov Today 2012; 17(9/10): 475-485.

63. Zhivkova Z. Studies on drug – human serum albumin binding: the current state of the matter, Current Pharmac Des 2015; 21:1817-1830.

64. Garg A, Manidhar DM, Gokara M, Malleda Ch, Reddy SC, Subramanyam R. Elucidation of the binding mechanism of coumarin derivatives with human serum albumin, PLoS ONE 2013; 8(5):e63805.

65. Zaidi N, Ajmal MR, Rabbani G, Ahmad E, KhanRH. A comprehensive insight into binding of hippuric acid to human serum albumin: a study to uncover its impaired elimination through hemodialysis, PLoS ONE 2013; 8(8):e71422.

66. Rehman MT, Shamsi H, Khan AU. Insight into the binding mechanism of imipenem to human serum albumin by spectroscopic and computational approach, Mol Pharmac 2014; 11(6):1785-1797.

67. Dong C, Lu N, Liu Y. Binding of methacycline to human serum albumin at subdomain IIA using multispectroscopic and molecular modeling methods, Luminescence 2013; 28(6): 933-941.

68. Yeggoni DP, Gokara M, Manidhar DM, Rachamallu A, Nakka S, Reedy CS, Subramanyam R. Binding and molecular dynamics studies of 7 – hydroxycoumarin derivatives with human serum albumin and its pharmacological importance, Mol Pharmac 2014; 11(4):1117-1131.

69. Zhang P, Li Z, Wang X, Shen Z, Wang Y, Yan J, Zhou Z, Zhu W. Study of the enantioselective interaction of diclofop and human serum albumin by

126

spectroscopic and molecular modeling approaches in vitro, Chirality 2013; 25(11):719-725.

70. Mao H, Hajduk PJ, Craig R, Bell R, Bore T, Fesik SW. Rational design of diflunisal analogues with reduced affinity to human serum albumin, J Am Chem Soc 2001; 123:10429-10435.

71. Ramamurthi N, Sitarama BG. In silico ADME modeling: QSPR models for the binding of β-lactams to human serum proteins using genetic algorithm, ARKIVOC 2005; XI:102-123.

72. Hall LM, Hall LH, Kier LB. QSAR modeling of beta-lactam binding to human serum proteins, Comput Aided Mol Des 2003; 17(2-4):103-118.

73. Seedler N, Bhatia S, Singh B. Quantitative correlation between theoretical molecular descriptors and drug – HSA binding affinities for various COX – 2 inhibitors, Chem Biol Drug Des 2008; 72:297-302.

74. Colmenarejo G, Alvarez-Pedraglio A, Lavandera J-L. Cheminformatic models to predict binding affinities to human serum albumin, J Med Chem 2001; 44(25):4370-4378.

75. Hall LM, Hall LH, Kier LB. Modeling drug albumin binding affinity with E-state topological structure representation., J Chem Inf Comput Sci 2003; 43(6):2120-2128.

76. Sitarama BG, Ramamurthi N, Akash K. In silico ADME modeling 2: Computational models to predict serum albumin binding affinity using ant colony system, Bioorg Med Chem 2006; 14:4118-4129.

77. Xue CX, Zhang RS, Liu HX, et al. QSAR models for the prediction of binding affinities to human serum albumin using the heuristic method and a support vector machine, J Chem Inf Comput Sci 2004; 44(5):1693-1700.

78. Chen L, Chen X. Results of molecular docking as descriptors to predict human serum binding affinity, J Mol Graph Model 2012; 33:35-43.

79. Deeb O, Hemmateenehad B. ANN – QSAR model of drug binding to human serum albumin, Chem Biol Drug Res 2007; 70:19-29.

80. Deeb O. Correlation ranking and stepwise regression procedures in principal components artificial neural networks modeling with application to predict toxic activity and human serum binding affinity, Chem Intel Lab Sys 2010; 104:181-194.

81. Hajduk PJ, Mendoza R, Petros A, et al. Ligand binding to domain 3 of human serum albumin: a chemometric analysis, J Comput Aided Mol Des 2003; 17:93-102.

127

82. Estrada E, Uriarte E, Molina E, Simon-Manco Y, Milne GWA. An integrated in silico analysis of drug binding to HSA, J Chem Inf Model 2006; 46:2709-2724.

83. Kratochwil NA, Huber W, Muller F, Kansy M, Gerber PR. Predicting plasma protein binding of drugs: a new approach, Biochem Pharmacol 2002; 64:1355-1374.

84. Yamazaki K, Kanaoka M. Computational prediction of the plasma protein binding percent of diverse pharmaceutical compounds, J Pharm Sci 2004; 93(6):1480-1494.

85. Liu J, Yang L, Pan D, Hopfinger A. Constructing plasma protein binding model based on a combination of cluster analysis and 4D-fingerprint molecular similarity analyses, Bioorg Med Chem 2006; 14:611-621.

86. Votano JR, Parham M, Hall LM, Hall LH, Kier LB, Oloff S, Tropsha A. QSAR modeling of human serum protein binding with several modeling techniques utilizing structure – information representation, J Med Chem 2006; 49(24):7169-7181.

87. Moda TL, Montanari CA, Andricopulo AD. In silico prediction of human plasma protein binding using hologram QSAR, Lett Drug Des Disc 2007; 4:502-509.

88. Ghafourian T, Amin Z. QSAR models for the prediction of plasma protein binding, BioImpacts 2013; 3(1):21-27.

89. Zhivkova Z, Doytchinova I. Quantitative structure – plasma protein binding relationships of acidic drugs. J Pharm Chem 2012; 101(12):4627-4641.

90. Zhivkova Z. Quantitative structure – pharmacokinetics relationships for plasma protein binding of basic drugs, J Pharm Pharm Sci 2017; 20:349-359.

91. Obach RS, Lombardo F, Waters NJ. Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 670 drug compounds. Drug Metab Dispos 2008; 36:1385-1405.

92. Fleming FF, Yao L, Ravikumar PC, Funk L, Shook BC. Nitrile-containing pharmaceuticals: efficacious roles of the nitrile pharmacophore, J Med Chem 2019; 53(22):7902-7919.

93. Smith DA, Allerton C, Calgutkar AS, Waterbeemd H, Walker DK. Distribution, In: Pharmacokinetics and metabolism in drug design, 3rd ed; Wiley – VCH Verlag GMBH&Co. KGaA: Weinheim, 2012, pp. 61-79.

table of content · the progress of combinatorial chemistry and high throughput technologies has...

Documents