In-Silico Structure and Analogue Based Studies on BACE1 Inhibitors for Alzheimer’s Disease
INDEX
1. Abstract
2. Aim of the Study
3. Introduction
3.1. Drug Designing
3.2. Protein
4. Materials and Methods
4.1. Structure Based Drug Design
4.2. De novo Ligand Design
4.3. Structure Based Pharmacophore Generation
4.4. Analogue Based Drug Design
5. Results and Discussion
5.1. Structure Based Drug Design
6. Analogue Based Drug Design
7. Conclusion
8. Abbreviations
9. References
1. ABSTRACT:
β-Secretase, also called BACE1 (β-site APP-cleaving enzyme) or memapsin-2, is an
aspartic-acid protease important in the pathogenesis of Alzheimer's disease and in the
formation of myelin sheaths in peripheral nerve cells. This transmembrane protein contains
two active-site aspartate residues in its extracellular domain and may function as a dimer.
BACE1 produces the amyloid β peptide (the primary constituent of the amyloid plaques
implicated in Alzheimer's disease) by cleavage of the amyloid precursor protein.
Potent BACE1 inhibitors have been suggested as potentially useful drugs. In this work,
QSAR, pharmacophore and docking studies on BACE1 inhibitors proved useful for finding
new, potent active compounds against Alzheimer's disease (AD), a neurodegenerative
disorder. With the LigandFit protocol, the high-activity compound had a dock score of
82.634 and formed hydrogen bond interactions with ASP 290 and GLY 96, while the
low-activity compound had a dock score of 67.26. With the CDOCKER protocol, the
molecules formed hydrogen bond interactions with THR 293, and the high- and
low-activity compounds had CDOCKER energies of 36.36 and -14.58, respectively. With the
LibDock protocol, the high-activity compound had a LibDock score of 86.88 and formed a
hydrogen bond interaction with THR 294, whereas the low-activity compound had a LibDock
score of 110.87 and formed a hydrogen bond interaction with ASP 290, the same interaction
made by the crystal ligand when compared using LigPlot.
The novel ligand found through LUDI formed hydrogen bond interactions with the
active site residues GLY 96 and SER 291.
Analogue based studies using pharmacophore generation on BACE1 inhibitors identified
the important features from the HipHop run as hydrogen bond acceptor, hydrogen bond
donor and hydrophobic aromatic. HypoGen, using these features, gave a cost difference of
53.41 and an RMS value of 1.07 for the training set. For the test set, plotting the
estimated activity against the experimental activity gave an r² value of 0.66. The QSAR
model generated with the training set had an r² value of 0.972, and the test set gave an
r² value of 0.923.
2. AIM OF THE STUDY
In the field of structure based drug design, there are several major goals that
biologists seek to achieve:
To obtain the correct structure of the protein and, if no X-ray crystal structure of the
protein is available, to derive the protein structure through homology modeling.
Given the structures of an inhibitor and its target, to correctly predict the binding site
on the target, the orientation of the ligand and the conformations of both.
Given the structure of a target macromolecule and a set of ligands, to rank the
compounds in the order of their experimentally characterized activity.
The immediate practical applications of the above are, first, to improve the binding
capacity of existing inhibitors and, second, to suggest lead compounds that focus the
experimental screening effort, either by searching chemical databases or by de novo drug
design.
The main objectives of the present study are:
I. To dock the ligand molecules (BACE1 inhibitors) correctly into the active
site of the receptor.
II. To carry out QSAR studies relating the structures of the ligands to their activity
against the receptor.
III. To identify molecules from databases and to suggest new molecules by structure
based drug designing or by analogue based drug designing.
3. INTRODUCTION
3.1 DRUG DESIGNING
Drug design, also sometimes referred to as Rational Drug Design, is the inventive
process of finding new medications based on knowledge of the biological target. The
drug is most commonly an organic small molecule which activates or inhibits the function
of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the
patient. In the most basic sense, drug design involves the design of small molecules that
are complementary in shape and charge to the biomolecular target with which they interact
and will therefore bind to it. Drug design frequently, but not necessarily, relies on
computer modeling techniques. This type of modeling is often referred to as Computer
Aided Drug Design (CADD).
The phrase “Drug Design” is to some extent a misnomer. What is really meant by
drug design is ligand design. Modeling techniques for prediction of binding affinity are
reasonably successful. However, there are many other properties such as bioavailability,
metabolic half-life, lack of side effects, etc. that must first be optimized before a ligand
can become a safe and efficacious drug. These other characteristics are often difficult to
optimize using Rational Drug Design techniques.
3.1.1 Background
Typically a drug target is a key molecule involved in a particular metabolic or
signaling pathway that is specific to a disease condition or pathology, or to the infectivity
or survival of a microbial pathogen. Some approaches attempt to inhibit the functioning
of the pathway in the diseased state by causing a key molecule to stop functioning. Drugs
may be designed that bind to the active region and inhibit this key molecule. Another
approach may be to enhance the normal pathway by promoting specific molecules in the
normal pathways that may have been affected in the diseased state. In addition, these
drugs should also be designed in such a way as not to affect any other important "off-
target" molecules that may be similar in appearance to the target molecule since drug
interactions with off-target molecules may lead to undesirable side effects. Sequence
homology is often used to identify such risks.
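Such comparisons usually start from a pairwise alignment (for example Needleman-Wunsch, Smith-Waterman or a BLAST search). As a minimal sketch, assuming the target and a possible off-target protein have already been aligned with gaps written as '-', percent identity over the aligned positions can be computed as below. The function name is hypothetical, the alignment step itself is not shown, and other denominators (full alignment length, length of the shorter sequence) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two sequences that are already aligned.

    Columns where either sequence has a gap ('-') are skipped, so the value
    is identity over gap-free aligned positions only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


# Toy, hypothetical aligned fragments: 3 of the 4 gap-free columns are identical
print(percent_identity("AC-DE", "ACWDQ"))  # 75.0
```

A high identity to the intended target, especially around the binding site, is a hint that a ligand may also bind the off-target protein; it is a screening signal, not a prediction of binding.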
Most commonly, drugs are organic small molecules, but protein based drugs (also
known as biologics) are becoming increasingly common. In addition, mRNA based
gene silencing technologies may have therapeutic applications.
3.1.2 Types
There are two major types of drug design. They are referred to as
Ligand-based drug design,
Structure-based drug design.
Ligand Based Drug Design
Ligand Based Drug Design (or Indirect Drug Design) relies on knowledge of
other molecules that bind to the biological target of interest. These other molecules may
be used to derive a pharmacophore, which defines the minimum necessary structural
characteristics a molecule must possess in order to bind to the target. In other words, a
model of the biological target may be built based on the knowledge of what binds to it
and this model in turn may be used to design new molecular entities that interact with the
target.
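To illustrate what "minimum necessary structural characteristics" means in practice, the sketch below checks whether a ligand's features can satisfy a pharmacophore. It is a simplified, hypothetical matcher: a model and a ligand are both lists of (feature type, 3D position) pairs, and a match requires the same feature types with all pairwise distances agreeing within a tolerance. Because only distances are compared, the check does not depend on where the ligand sits in space, but it also cannot tell mirror images apart; programs such as HipHop and HypoGen instead align ligand conformers onto the model in 3D. The feature labels and coordinates are hypothetical.

```python
import math
from itertools import combinations, permutations


def _distances_agree(model, chosen, tolerance):
    """True if every inter-feature distance in `chosen` matches the model within tolerance."""
    return all(
        abs(math.dist(model[i][1], model[j][1]) - math.dist(chosen[i][1], chosen[j][1])) <= tolerance
        for i, j in combinations(range(len(model)), 2)
    )


def matches_pharmacophore(model, ligand_features, tolerance=1.0):
    """Brute force: can some ordered subset of ligand features play every model feature?

    model, ligand_features: lists of (feature_type, (x, y, z)) tuples, in angstroms.
    """
    for chosen in permutations(ligand_features, len(model)):
        same_types = all(m[0] == c[0] for m, c in zip(model, chosen))
        if same_types and _distances_agree(model, chosen, tolerance):
            return True
    return False


# Hypothetical three-point model using the feature kinds named in the abstract
MODEL = [("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HYD_AROM", (0.0, 4.0, 0.0))]

# The same geometry shifted in space, plus one extra donor the model does not use
ligand = [("HYD_AROM", (10.0, 14.0, 10.0)), ("HBA", (10.0, 10.0, 10.0)),
          ("HBD", (13.0, 10.0, 10.0)), ("HBD", (20.0, 20.0, 20.0))]
print(matches_pharmacophore(MODEL, ligand))  # True
```

The permutation search grows quickly with the number of features, which is acceptable for a handful of features but is one reason real tools use smarter alignment.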
Structure Based Drug Design
Structure Based Drug Design (or Direct Drug Design) relies on knowledge of the
three dimensional structure of the biological target, obtained through methods such as
X-ray crystallography or NMR spectroscopy. If an experimental structure of a target is not
available, it may be possible to create a homology model of the target based on the
experimental structure of a related protein. Using the structure of the biological target,
candidate drugs that are predicted to bind with high affinity and selectivity to the target
may be designed using interactive graphics and the intuition of a medicinal chemist.
Fig 1: Flow charts of two strategies of Structure Based Drug Design
Alternatively, various automated computational procedures may be used to
suggest new drug candidates. As experimental methods such as X-ray crystallography and
NMR have developed, the amount of information concerning the 3D structures of
biomolecular targets has increased dramatically, as has the structural, dynamic and
electronic information about ligands. This has encouraged the rapid development of
Structure Based Drug Design. Current methods for structure-based drug design can be
divided roughly into two categories.
The first category is about “finding” ligands for a given receptor, which is usually
referred to as database searching. In this case, a large number of potential ligand
molecules are screened to find those fitting the binding pocket of the receptor. This method
is usually referred to as ligand-based drug design. The key advantage of database searching
is that it saves the synthetic effort needed to obtain new lead compounds.
Another category of structure-based drug design methods is about “building”
ligands, which is usually referred to as receptor-based drug design. In this case, ligand
molecules are built up within the constraints of the binding pocket by assembling small
pieces in a stepwise manner. These pieces can be either atoms or fragments. The key
advantage of such a method is that novel structures, not contained in any database, can be
suggested. These techniques are generating much excitement in the drug design community.
Active site identification
Active site identification is the first step in this approach. It analyzes the protein to
find the binding pocket, derives key interaction sites within the binding pocket, and then
prepares the necessary data for the ligand fragment linking step. The basic inputs for this
step are the 3D structure of the protein and a pre-docked ligand in PDB format, as well as
their atomic properties. Both ligand and protein atoms need to be classified, and their
atomic properties defined, in terms of four basic atomic types:
Hydrophobic atom: all carbons in hydrocarbon chains or in aromatic groups.
H-bond donor: Oxygen and nitrogen atoms bonded to hydrogen atom(s).
H-bond acceptor: Oxygen and sp2 or sp hybridized nitrogen atoms with lone
electron pair(s).
Polar atom: Oxygen and nitrogen atoms that are neither H-bond donor nor H-
bond acceptor; sulfur, phosphorus, halogen, metal and carbon atoms bonded to
hetero-atom(s).
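The four-type classification above can be written as a small rule-based function. This is a minimal sketch under stated assumptions: each atom is described by a hypothetical `Atom` record whose element, hydrogen count, hybridization, lone-pair and neighbour information are assumed to have been perceived already by a modelling package; the rules are applied in the order listed, so an O-H oxygen (chemically both donor and acceptor) is labelled a donor; and the metal set is an illustrative subset.

```python
from dataclasses import dataclass

HALOGENS = {"F", "Cl", "Br", "I"}
METALS = {"Na", "K", "Mg", "Ca", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"}  # illustrative subset


@dataclass
class Atom:
    element: str                     # element symbol, e.g. "C", "N", "O", "Cl"
    n_hydrogens: int = 0             # hydrogens bonded to this atom
    hybridization: str = "sp3"       # "sp", "sp2" or "sp3"
    has_lone_pair: bool = True       # relevant for nitrogen acceptors
    bonded_to_hetero: bool = False   # relevant for carbon only


def atom_type(atom: Atom) -> str:
    """Return 'hydrophobic', 'donor', 'acceptor', 'polar' or 'unknown'."""
    el = atom.element
    if el == "C":
        return "polar" if atom.bonded_to_hetero else "hydrophobic"
    if el in ("O", "N"):
        if atom.n_hydrogens > 0:
            return "donor"
        if el == "O":
            return "acceptor"  # oxygen always carries lone pairs
        if atom.hybridization in ("sp", "sp2") and atom.has_lone_pair:
            return "acceptor"
        return "polar"  # e.g. an sp3 nitrogen carrying no hydrogen
    if el in {"S", "P"} | HALOGENS | METALS:
        return "polar"
    return "unknown"


# Hypothetical atoms: aromatic C, carbonyl C, carbonyl O, pyridine-like N, chlorine
for atom in (Atom("C", hybridization="sp2"),
             Atom("C", hybridization="sp2", bonded_to_hetero=True),
             Atom("O", hybridization="sp2"),
             Atom("N", hybridization="sp2"),
             Atom("Cl")):
    print(atom.element, atom_type(atom))
```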
The space inside the ligand binding region is then probed with virtual probe
atoms of the four types above, so that the chemical environment of every point in the
ligand binding region is known. This shows which kinds of chemical fragments can be
placed at the corresponding positions in the ligand binding region of the receptor.
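A toy version of the probe scan, assuming the receptor atoms already carry one of the four type labels: a probe of each type is placed at every point of a regular grid, scored against nearby receptor atoms, and each point is labelled with its most favourable probe. The pair-score table, the 4 Å cutoff and the linear fade with distance are hypothetical choices for illustration only; real programs use fitted, distance- and angle-dependent terms.

```python
import math
from itertools import product

# Hypothetical (probe type, receptor atom type) -> score; negative = favourable.
# The "polar" probe has no entries in this toy table, so it always scores 0.
PAIR_SCORE = {
    ("donor", "acceptor"): -1.0,
    ("acceptor", "donor"): -1.0,
    ("hydrophobic", "hydrophobic"): -0.5,
    ("donor", "donor"): 0.5,
    ("acceptor", "acceptor"): 0.5,
    ("hydrophobic", "donor"): 0.3,
    ("hydrophobic", "acceptor"): 0.3,
}
PROBE_TYPES = ("hydrophobic", "donor", "acceptor", "polar")


def probe_score(point, probe, receptor_atoms, cutoff=4.0):
    """Sum pair scores over receptor atoms within `cutoff`, fading linearly to zero."""
    total = 0.0
    for xyz, atom_kind in receptor_atoms:
        d = math.dist(point, xyz)
        if d < cutoff:
            total += PAIR_SCORE.get((probe, atom_kind), 0.0) * (1.0 - d / cutoff)
    return total


def map_binding_region(receptor_atoms, lo, hi, spacing=1.0):
    """Return {grid point: best probe type, or None if no probe scores favourably}."""
    counts = [int(round((h - l) / spacing)) + 1 for l, h in zip(lo, hi)]
    site_map = {}
    for i, j, k in product(*(range(n) for n in counts)):
        point = (lo[0] + i * spacing, lo[1] + j * spacing, lo[2] + k * spacing)
        scores = {p: probe_score(point, p, receptor_atoms) for p in PROBE_TYPES}
        best = min(scores, key=scores.get)
        site_map[point] = best if scores[best] < 0 else None
    return site_map


# Toy receptor: an acceptor atom at the origin and a hydrophobic atom 6 A away
receptor = [((0.0, 0.0, 0.0), "acceptor"), ((6.0, 0.0, 0.0), "hydrophobic")]
site_map = map_binding_region(receptor, lo=(1.0, 0.0, 0.0), hi=(5.0, 0.0, 0.0))
for point, best in site_map.items():
    print(point, best)
```

Points next to the acceptor come out as "donor" spots and points next to the hydrophobic atom as "hydrophobic" spots, which is the kind of map a fragment-placement step would then consult.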
Ligand fragment link
The term “fragment” is used here to describe the building blocks used in the
construction process. The rationale of this algorithm lies in the fact that organic structures
can be decomposed into basic chemical fragments. Although the diversity of organic
structures is infinite, the number of basic fragments is rather limited.
Before the first fragment (the seed) is placed into the binding pocket and other fragments
are added one by one, several problems have to be considered. First, the number of
possible fragment combinations is huge. A small perturbation of the previous fragment's
conformation can cause a large difference in the subsequent construction process. At the
same time, in order to find the lowest binding energy on the Potential Energy Surface
(PES) between the placed fragments and the receptor pocket, the scoring function would
have to be evaluated for every conformational change of the fragments derived from every
possible fragment combination. Since this requires a large amount of computation, other
strategies are needed to make the program work more efficiently. When a ligand is inserted
into the pocket of a receptor, the conformations of the ligand groups that can bind tightly
to the receptor should be given priority. This allows several seeds to be placed at the same
time in the regions that have significant interactions with them and their favorable
conformations to be adjusted first; the seeds are then connected into a continuous ligand in
a way that gives the rest of the ligand the lowest energy. The conformations of the
pre-placed seeds, which ensure the binding affinity, determine how the ligand is grown. This
strategy reduces the computational burden of fragment construction efficiently. On the
other hand, it reduces the number of fragment combinations explored, and hence the
number of possible ligands that can be derived from the program. These two strategies are
widely used in most structure-based drug design programs. They are described as “Grow”
and “Link”. The two strategies are always combined in order to make the construction
results more reliable.
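A back-of-the-envelope count makes the trade-off concrete. The sketch assumes, purely for illustration, that every candidate (fragment, conformer) choice costs one scoring call, that "Grow" keeps only the best partial ligand after each step, and that "Link" places each seed independently and then tries every linker in every gap. These cost models and the library sizes are hypothetical simplifications, not how any particular program counts its work.

```python
def exhaustive_evaluations(n_fragments, n_conformers, n_steps):
    """Scoring calls if every (fragment, conformer) choice is kept at every growth step."""
    return (n_fragments * n_conformers) ** n_steps


def greedy_grow_evaluations(n_fragments, n_conformers, n_steps):
    """'Grow' with pruning: only the best-scoring partial ligand survives each step."""
    return n_steps * n_fragments * n_conformers


def seed_and_link_evaluations(n_fragments, n_conformers, n_seeds, n_linkers):
    """'Link': place each seed independently, then try every linker in every gap."""
    placing = n_seeds * n_fragments * n_conformers
    linking = (n_seeds - 1) * n_linkers * n_conformers
    return placing + linking


# Hypothetical library: 50 fragments, 10 conformers each, 5 growth steps
print(exhaustive_evaluations(50, 10, 5))          # 31250000000000
print(greedy_grow_evaluations(50, 10, 5))         # 2500
print(seed_and_link_evaluations(50, 10, 3, 20))   # 1900
```

The pruned searches are many orders of magnitude cheaper, but they visit only a tiny fraction of the possible combinations, which is exactly why fewer distinct ligands can be derived.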
Fig 2: Flow chart for structure based drug design
Scoring method
Scoring functions for docking
Structure-based drug design attempts to use the structure of proteins as a basis for
designing new ligands by applying accepted principles of molecular recognition. The
basic assumption underlying structure-based drug design is that a good ligand molecule
should bind tightly to its target. Thus, one of the most important principles for designing
or obtaining potential new ligands is to predict the binding affinity of a certain ligand to
its target and use it as a criterion for selection.
Landmark work was done by Böhm, who developed a general-purpose empirical
function to describe the binding energy and introduced the concept of the “Master
Equation”. The basic idea is that the overall binding free energy can be decomposed into
independent components that are known to be important for the binding process. Each
component reflects a certain kind of free energy change during the binding process
between a ligand and its target receptor, and the Master Equation is a linear combination
of these components. Through the Gibbs free energy relation, ΔG = RT ln Kd, the
dissociation equilibrium constant Kd is linked to the sum of these free energy components.
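As a minimal sketch of the Master Equation idea, the code below forms ΔG as a weighted sum of interaction terms and converts it to Kd through ΔG = RT ln Kd (1 M standard state). The term names follow those usually associated with Böhm's function (a constant offset, hydrogen bonds, ionic interactions, lipophilic contact area and rotatable bonds), but the weights are hypothetical placeholders, not Böhm's fitted coefficients, and his geometric penalty functions for non-ideal hydrogen bonds and ionic contacts are omitted.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

# Hypothetical placeholder weights (kJ/mol per unit of each term), NOT fitted values
WEIGHTS = {
    "constant": 5.0,     # constant offset, applied once per complex
    "hbonds": -5.0,      # per ideal hydrogen bond
    "ionic": -8.0,       # per ideal ionic interaction
    "lipo_area": -0.2,   # per square angstrom of lipophilic contact surface
    "rot_bonds": 1.5,    # per rotatable bond frozen on binding
}


def master_equation(terms, weights=WEIGHTS):
    """Binding free energy as a linear combination of independent terms (kJ/mol)."""
    return sum(weights[name] * value for name, value in terms.items())


def kd_from_dg(dg, temperature=298.15):
    """Invert dG = RT ln Kd to get Kd in mol/L."""
    return math.exp(dg / (R_KJ * temperature))


# A hypothetical pose: 3 H-bonds, no ionic contacts, 150 A^2 lipophilic contact, 4 rotors
pose = {"constant": 1, "hbonds": 3, "ionic": 0, "lipo_area": 150.0, "rot_bonds": 4}
dg = master_equation(pose)
print(f"dG = {dg:.1f} kJ/mol, Kd = {kd_from_dg(dg):.2e} M")
```

A more negative ΔG gives a smaller Kd, i.e. tighter binding; each additional ideal hydrogen bond in this toy weighting lowers Kd by a constant factor, which is a direct consequence of the linear, additive form.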
The sub-models of empirical functions differ according to the choices made by different
researchers, and designing them has long been a scientific challenge. Through successive
modifications of these sub-models, empirical scoring functions continue to be improved
and refined.
3.1.3 Rational drug discovery
In contrast to traditional methods of drug discovery which rely on trial-and-error
testing of chemical substances on cultured cells or animals, and matching the apparent
effects to treatments, rational drug design begins with a hypothesis that modulation of a
specific biological target may have therapeutic value. In order for a biomolecule to be
selected as a drug target, two essential pieces of information are required. The first is
evidence that modulation of the target will have therapeutic value. This knowledge may
come from, for example, disease linkage studies that show an association between
mutations in the biological target and certain disease states. The second is that the target
is "druggable". This means that it is capable of binding to a small molecule and that its
activity can be modulated by the small molecule.
Once a suitable target has been identified, the target is normally cloned and
expressed. The expressed target is then used to establish a screening assay. In addition,
the three-dimensional structure of the target may be determined. The search for small
molecules that bind to the target is begun by screening libraries of potential drug
compounds. This may be done by using the screening assay (a "wet screen"). In addition,
if the structure of the target is available, a virtual screen may be performed of candidate
drugs. Ideally the candidate drug compounds should be "drug-like", that is they should
possess properties that are predicted to lead to oral bioavailability, adequate chemical and
metabolic stability, and minimal toxic effects. One way of estimating drug likeness is
Lipinski's Rule of Five. Several methods for predicting drug metabolism have been
proposed in the scientific literature, and a recent example is SPORCalc. Due to the
complexity of the drug design process, two terms of interest are still serendipity and
bounded rationality. Those challenges are caused by the large chemical space describing
potential new drugs without side-effects.
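Lipinski's Rule of Five flags compounds that are more likely to show poor oral absorption: molecular weight over 500, log P over 5, more than 5 hydrogen bond donors (OH and NH groups) or more than 10 hydrogen bond acceptors (N and O atoms). The sketch below assumes the descriptors have already been computed by some cheminformatics tool; the descriptor values are hypothetical, and tolerating at most one violation is a widely used convention rather than part of every implementation.

```python
# Rule of Five limits; a descriptor above its limit counts as one violation
RULE_OF_FIVE = {
    "mol_weight": 500.0,   # daltons
    "logp": 5.0,           # octanol/water partition coefficient (log P)
    "h_donors": 5,         # number of OH and NH groups
    "h_acceptors": 10,     # number of N and O atoms
}


def rule_of_five_violations(descriptors):
    """Return the names of the Rule of Five criteria the compound violates."""
    return [name for name, limit in RULE_OF_FIVE.items() if descriptors[name] > limit]


def is_drug_like(descriptors, max_violations=1):
    """Common convention: at most one violation is tolerated."""
    return len(rule_of_five_violations(descriptors)) <= max_violations


# Hypothetical descriptor values for a candidate inhibitor
candidate = {"mol_weight": 612.7, "logp": 4.1, "h_donors": 3, "h_acceptors": 9}
print(rule_of_five_violations(candidate))  # ['mol_weight']
print(is_drug_like(candidate))             # True
```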
3.1.4 Computer Assisted Drug Design
Computer Assisted Drug Design uses computational chemistry to discover,
enhance, or study drugs and related biologically active molecules. The most fundamental
goal is to predict whether a given molecule will bind to a target and if so how strongly.
Molecular mechanics or molecular dynamics are most often used to predict the
conformation of the small molecule and to model conformational changes in the
biological target that may occur when the small molecule binds to it. Semi-empirical, ab
initio quantum chemistry methods, or density functional theory are often used to provide
optimized parameters for the molecular mechanics calculations and also provide an
estimate of the electronic properties (electrostatic potential, polarizability, etc.) of the
drug candidate which will influence binding affinity. Molecular mechanics methods may
also be used to provide semi-quantitative prediction of the binding affinity. Alternatively,
empirical scoring functions may be used to provide binding affinity estimates.
These methods use linear regression, machine learning, neural nets or other statistical
techniques to derive predictive binding affinity equations by fitting experimental
affinities to computationally derived interaction energies between the small molecule and
the target.
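The regression idea can be sketched with a single descriptor: an ordinary least-squares line relating a computed interaction energy to measured affinity, with the coefficient of determination r² as the goodness-of-fit statistic (the kind of r² quoted for the QSAR models in the abstract). This is a minimal sketch; practical empirical scoring functions and QSAR models regress on several descriptors at once, and the data below are hypothetical.

```python
def fit_scoring_function(energies, affinities):
    """Ordinary least squares: affinity ~ slope * energy + intercept.

    Returns (slope, intercept, r2), where r2 = 1 - SS_res / SS_tot.
    """
    n = len(energies)
    if n < 2 or n != len(affinities):
        raise ValueError("need at least two paired observations")
    mean_x = sum(energies) / n
    mean_y = sum(affinities) / n
    sxx = sum((x - mean_x) ** 2 for x in energies)
    ss_tot = sum((y - mean_y) ** 2 for y in affinities)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("energies and affinities must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(energies, affinities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(energies, affinities))
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical training data: computed interaction energies (kcal/mol) vs. measured pIC50
energies = [-52.0, -47.5, -41.0, -38.2, -30.1]
pic50 = [8.1, 7.4, 6.9, 6.2, 5.0]
slope, intercept, r2 = fit_scoring_function(energies, pic50)
print(f"pIC50 = {slope:.3f} * E + {intercept:.2f}   (r2 = {r2:.3f})")

# The fitted equation can then estimate the affinity of a new, untested compound
print(round(slope * -45.0 + intercept, 2))
```

A high training-set r² alone does not show predictive power; that is why a separate test set is evaluated, as done for the models in this work.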
Ideally the computational method should be able to predict affinity before a
compound is synthesized and hence in theory only one compound needs to be
synthesized. In reality, however, present computational methods provide at best
only qualitatively accurate estimates of affinity. Therefore in practice it still takes several
iterations of design, synthesis, and testing before an optimal molecule is discovered. On
the other hand, computational methods have accelerated discovery by reducing the
number of iterations required and in addition have often provided more novel small
molecule structures.
Drug design with the help of computers may be used at any of the following
stages of drug discovery:
Hit identification using virtual screening (structure- or ligand-based design)
Hit-to-lead optimization of affinity and selectivity (structure-based design, QSAR,
etc.)
Lead Optimization of other pharmaceutical properties while maintaining affinity
Fig 3: Role of computer aided drug designing
Benefits of CADD
CADD methods and bioinformatics tools offer significant benefits for drug discovery
programs.
1. Cost Savings. The Tufts Report suggests that the cost of drug discovery and
development has reached $800 million for each drug successfully brought to
market. Many biopharmaceutical companies now use computational methods and
bioinformatics tools to reduce this cost burden. Virtual screening, lead
optimization and predictions of bioavailability and bioactivity can help guide
experimental research. Only the most promising experimental lines of inquiry can
be followed and experimental dead-ends can be avoided early based on the results
of CADD simulations.
2. Time-to-Market. The predictive power of CADD can help drug research programs
choose only the most promising drug candidates. By focusing drug research on
specific lead candidates and avoiding potential “dead-end” compounds,
biopharmaceutical companies can get drugs to market more quickly.
3. Insight. One of the non-quantifiable benefits of CADD and the use of
bioinformatics tools is the deep insight that researchers acquire about drug-receptor
interactions. Molecular models of drug compounds can reveal intricate,
atomic scale binding properties that are difficult to envision in any other way.
When researchers are shown new molecular models of their putative drug
compounds, their protein targets and how the two bind together, they often come
up with new ideas on how to modify the drug compounds for improved fit. This is
an intangible benefit that can help shape research programs.
CADD and bioinformatics together are a powerful combination in drug research and
development.
3.1.5 Software
In silico studies described in this project were carried out using the tools available
in Discovery Studio by Accelrys.
Discovery Studio is a complete modeling and simulation environment for life
science researchers. Discovery Studio is a single, easy-to-use, graphical interface for
powerful drug design and protein modeling research. Discovery Studio 2.5 combines
established gold-standard applications such as Catalyst, Modeler, and CHARMm that
have years of proven results and utilizes cutting-edge science to address the drug
discovery challenges of today. Discovery Studio 2.5 is built on the Pipeline Pilot open
operating platform to seamlessly integrate protein modeling, pharmacophore analysis,
virtual screening, and third-party applications. It offers:
o Interactive, visual and integrated software
o A consistent, contemporary user interface for added ease of use
o Tools for visualization, protein modeling, simulation, docking, pharmacophore analysis, QSAR and library design
o Access to computational servers and tools, with the ability to share data, monitor jobs, and prepare and communicate project progress
Fig 4: Features available in Discovery Studio 2.5
3.2 PROTEIN
3.2.1 Introduction
Classification: Hydrolase
Molecule: Beta-secretase 1
Structure Weight: 46928.46 Da
Polymer: 1
Type: polypeptide(L)
Length: 415
Chains: A
EC#: 3.4.23.46
Fragment: UNP residues 46-454
Protein ID: 2ZDZ
Beta Secretase:
β-Secretase, also called BACE1 (β-site APP-cleaving enzyme) or memapsin-2, is an
aspartic-acid protease important in the pathogenesis of Alzheimer's disease and in
the formation of myelin sheaths in peripheral nerve cells. This transmembrane protein
contains two active-site aspartate residues in its extracellular domain and may
function as a dimer. BACE1 produces the amyloid β (Aβ) peptide (the primary constituent
of the amyloid plaques implicated in Alzheimer's disease) by cleavage of the amyloid
precursor protein.
Fig 5: Secondary structure of BACE1
Cerebral deposition of amyloid beta peptide (A-beta) is an early and critical
feature of Alzheimer's disease. A-beta generation depends on proteolytic cleavage of the
Amyloid Precursor Protein (APP) by two proteases, beta-secretase and gamma-secretase,
which are prime therapeutic targets. A transmembrane aspartic protease with all the
known characteristics of beta-secretase was cloned and characterized. Overexpression of
this protease, termed BACE (for Beta-site APP-Cleaving Enzyme), increased the amount of
beta-secretase cleavage products, and these were cleaved exactly and only at known
beta-secretase positions. Antisense inhibition of endogenous BACE messenger RNA
decreased the amount of beta-secretase cleavage products, and purified BACE protein
cleaved APP-derived substrates with the same sequence specificity as beta-secretase.
Finally, the expression pattern and subcellular localization of BACE were consistent with
those expected for beta-secretase. Future development of BACE inhibitors may prove
beneficial for the treatment of Alzheimer's disease.
Beta-secretase (BACE) is a membrane protein that contains two essential Asp
residues in its ectodomain (extracellular domain), which are used in the first cleavage of
the beta amyloid precursor protein to release a soluble N-terminal fragment with a
molecular weight of about 100,000. γ-Secretase, which is necessary for the second
cleavage that frees the Aβ peptide, is a heterotetramer composed of presenilin-1,
nicastrin, APH-1 and PEN-2, and is located in neural plasma membranes and the
endoplasmic reticulum. The Aβ peptide moves to the extracellular side of the neural
membrane, where it aggregates. The remaining cytoplasmic part of the beta-amyloid
precursor protein may regulate transcription. The presenilin subunit carries the protease
activity. γ-Secretase also cleaves another cell surface receptor protein, Notch. When this
receptor has bound an extracellular ligand, γ-secretase cleaves Notch within its
transmembrane domain, and the released intracellular fragment modifies gene
transcription. The APH-1 subunit appears to inhibit presenilin protease activity, while
PEN-2 promotes it. Inhibiting γ-secretase could be an effective treatment for Alzheimer's
disease, but might have serious side effects, since Notch processing would also be affected.
Pathway:
The gamma-secretase protein quartet, and its roles in brain development and
Alzheimer's disease. Presenilin-1, nicastrin, APH-1 and PEN-2 form a functional
gamma-secretase complex, located in the plasma membrane and endoplasmic reticulum (ER)
of neurons. The complex cleaves Notch (left) to generate a fragment (NICD) that moves to
the nucleus and regulates the expression of genes involved in brain development and adult
neuronal plasticity. The complex also helps to generate the amyloid beta-peptide
(Abeta; centre). This involves an initial cleavage of the amyloid precursor protein (APP)
by an enzyme called BACE (or beta-secretase). The gamma-secretase then liberates
Abeta, as well as an APP cytoplasmic fragment, which may move to the nucleus and
regulate gene expression. Mutations in presenilin-1 that cause early-onset Alzheimer's
disease enhance gamma-secretase activity and Abeta production, and also perturb the ER
calcium balance. Consequent neuronal degeneration may result from membrane-associated
oxidative stress, induced by aggregating forms of Abeta (which create Abeta
plaques), and from the perturbed calcium balance.
Figure: Cleavage of beta amyloid precursor protein: protease and cofactors
Beta Secretase Processing:
APP processing in cholesterol-enriched membrane domains (CEMs). The amyloid protein
precursor (APP) is a type I transmembrane protein that is processed in several different
pathways. Generation of the amyloid β protein (Aβ) in the β-secretase pathway (A and B)
requires two proteolytic events: a cleavage at the amino terminus of the Aβ sequence,
referred to as β-secretase cleavage, and a cleavage at the carboxyl terminus, known as
γ-secretase cleavage. Cleavage by β-secretase results in the secretion of sAPPβ and
production of the membrane-bound carboxyl terminal fragment β (CTFβ). γ-Secretase
cleavage of CTFβ produces the secreted Aβ peptide and CTFγ. In the α-secretase pathway
(C), APP is cleaved within Aβ to generate a large, secreted derivative referred to as sAPPα
and a membrane-associated CTFα. Aβ production in the β-secretase pathway appears to
occur in CEMs, which are indicated by the presence of high levels of cholesterol in the
membrane (light gray squares) and GM1 ganglioside (dark gray squares). It is not certain
whether the CEMs that contain β- and γ-secretase activity are contiguous (A) or spatially
distinct (B). Local production of Aβ in CEMs (A or B) could result in local aggregation due
to the high concentrations of Aβ and the fibril-promoting factors present in CEMs. In
non-CEM membranes, the α-secretase pathway is favored (C).
Two proteases produce Aβ from the amyloid β protein precursor (APP) through
sequential cleavages (reviewed in ref. 11). APP is first cleaved by β-secretase (BACE1,
Asp2, memapsin-2), a transmembrane aspartyl protease, at the amino terminus of Aβ to
generate a large, secreted derivative (sAPPβ) and a membrane-bound APP carboxyl
terminal fragment (CTFβ). Subsequent cleavage of CTFβ by γ-secretase results in
production of the Aβ peptide and CTFγ. In a second pathway, APP is cleaved within the
Aβ sequence by α-secretase, which generates another large, secreted derivative and a CTF
(sAPPα and CTFα).
Recent evidence indicates that the first cleavage step in Aβ generation (Fig. 1),
β-secretase cleavage, may occur in CEMs. β-Secretase is enriched in CEMs that are distinct
from caveolae-containing CEMs (ref. 12). Although β-secretase activity was not measured,
the concentration of mature β-secretase in these membranes provides initial evidence that
this cleavage may occur at this site. This localization would also be consistent with the
observation that lowering cholesterol reduces β-secretase cleavage. In addition, there is
evidence that alterations in caveolin-3 expression can alter β-secretase cleavage of APP
(ref. 13). How this relates to the presence of β-secretase in non-caveolar CEMs is not clear.
Disease:
Alzheimer's disease (AD), also called Alzheimer disease, Senile Dementia of the
Alzheimer Type (SDAT) or simply Alzheimer's, is the most common form of dementia.
This incurable, degenerative and terminal disease was first described by the German
psychiatrist and neuropathologist Alois Alzheimer in 1906 and was named after him.
Generally it is diagnosed in people over 65 years of age, although the less-prevalent
early-onset Alzheimer's can occur much earlier. An estimated 26.6 million people
worldwide had Alzheimer's in 2006; this number may quadruple by 2050.
Although the course of Alzheimer's disease is unique for every individual, there
are many common symptoms. The earliest observable symptoms are often mistakenly
thought to be 'age-related' concerns, or manifestations of stress. In the early stages, the
most commonly recognised symptom is memory loss, such as difficulty in remembering
recently learned facts.
As the disease advances, symptoms include confusion, irritability and aggression,
mood swings, language breakdown, long-term memory loss, and the general withdrawal
of the sufferer as their senses decline. Gradually, bodily functions are lost, ultimately
leading to death.
Biochemistry:
Alzheimer's disease has been identified as a protein misfolding disease
(proteopathy), caused by accumulation of abnormally folded A-beta and tau proteins in
the brain. Plaques are made up of small peptides, 39–43 amino acids in length, called
beta-amyloid (also written as A-beta or Aβ). Beta-amyloid is a fragment from a larger
protein called amyloid precursor protein (APP), a transmembrane protein that penetrates
through the neuron's membrane. APP is critical to neuron growth, survival and post-
injury repair. In Alzheimer's disease, an unknown process causes APP to be divided into
smaller fragments by enzymes through proteolysis. One of these fragments gives rise to
fibrils of beta-amyloid, which form clumps that deposit outside neurons in dense
formations known as senile plaques.
Enzymes act on the APP (amyloid precursor protein) and cut it into fragments. The beta-
amyloid fragment is crucial in the formation of senile plaques in AD.
4. MATERIALS AND METHODS:
In the last few years, the role of computational methods in both pharmaceutical
and academic research has grown dramatically. The emphasis placed on high-throughput
methods in the pharmaceutical industry has increased the number of compounds in the
discovery pipeline. Characterizing the position and orientation of small molecules bound
to a protein surface can be an important step in drug design. Computational methods have
developed rapidly as groups seek high-throughput, low-cost approaches to accelerate the
drug discovery process. Such approaches will be necessary as scientists attempt to
characterize the large number of compounds currently being generated. Structural
information on biological macromolecules and their interactions with ligands is
increasingly used in modern medicinal chemistry. There is a pressing need for novel
computational methods that can evaluate structural information about ligand-receptor
complexes in a more quantitative way, both to improve existing leads and to design de
novo compounds with accurately predicted binding affinities. The methods used in this
work are divided into the following parts:
Structure based drug designing
Docking studies
a) Ligand Fit
b) CDOCKER
c) Lib Dock
1. Structure based pharmacophore generation
2. Ludi
Analogue based drug designing
1. Common feature pharmacophore generation (HipHop)
2. 3D pharmacophore generation (HypoGen)
3. Quantitative structure activity relationships (QSAR)
Preparation of Molecular System
Macromolecule (protein 2ZDZ) Preparation:
Load the protein and apply the force field
For these QSAR, pharmacophore and docking studies, the protein 2ZDZ was downloaded from the RCSB Protein Data Bank (www.rcsb.org/pdb/) and a force field was applied.
A force field is the functional form and parameter set used to calculate the potential energy of a system. Its parameters are obtained from experimental work and quantum mechanical calculations. A molecular mechanics system is made up of a number of components. Covalently bonded atoms are described by bonded terms such as bond lengths, bond angles and dihedral angles; in addition, there are non-bonded interactions such as van der Waals and electrostatic interactions. The total potential energy of the system is therefore calculated as

$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdW} + E_{elec}$$

where E_elec is the electrostatic term. This summation, written in explicit form, is the force field used to evaluate the potential energy of a system.
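To make the energy summation concrete, the sketch below writes each term with a common CHARMm-style functional form (harmonic bonds and angles, periodic torsions, a 12-6 Lennard-Jones term and a Coulomb term). All parameter values are illustrative assumptions, not real CHARMm parameters, and the helper names are hypothetical.

```python
import math

# Toy force-field terms. Functional forms follow the common CHARMm-like
# convention; every default parameter is an illustrative assumption.

def bond_energy(r, k_b=300.0, r0=1.53):
    """Harmonic bond stretch k_b * (r - r0)^2, r in Å, energy in kcal/mol."""
    return k_b * (r - r0) ** 2

def angle_energy(theta_deg, k_theta=50.0, theta0_deg=109.5):
    """Harmonic angle bend k_theta * (theta - theta0)^2, with the angle in radians."""
    d = math.radians(theta_deg - theta0_deg)
    return k_theta * d ** 2

def torsion_energy(phi_deg, k_phi=1.4, n=3, delta_deg=0.0):
    """Periodic torsion k_phi * (1 + cos(n*phi - delta))."""
    return k_phi * (1 + math.cos(math.radians(n * phi_deg - delta_deg)))

def vdw_energy(r, epsilon=0.1, r_min=3.8):
    """12-6 Lennard-Jones written with well depth epsilon and minimum r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2 * x)

def elec_energy(r, q1, q2, dielectric=1.0):
    """Coulomb term; 332.0636 converts e^2/Å to kcal/mol."""
    return 332.0636 * q1 * q2 / (dielectric * r)

def total_energy(bonds, angles, torsions, pairs):
    """E_total = E_bond + E_angle + E_torsion + E_vdW + E_elec.
    pairs holds (distance, charge_1, charge_2) for each non-bonded pair."""
    e_bond = sum(bond_energy(r) for r in bonds)
    e_angle = sum(angle_energy(t) for t in angles)
    e_tors = sum(torsion_energy(p) for p in torsions)
    e_vdw = sum(vdw_energy(r) for r, _, _ in pairs)
    e_elec = sum(elec_energy(r, q1, q2) for r, q1, q2 in pairs)
    return e_bond + e_angle + e_tors + e_vdw + e_elec
```

For example, `total_energy([1.53, 1.60], [109.5], [60.0], [(4.0, 0.4, -0.4)])` adds a strained bond, a relaxed angle, a staggered torsion and one attractive charged pair.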
Minimization:
The Minimizer uses an algorithm to identify the geometries of the molecule that correspond to minimum points on the potential energy surface. Minimization relieves unfavorable forces (for example, steric strain) present in the molecule and lowers its energy. Many algorithms are available for minimization; those used by the Smart Minimizer include the Steepest Descent, Conjugate Gradient, Newton-Raphson and quasi-Newton methods. From the DS protocols, select the Minimization option and run the protocol for the protein with fixed constraints. Then save the minimized protein for further studies.
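The idea behind the simplest of these methods, steepest descent, can be sketched as follows: repeatedly step against the gradient (the residual force) until it becomes negligible. The two-dimensional "energy surface" below is a hypothetical anisotropic bowl, not a molecular potential, and the fixed step size is an assumption; real minimizers choose steps adaptively.

```python
import numpy as np

def steepest_descent(grad, x0, step=0.05, max_steps=500, gtol=1e-6):
    """Fixed-step steepest descent: move against the gradient until its norm
    (the residual force) falls below gtol or max_steps is reached."""
    x = np.asarray(x0, dtype=float)
    i = 0
    for i in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < gtol:
            break
        x = x - step * g
    return x, i

# Hypothetical surface E(x, y) = 2*(x - 1)^2 + 0.5*(y + 2)^2, minimum at (1, -2).
def grad_E(p):
    x, y = p
    return np.array([4.0 * (x - 1.0), 1.0 * (y + 2.0)])

x_min, n_steps = steepest_descent(grad_E, [5.0, 5.0])
```

The shallow y direction converges far more slowly than the steep x direction, which is why steepest descent is usually followed by conjugate gradient or Newton-type steps once the worst strain is removed.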
Fig 10: Minimized 2ZDZ
Fig11: Representation of important amino acids
The important active-site amino acids were identified as GLY96, ASP290, ASP94, THR293 and SER291, based on LigPlot analysis and on the articles cited below.
Preparation of bio active molecules:
65 bioactive compounds with activities ranging from 0.078 µM to >118 µM were collected from the following four journal articles:
Acylguanidine inhibitors of β-secretase: Optimization of the pyrrole ring substituents extending into the S1 substrate binding pocket. Bioorganic & Medicinal Chemistry Letters 18 (2008) 767–771. Lee D. Jennings, Derek C. Cole, Joseph R. Stock, Mohani N. Sukhdeo, John W. Ellingboe, Rebecca Cowling, Guixian Jin, Eric S. Manas, Kristi Y. Fan, Michael S. Malamas, Boyd L. Harrison, Steve Jacobsen, Rajiv Chopra, Peter A. Lohse, William J. Moore, Mary-Margaret O'Donnell, Yun Hu, Albert J. Robichaud, M. James Turner, Erik Wagner and Jonathan Bard.
Design and synthesis of potent β-secretase (BACE1) inhibitors with P1 carboxylic acid bioisosteres. Bioorganic & Medicinal Chemistry Letters 16 (2006) 2380–2386. Tooru Kimura, Yoshio Hamada, Monika Stochaj, Hayato Ikari, Ayaka Nagamine, Hamdy Abdel-Rahman, Naoto Igawa, Koushi Hidaka, Jeffrey-Tri Nguyen, Kazuki Saito, Yoshio Hayashi and Yoshiaki Kiso.
Novel non-peptide β-secretase inhibitors derived from structure-based virtual screening and bioassay. Bioorganic & Medicinal Chemistry Letters 19 (2009) 3188–3192. Weijun Xu, Gang Chen, Oi Wah Liew, Zhili Zuo, Hualiang Jiang and Weiliang Zhu.
Design, synthesis and biological evaluation of novel dual inhibitors of acetylcholinesterase and β-secretase. Bioorganic & Medicinal Chemistry Letters 17 (2009) 1600–1613. Yiping Zhu, Kun Xiao, Lanping Ma, Bin Xiong, Yan Fu, Haiping Yu, Wei Wang, Xin Wang, Dingyu Hu, Hongli Peng, Jingya Li, Qi Gong, Qian Chai, Xican Tang, Haiyan Zhang, Jia Li and Jingkang Shen.
Procedure:
1. A basic scaffold of the molecules was sketched using the sketching tools
available in DS. Modifications were made to the scaffold to make sketches of
all the 65 molecules which were saved as files with .mol extensions.
2. Sketched molecules are typed with CHARMm force field.
3. The typed molecules are subjected to the energy minimization using Smart
Minimizer which minimizes a series of ligand poses using CHARMm.
4. Minimized molecules are saved with .sd and .mol extension for further study.
4.1. Structure or Target Based Drug Design
In structure-based drug design (SBDD), the three-dimensional structure of a drug target interacting with small molecules (drugs) is used to guide drug discovery. Drug targets are typically key molecules in a specific metabolic or cell-signaling pathway that is known, or believed, to be related to a particular disease state. Drug targets are most often proteins and enzymes in these pathways, and drug compounds are designed to inhibit, restore or otherwise modify the structure and behavior of these disease-related proteins and enzymes.
SBDD uses the known 3D geometrical shape or structure of proteins to assist in the development of new drug compounds. The 3D structures of protein targets are most often derived from X-ray crystallography or nuclear magnetic resonance (NMR) techniques, which resolve structures to within a few ångströms (about 500,000 times smaller than the diameter of a human hair). At this resolution, researchers can precisely examine the interactions between atoms in protein targets and atoms in potential drug compounds that bind to the proteins. This ability to work at high resolution with both proteins and drug compounds makes SBDD one of the most powerful methods in drug design.
Once bound at the receptor site, drugs may act either to initiate a response (agonist or stimulant action) or to decrease the activity of that receptor (antagonist or depressant action) by blocking access to it by active molecules. Thus, a drug may have structural features that contribute independently to its affinity for the receptor and to the efficacy with which the drug-receptor complex initiates the response (intrinsic activity or efficacy). The response is therefore related to the formation of drug-receptor complexes. The affinity of a drug may be estimated by comparing the dose required to produce a pharmacological response with the dose required by a reference standard drug or by the natural ligand for that receptor. In this study, structure-based drug design was carried out in the following parts:
1. Structure based pharmacophore generation
2. Ludi
Molecular Docking
In the field of molecular modeling, docking is a method that predicts the preferred orientation of one molecule relative to a second when they are bound to each other to form a stable complex. Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules, for example by using scoring functions. Molecular docking can be framed as an optimization problem: finding the “best-fit” orientation of a ligand that binds to a particular protein of interest. Docking is useful for predicting both the strength and the type of signal produced. The focus of molecular docking is to computationally simulate the molecular recognition process. The aim of molecular docking is to achieve an optimized
conformation for both the protein and ligand and relative orientation between protein and
ligand such that the free energy of the overall system is minimized.
Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets, which in turn is used to predict the affinity and activity of the small molecule. Hence docking plays an important role in the rational design of drugs. Given the biological and pharmaceutical significance of molecular docking, considerable effort has been directed towards improving docking methods.
Two popular docking approaches exist. The first uses a matching technique that describes the protein and the ligand as complementary surfaces. The second simulates the actual docking process, in which ligand-protein pairwise interaction energies are calculated and poses are evaluated with scoring functions. These scoring functions are of three types: force field-based, empirical and knowledge-based. Both approaches have significant advantages as well as some limitations.
Scoring is the process of evaluating a particular pose (candidate binding mode), for example by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts.
Several docking methods exist, each built on a different docking algorithm with its own advantages and disadvantages. The docking methods available in Discovery Studio (Accelrys) and used in the present study include LigandFit and CDOCKER, which are summarized below.
Fig 12: Docking work flow
4.1. i. Ligand Fit
LigandFit is a shape-based method for accurately docking ligands into protein
active sites. The method employs a cavity detection algorithm for detecting invaginations
in the protein as candidate active site regions. A shape comparison filter is combined with
a Monte Carlo conformational search for generating ligand poses consistent with the
active site shape. Candidate poses are minimized in the context of the active site using a
grid-based method for evaluating protein-ligand interaction energies. Errors arising from
grid interpolation are dramatically reduced using a new non-linear interpolation scheme.
Protein shape:
Sites are defined based on the shape of the protein. An “eraser” algorithm is used
to clean all the grid points outside the protein. The boundary between inside and outside
is determined by defining the opening size parameter. Within the boundary a flood filling
algorithm is employed to search unoccupied grid points which form the cavities (sites).
All sites detected can be browsed according to their size, and a user defined size cut-off
eliminates sites smaller than the specified size.
Dock ligand:
Sites are defined based on a docked ligand. If there is a docked ligand the
unoccupied grid points within a certain user definable distance to ligand atoms are
collected to form the site. The site can be edited (enlarged, contracted and deleted), saved
and later restored for further studies.
LigandFit is designed to search the binding site of a protein and dock a series of potential ligands into it. During docking the protein is held rigid while the ligand remains flexible, allowing its conformations to be searched and docked within the binding site. The three-dimensional structures of both the protein and the ligand are required. There are three key steps in this process.
a. Site search
The position and shape of the protein binding site are mapped onto a grid. The active-site shape can be defined from the shape of the protein, from which all candidate sites are detected. Alternatively, the docked-ligand method can be used to define the active site, in which unoccupied grid points within a user-definable distance of the ligand atoms are collected to form the site.
b. Conformational search
A Monte Carlo simulation is employed in the conformational search of the ligand. During the search, bond lengths and bond angles are left untouched; only torsion angles (except those in rings) are randomized. Therefore, the ligand molecules should be energy-minimized to ensure correct bond lengths and bond angles before LigandFit is used.
c. Ligand fitting
After a new conformer is generated, ligand fitting is carried out in two steps. First, the non-mass-weighted principal moments of inertia (PMI) of the binding site are compared with the non-mass-weighted PMI of the ligand. If the resulting fit value is above the threshold, or is not better than the results previously saved, no further docking is performed for that conformer. If the fit value is better than the previously saved results, the ligand is positioned in the binding site by aligning its principal axes with those of the site. Because each principal axis is defined only up to its sign, there are four possible positions for the ligand to
orient in the binding site. For each position, the corresponding docking score is
computed.
The docking score is the negative of the non-bonded intermolecular energy between the ligand and the protein. After the docking score is calculated for each orientation, it is compared with the results saved previously; if the new score is better, the pose is saved. The conformational search and ligand fitting are then iterated until the specified number of trials is reached. Finally, rigid-body minimization is applied to the saved ligand conformations to optimize their positions and docking scores.
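The PMI comparison and the four candidate placements can be sketched as below. Every point gets unit weight (non-mass-weighted), the inertia tensor is built about the centroid, and the ligand's principal axes are rotated onto the site's. The `fit_value` score (summed absolute difference of per-point moments) is a hypothetical stand-in; LigandFit's exact fit formula is not given in the text.

```python
import numpy as np

def inertia_tensor(coords):
    """Non-mass-weighted inertia tensor about the centroid (unit weight per point)."""
    r = np.asarray(coords, dtype=float)
    r = r - r.mean(axis=0)
    return (r ** 2).sum() * np.eye(3) - r.T @ r

def principal_moments(coords):
    """Principal moments of inertia in ascending order."""
    return np.linalg.eigvalsh(inertia_tensor(coords))

def fit_value(site_points, ligand_atoms):
    """Hypothetical shape-mismatch score (lower is better): summed absolute
    difference of the per-point principal moments of site and ligand."""
    ps = principal_moments(site_points) / len(site_points)
    pl = principal_moments(ligand_atoms) / len(ligand_atoms)
    return float(np.abs(ps - pl).sum())

def candidate_orientations(ligand_atoms, site_points):
    """Move the ligand centroid onto the site centroid and align principal axes.
    Each axis is defined only up to sign; keeping proper rotations (det = +1)
    leaves four distinct orientations."""
    lig = np.asarray(ligand_atoms, dtype=float)
    site = np.asarray(site_points, dtype=float)
    _, ax_l = np.linalg.eigh(inertia_tensor(lig))    # columns are principal axes
    _, ax_s = np.linalg.eigh(inertia_tensor(site))
    poses = []
    for sx, sy in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        flip = np.diag([sx, sy, 1.0])
        rot = ax_s @ flip @ ax_l.T
        if np.linalg.det(rot) < 0:                    # avoid a mirror image
            flip[2, 2] = -1.0
            rot = ax_s @ flip @ ax_l.T
        poses.append((lig - lig.mean(axis=0)) @ rot.T + site.mean(axis=0))
    return poses
```

Each returned pose would then be scored, and the best one kept, as the paragraph above describes.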
Procedure
Steps followed for Ligand Fit:
1. Potent inhibitors of BACE1 were taken.
2. Structurally diverse molecules with varied pharmacophore features were selected from the literature.
3. The molecules to be docked into the receptor site were saved into a single .sd file so that all of them could be processed for docking scores at that site.
4. Candidate active sites of the protein were identified with the find-site (from receptor cavities) tool, which uses the flood-filling algorithm.
5. The active site was located using the already docked (bound) ligand.
6. The protein was selected, the set of molecules in the .sd file was chosen, and the docking scores were calculated.
7. In this way, docking scores for the whole set of molecules were calculated with LigandFit.
4.1. ii. CDOCKER
Docking of ligands to a receptor consists of two phases. The first phase is simply the positioning of the ligand in the binding site, typically referred to as finding the poses of the ligand. The second phase, also known as scoring, involves the evaluation of individual poses. It is imperative that true hits and poses be distinguished from incorrect ones, so scoring is arguably the most important phase. Scoring methods generally include empirical scoring functions, knowledge-based potentials
and force field derived methods. Many force field based methods are based on the
following simple relationship:
$$E_{binding} = E_{complex} - (E_{receptor} + E_{ligand})$$
The binding energy is what remains after subtracting the internal energies of the individual components (the receptor and the ligand) from the energy of the complex.
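As a worked instance of this relationship, the helper below applies it to three hypothetical energies in kcal/mol; the numbers are made up for illustration and are not results from this study.

```python
def binding_energy(e_complex, e_receptor, e_ligand):
    """E_binding = E_complex - (E_receptor + E_ligand).
    A negative value means the complex is lower in energy than the
    separated receptor and ligand, i.e. binding is favorable."""
    return e_complex - (e_receptor + e_ligand)

# Hypothetical energies (kcal/mol): -5230 - (-5150 + -35) = -45
example = binding_energy(-5230.0, -5150.0, -35.0)
```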
CDOCKER (CHARMm-based DOCKER) is a grid-based molecular dynamics (MD) simulated-annealing docking method that employs CHARMm. It offers all the advantages of full ligand flexibility (including bonds, angles and dihedrals), the CHARMm19 family of force fields, the flexibility of the CHARMm engine, and reasonable computation times.
In Discovery Studio it is available as the Dock Ligands (CDOCKER) protocol. In CDOCKER the receptor is held rigid while the ligands are allowed to flex during refinement. To explore conformational space adequately, many optimization methods and search strategies have been developed for docking, including distance geometry, Monte Carlo (MC) simulated annealing, genetic algorithms (GAs) and molecular dynamics. In CDOCKER, random ligand conformations are generated from the initial ligand structure through high-temperature molecular dynamics, followed by random rotations. The random conformations are then refined by grid-based simulated annealing and a final grid-based or full force field minimization. Soft-core potentials have been found effective in exploring the conformational space of small organic molecules and macromolecules, and are used in various applications, including docking and the prediction of protein loop conformations. During the docking process, the non-bonded interactions (including van der Waals (vdW) and electrostatic terms) are softened to different levels, but this softening is removed for the final minimization.
CDOCKER is especially useful for very flexible ligands having more than 30 rotatable bonds.
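The effect of softening the non-bonded terms can be illustrated with a Lennard-Jones potential. The soft-core form below, in which r^6 is replaced by r^6 + alpha*sigma^6, is one common choice and an assumption here; the exact softening used inside CHARMm/CDOCKER may differ. The parameters are illustrative.

```python
def lj(r, epsilon=0.2, sigma=3.5):
    """Standard 12-6 Lennard-Jones; diverges as r -> 0."""
    x = (sigma / r) ** 6
    return 4 * epsilon * (x * x - x)

def soft_core_lj(r, epsilon=0.2, sigma=3.5, alpha=0.5):
    """Softened Lennard-Jones: r^6 is replaced by r^6 + alpha*sigma^6, so the
    energy stays finite (here 8*epsilon at r = 0) while approaching the
    standard form at large r."""
    x = sigma ** 6 / (r ** 6 + alpha * sigma ** 6)
    return 4 * epsilon * (x * x - x)
```

With a finite barrier, a ligand can pass through regions that would be strongly repulsive during annealing; removing the softening for the final minimization restores the physical energy.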
Details of the CDOCKER Docking Protocol
In the standard protocol, 50 replicas for each ligand are generated and randomly
distributed around the center of the active site. The internal coordinates for each of the
replicas are kept the same as those originally generated by CORINA (used to generate 3D coordinates from the 2D representations of the ligand molecules). The MD simulated annealing process is performed using a rigid protein and a flexible ligand. The ligand-protein interactions are computed from either GRID I, GRID II or the full force field. A final minimization step is applied to each of the ligand's docking poses; it consists of 50 steps of steepest descent followed by up to 200 steps of conjugate gradient, using an energy tolerance of 0.001 kcal/mol. The minimized docking poses are then clustered by heavy-atom RMSD using a 1.5 Å tolerance. The final ranking of the ligand's docking poses is based on the total docking energy (including the intramolecular energy of the ligand and the ligand-protein interactions). A ligand-protein docking is considered a success if the RMSD between the top-ranking (lowest-energy) docking pose and the ligand's X-ray position is less than 2.0 Å. The docking accuracy is then computed as the percentage of successfully docked ligands in a test set.
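The RMSD-based clustering and the success criterion can be sketched as follows. Poses are assumed to be N x 3 heavy-atom arrays in the receptor frame with matching atom order, so RMSD is computed in place without superposition. The greedy "lowest energy first" clustering is an assumed, simple scheme; Discovery Studio's own clustering procedure may differ.

```python
import numpy as np

def rmsd(a, b):
    """In-place heavy-atom RMSD (no superposition): both poses share the
    receptor frame and list atoms in the same order."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def cluster_poses(poses, energies, tol=1.5):
    """Greedy clustering: visit poses from lowest to highest energy and start
    a new cluster unless the pose lies within tol Å of an existing head."""
    heads = []
    for i in np.argsort(energies):
        if all(rmsd(poses[i], poses[h]) > tol for h in heads):
            heads.append(int(i))
    return heads  # indices of cluster representatives, best first

def docking_accuracy(top_poses, xray_poses, cutoff=2.0):
    """Percentage of ligands whose top-ranked pose is within cutoff Å of X-ray."""
    hits = sum(rmsd(p, x) < cutoff for p, x in zip(top_poses, xray_poses))
    return 100.0 * hits / len(xray_poses)
```

For instance, three poses of one ligand with energies -10, -12 and -8 kcal/mol, where the first two differ by 0.5 Å, collapse into two clusters headed by the -12 and -8 kcal/mol poses.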
CDOCKER steps:
1. Define the receptor and search for binding sites.
2. Prepare and run the Dock Ligands (CDOCKER) protocol.
Procedure:
1. Open the receptor protein and apply the CHARMm force field.
2. Define the selected molecule as the receptor, then select the ligand and define the binding-site sphere from the selection.
3. Open the CDOCKER protocol and set the parameters.
4. Run the protocol.
4.2. DE NOVO LIGAND DESIGN
De Novo ligand design identifies potential novel ligands by screening a library of
small molecules to find those that are complementary to a target receptor.
Complementarity is defined as an appropriate spatial orientation of hydrogen bonding
and hydrophobic functional groups. Molecules that cannot be fitted without incurring van der Waals clashes or electrostatic repulsions are screened out during the search process.
Ludi
The de novo method is based on the Ludi algorithm, which works in three steps:
1. Interaction sites within a defined search sphere inside the target receptor are calculated. Typically, the search sphere is defined from the location of a set of known ligands that bind within the receptor cavity.
2. Ludi-formatted libraries are searched for fragments that can fit inside the sphere while forming favorable interactions with the interaction sites.
3. An alignment (linking) of the fragments is proposed.
To generate the interaction sites, Ludi uses a set of rules intended to cover the complete range of energetically favorable orientations for hydrogen bonds and hydrophobic contacts. Fitting fragments onto the interaction sites, and the subsequent alignment (linking) of fragments to a partially built ligand, are controlled by several options.
Steps and parameters used in hypothesis generation:
1. Import the molecules into the View Compound workbench and clean the constructed molecules.
2. Apply the Catalyst force field, and then perform 3D minimization.
Conformation search
The aim of the conformation search is to obtain diverse conformations. Conformation generation methods are of two types, Best and Fast, and both emphasize broad coverage of conformational space. Fast conformer generation is used to cover the conformational space of the molecules. It uses systematic or random search depending on the
size of the molecules: systematic search is useful for small molecules and random search for larger ones. For larger molecules, the conformers are refined using the poling algorithm, which pushes new conformers away from those already generated to improve diversity.
Conformational analysis stops when one of three conditions is met:
The maximum number of conformers has been generated.
The energy of a newly generated conformer exceeds the predefined energy threshold.
No new conformer can be generated after a certain number of trials.
Ligand design
A new ligand for a protein (for example, an enzyme inhibitor) can be designed if the protein structure is known. If the structure of one or more protein-inhibitor complexes is known, the design may be aided by studies that identify the essential ligand-protein interactions. There are two approaches to finding compounds that can fit into the active site.
The known structure approach:
Searching a database such as the Cambridge Structural Database (CSD) identifies structures that fit the active site. The advantage of this approach is that the molecules retrieved from the database do exist, and their structures represent low-energy conformations. However, this approach does not address the issue of conformational flexibility.
The fragment approach:
This approach uses a library of fragments. The idea is to position molecular fragments in the active site in such a way that hydrogen bonds can be formed with the protein and hydrophobic pockets are filled with hydrophobic groups. The fragments are then connected by suitable spacer fragments to form a single molecule.
Ludi can also suggest modifications of a known ligand that may enhance its activity against the target protein. The following chart shows the Ludi workflow.
Fig 13: Ludi work flow
Ludi method
Ludi is based on the fragment approach. It suggests how suitable small fragments can be positioned into clefts of protein structures. This positioning is the strength of Ludi, because it immediately provides ideas about how putative binding sites on the protein can be saturated by fragments and how those fragments might be linked together. Ludi works in three steps:
1. It calculates interaction sites within the protein active site or from the active analogs.
2. It searches libraries for fragments and fits them onto the interaction sites.
3. It proposes an alignment or linking of the fragments.
Ludi distinguishes four types of interaction sites.
1. H-donor
2. H acceptor
3. Lipophilic aliphatic
4. Lipophilic aromatic
The lipophilic aliphatic and aromatic sites are suitable for hydrophobic interactions, and the H-donor and H-acceptor sites are suitable for hydrogen-bond formation. Ludi is capable of fitting fragments onto the interaction sites and simultaneously aligning (i.e. linking) them to an existing ligand.
Method:
1. Identification of chemical nature of active site amino acids
2. Fragments identification and analysis of Ludi score
3. Searching for link
4. Linking the fragments
5. Fusing the fragment and linking
6. Docking validation.
Fragment fitting
The next step is to fit fragments onto the interaction sites. Ludi searches the list of interaction sites by distance criteria for suitable sets of sites to match the fragments. Required interactions are specified using targeted mode, in which fragments are required to interact with the protein atom or atoms specified by the user; any fragment fit that does not interact with the entire set of specified target atoms is rejected.
To fit a fragment, Ludi performs a root-mean-square (RMS) superposition using the algorithm given by Kabsch (1978). A fragment fit is accepted if the RMS value is less than a user-defined threshold (typically 0.2 Å to 0.6 Å), no van der Waals overlap of the fitted fragment with the protein occurs, and, if the electrostatic check parameter on the Ludi runtime parameters control panel is checked, no unacceptable electrostatic repulsions are found. When the receptor structure is not known, a fragment fit is rejected if the fragment extends outside the volume defined by the set of active analogs.
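The RMS superposition test can be sketched with the Kabsch algorithm: center both point sets, find the optimal rotation from an SVD of their covariance matrix, and measure the residual RMSD. Treating the fragment's interacting groups and the matched interaction sites as two equally ordered N x 3 arrays, and the 0.4 Å default threshold (picked from the 0.2 Å to 0.6 Å range above), are assumptions for illustration; `accept_fit` is a hypothetical helper name, not part of Ludi.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between P and Q (N x 3, same point order) after the optimal rigid
    superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                         # rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def accept_fit(fragment_groups, site_positions, threshold=0.4):
    """Hypothetical acceptance test: keep the fit if the superposition RMSD
    is below the user-defined threshold (in Å)."""
    return kabsch_rmsd(fragment_groups, site_positions) < threshold
```

A rotated and translated copy of a fragment superposes with near-zero RMSD and is accepted, while a distorted copy is rejected; the van der Waals and electrostatic checks would follow as separate filters.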
Link sites: Aligning fragments with partially built ligands
Ludi is capable of fitting fragments onto the interaction sites and simultaneously aligning (i.e. linking) them to an existing ligand. For this purpose, link sites are
defined on the ligand. A link site is a hydrogen atom of the ligand; all hydrogen atoms of the positioned ligand within a user-specified cutoff radius are treated as link sites.
Ludi fragment libraries
The Ludi fragment library is divided into two parts. The de novo library is used when
Ludi is run in no-link mode. The link library is used when Ludi is run in link mode. The
de novo library and the link library each consist of two files, a file that specifies the
fragment topologies and a file that specifies the interaction types of fragment functional
groups.
Procedure
1. It calculates interaction sites within the protein 1SNU active site or from the active analogs.
2. It searches libraries for fragments and fits them onto the five interaction sites present at the active site.
3. It proposes an alignment or linking of the fragments, and the new ligand is designed.
The compound with the highest activity and the best dock score fits better than the others. Ludi uses a knowledge-based approach to suggest possible binding positions. The present study was carried out using the Ludi program, which docks small molecular fragments into protein binding sites using rules for the geometry of non-bonded interactions: for a hydrogen bond, the distance between the donor hydrogen and its acceptor is close to 1.8 Å, and the angle subtended at the hydrogen is rarely less than 120°. Information about the preferred geometries of such interactions can be obtained from analysis of X-ray crystallographic databases; Klebe has performed a very careful analysis of the non-bonded contacts observed in the CSD.
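The hydrogen-bond rule above can be expressed as a simple geometric check on donor (D), hydrogen (H) and acceptor (A) coordinates. The 120° minimum angle comes from the text; the 2.5 Å maximum H...A distance is an assumed cutoff (the text only says the typical distance is close to 1.8 Å), and the helper names are hypothetical.

```python
import numpy as np

def hbond_geometry(donor, hydrogen, acceptor):
    """Return (H...A distance in Å, D-H...A angle at the hydrogen in degrees)."""
    d, h, a = (np.asarray(v, dtype=float) for v in (donor, hydrogen, acceptor))
    dist = float(np.linalg.norm(a - h))
    v1, v2 = d - h, a - h
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
    return dist, angle

def is_plausible_hbond(donor, hydrogen, acceptor, max_dist=2.5, min_angle=120.0):
    """Accept the contact if H...A is short enough and the angle at H is wide enough."""
    dist, angle = hbond_geometry(donor, hydrogen, acceptor)
    return dist <= max_dist and angle >= min_angle
```

A linear N-H...O contact with H...O = 1.8 Å passes, whereas the same distance with a 90° angle at the hydrogen is rejected.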
4.3. STRUCTURE BASED PHARMACOPHORE GENERATION
A structure-based pharmacophore approach was used to identify the essential features of the active site that contribute to ligand binding.
The interaction generation protocol takes an input receptor and a defined active site
and analyzes the active site for donors, acceptors, and hydrophobes. The result of the
calculation is an interaction map. The density of polar site parameter specifies the density
of the vectors in the interaction site for hydrogen bonds. The density of lipophilic sites
parameter specifies the density of points in the interaction site for lipophilic atoms.
Procedure:
1. Load the Interaction Generation protocol from the Protocols Explorer. The parameters are displayed in the Parameters Explorer.
2. Ensure that the structure you want to define as the receptor is open in the 3D window, and use the Binding Site tool panel to define it as the receptor.
3. Set the input site sphere parameter to define the active site: select the ligand from the receptor-ligand complex and define the site sphere from it.
4. The radius of the site sphere can be changed by selecting the sphere and editing the radius in the Attributes dialog.
5. Select the receptor structure from the input receptor parameter list.
6. Select the sphere as the input site sphere parameter
7. Set the remaining parameter as desired and run the protocol.
Lib Dock:
Lib Dock uses protein site features referred to as HotSpots. HotSpots are of two types: polar and apolar. A polar HotSpot is preferred by a polar ligand atom and an apolar HotSpot by an apolar atom. The receptor HotSpot file is calculated prior to the docking procedure; however, if desired, a predefined or user-adjusted HotSpot file can be used. The protocol allows the user to specify several modes for generating ligand conformations for docking. If an input ligand file already consists of ligand conformations, conformer generation can be turned off.
The rigid ligand poses are placed into the active site and HotSpots are matched as triplets. The poses are pruned and a final optimization step is performed before the poses are scored. Ligand hydrogens, which are removed during the docking process, are added
to the ligand poses. These hydrogens are not optimized, so they may require further
optimization to ensure that receptor-ligand hydrogen bonds are formed correctly.
MCSS:
MCSS is a method for determining energetically favorable positions and orientations for functional groups on the surface of proteins with known three-dimensional structure. From 1,000 to 5,000 copies of a functional group are randomly
placed in the site and subjected to simultaneous energy minimization and/or quenched
molecular dynamics. The resulting functionality maps of a protein receptor site, which
can take account of its flexibility, can be used for the analysis of protein ligand
interactions and rational drug design. Application of the method to the sialic acid binding
site of the influenza coat protein, hemagglutinin, yields functional group minima that
correspond with those of the ligand in a cocrystal structure.
The multiple copy simultaneous search (MCSS) method is utilized to search for
optimal positions and orientations of a set of functional groups. For peptide ligands,
functional groups corresponding to the protein main chain (N-methylacetamide) and to
protein side chains (e.g., methanol, ethyl guanidinium) are used. The resulting N-
methylacetamide minima are connected to form hexapeptide main chains with a simple
pseudoenergy function that permits a complete search of all possible ways of connecting
the minima. Side chains are added to the main-chain candidates by application of the
same pseudoenergy function to the appropriate functional group minima.
4.4. ANALOGUE BASED DRUG DESIGN
Analogue based drug design applies knowledge of ligand structures and their activities to design a drug when little or no information is available about the 3D structure of the target. In this case, the requirements of the binding site must be inferred from the known structures of the ligands.
4.4. i. Pharmacophore generation
“A pharmacophore is an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response.” Perceiving a pharmacophore is the most important first step towards understanding the interaction between a receptor and a ligand. The concept was first defined by Paul Ehrlich in 1909 as "a molecular framework that carries (phoros) the essential features responsible for a drug’s (=pharmacon's) biological activity".
Catalyst provides the tools for selecting potential ligand compounds prior to
synthesis. The aim of this software is to reduce the time and cost of screening, synthesis
and biological testing. It accelerates the drug discovery process by identifying lead
candidates faster.
A pharmacophore, or hypothesis, describes the generalized molecular features involved in the binding of a ligand to the active site of a protein. These molecular features include 1D features, which represent physical and biological properties; 2D features, which represent substructures; and 3D features, which represent chemical features such as hydrogen-bond acceptors and donors, positive and negative ionizable groups, hydrophobic (aromatic and aliphatic) groups and ring features. In Catalyst, each hypothesis is defined in four parts: the chemical features, their location and orientation in 3D space, their tolerances, and their weights. The weight represents the relative importance of each chemical function in conferring activity.
A pharmacophore model or hypothesis consists of a three-dimensional
configuration of chemical functions surrounded by tolerance spheres. A tolerance sphere
defines the region of space that should be occupied by a specific type of chemical
functionality. Pharmacophore models are routinely used in lead identification and
optimization in the areas of library focusing, evaluation and prioritization of virtual high
throughput screening (VHTS) results, de novo design, and scaffold hopping.
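The tolerance-sphere idea can be sketched as a mapping test: a ligand conformer (assumed already aligned to the hypothesis) maps the model if every hypothesis feature is matched by a distinct ligand feature of the same type lying inside that feature's sphere. The feature labels, coordinates and 1.6 Å radii are illustrative assumptions; Catalyst's weighted fit-value scoring is not reproduced here.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Feature:
    kind: str        # feature type label, e.g. "HBA", "HBD", "HYD" (illustrative)
    center: tuple    # (x, y, z) of the tolerance-sphere center in Å
    radius: float    # tolerance-sphere radius in Å

def maps_hypothesis(hypothesis, ligand_features, used=frozenset()):
    """True if each hypothesis feature can be assigned a distinct ligand feature
    of the same kind inside its tolerance sphere (backtracking search)."""
    if not hypothesis:
        return True
    f, rest = hypothesis[0], hypothesis[1:]
    c = np.asarray(f.center, dtype=float)
    for j, (kind, pos) in enumerate(ligand_features):
        if j in used or kind != f.kind:
            continue
        if np.linalg.norm(np.asarray(pos, dtype=float) - c) <= f.radius:
            if maps_hypothesis(rest, ligand_features, used | {j}):
                return True
    return False
```

Backtracking is used instead of a greedy match so that one ligand feature lying in two overlapping spheres cannot block a valid assignment.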
Pharmacophore models can be constructed using analog-based (using known active
ligands) or receptor-based techniques (using receptor active site information). In the
absence of crystallographic structure data for a protein whose receptor-binding active site
is clearly identified, a chemist must rely on the structure-activity data for a given set of
ligands. If these ligands are known to bind to the same receptor, one can attempt to
define what they have in common. The Accelrys Catalyst program can generate two types
of automated pharmacophore models, HypoGen and HipHop, depending on whether or
not activity data are used. When protein crystal structure data are available, active-site
pharmacophore models can be used as a pre-filter for docking large libraries. Generating
a pharmacophore model from active-site residue information is key to the success of
any pharmacophore-based docking algorithm. In the absence of X-ray bound-ligand
information, it is a challenge to select a single pharmacophore model that represents the
binding characteristics. A methodology that can be used to analyze and visualize
multiple pharmacophore models in this situation is described here. It can be applied to
different types of Catalyst pharmacophore models (qualitative, quantitative,
receptor-based, etc.), as it only considers feature types and coordinates.
This methodology can be applied to:
Virtual high-throughput screening (VHTS)
Multiple binding mode identification
Classification of proteins based on binding characteristics
Visualization of pharmacophore model space
To build a better pharmacophore, the following steps were employed:
1. Building a set of molecules
2. Conformer generation
3. Hypothesis Generation
4. Database Search
5. Compare/Fit to estimate Activity
The Feature Dictionary list contains the generalized chemical functions in Catalyst.
Definitions of these functions are:
1. HB ACCEPTOR (vector): Matches the following types of surface-accessible atoms or
groups of atoms:
sp or sp2 nitrogens that have a lone pair and a charge less than or equal to zero
sp3 oxygens or sulfurs that have a lone pair and a charge less than or equal to zero
non-basic amines that have a lone pair
Does not match: basic primary, secondary and tertiary amines, which are protonated at
physiological pH. Electron-deficient pyridines and imidazoles are not excluded.
2. HB ACCEPTOR lipid (vector): Matches these types of atoms or groups of atoms:
nitrogens, oxygens or sulfurs (except hypervalent ones) that have a lone pair and a charge
less than or equal to zero. This function is the same as HB ACCEPTOR except that it
includes basic nitrogens. Electron-deficient pyridines and imidazoles are not excluded.
3. HB DONOR (vector): Matches these types of atoms or groups of atoms:
Non-acidic hydroxyls
Thiols
Acetylenic hydrogens
NHs (except tetrazoles and trifluoromethyl sulfonamide hydrogens)
Does not match: electron-rich pyridines and imidazoles that would be protonated, or
nitrogens that would be protonated because of their high basicity.
4. HYDROPHOBIC (point): Matches these types of groups of atoms:
A contiguous set of surface-accessible atoms that is not adjacent to any concentration of
charge (charged atoms or electronegative atoms) in a conformer, such as phenyl,
cycloalkyl, isopropyl and methyl.
5. HYDROPHOBIC ALIPHATIC (point): Matches these types of groups of atoms:
A contiguous set of surface-accessible atoms that is not adjacent to any concentration of
charge (charged atoms or electronegative atoms) in a conformer, such as cycloalkyl,
isopropyl and methyl.
6. HYDROPHOBIC AROMATIC (point): Matches these types of groups of atoms:
A contiguous set of surface-accessible atoms that is not adjacent to any concentration of
charge (charged atoms or electronegative atoms) in a conformer, such as phenyl and
indole.
7. NEG CHARGE (atom): Matches negative charges not adjacent to a positive charge.
8. NEG IONIZABLE (point): Matches atoms or groups of atoms that are likely to be
deprotonated at physiological pH, such as:
Trifluoromethyl sulfonamide hydrogens
Sulfonic acids (centroid of the three oxygens)
Phosphoric acids (centroid of the three oxygens)
Sulfinic, carboxylic or phosphinic acids (centroid of the two oxygens)
Tetrazoles
Negative charges not adjacent to a positive charge
9. POS CHARGE (atom): Matches positive charges not adjacent to a negative charge.
10. POS IONIZABLE (point): Matches atoms or groups of atoms that are likely to be
protonated at physiological pH, such as:
Basic amines
Basic secondary amidines (iminyl nitrogen)
Basic primary amidines, except guanidines (centroid of the two nitrogens)
Basic guanidines (centroid of the three nitrogens)
Positive charges not adjacent to a negative charge
Does not match: weakly basic aromatic nitrogens such as pyridine and imidazole.
11. RING AROMATIC (vector and plane): Matches 5- and 6-membered aromatic
rings. The feature defines 2 points, the ring centroid and a projected point normal to the
ring plane. The projected point can map both above and below the ring.
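As an illustration only, the two acceptor definitions above can be transcribed into simple rules. This is a literal transcription of the rules as listed, not Catalyst's perception code: the AtomInfo class and its fields (hybridization, is_basic, hypervalent, etc.) are hypothetical annotations that a real program would first have to derive from the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomInfo:
    """Hypothetical per-atom annotations needed by the rules below."""
    element: str                   # "N", "O", "S", ...
    hybridization: str             # "sp", "sp2" or "sp3"
    charge: int                    # formal charge
    has_lone_pair: bool
    surface_accessible: bool = True
    is_amine: bool = False         # amine nitrogen
    is_basic: bool = False         # protonated at physiological pH
    hypervalent: bool = False


def is_hb_acceptor(a: AtomInfo) -> bool:
    """HB ACCEPTOR: surface-accessible, lone pair, charge <= 0, not basic."""
    if not (a.surface_accessible and a.has_lone_pair and a.charge <= 0):
        return False
    if a.is_basic:                 # basic amines are protonated, so excluded
        return False
    if a.element == "N" and (a.hybridization in ("sp", "sp2") or a.is_amine):
        return True                # sp/sp2 nitrogens and non-basic amines
    return a.element in ("O", "S") and a.hybridization == "sp3"


def is_hb_acceptor_lipid(a: AtomInfo) -> bool:
    """HB ACCEPTOR lipid: any non-hypervalent N, O or S with a lone pair and
    charge <= 0, so basic nitrogens are included."""
    return (a.element in ("N", "O", "S") and not a.hypervalent
            and a.has_lone_pair and a.charge <= 0)
```

The only difference between the two functions is the handling of basic nitrogens, which mirrors the difference stated in the dictionary.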
Steps to be followed in Discovery Studio (DS):
1. Construct or import the molecules.
2. Perform a conformational search.
3. Examine each conformer for the presence of chemical features.
4. Determine the set of features that correlates with activity.
Pharmacophore hypothesis
Catalyst’s Common Feature Pharmacophore Generation (HipHop) and
3D QSAR Pharmacophore Generation (HypoGen) are applications that provide tools to
generate pharmacophore hypotheses. The hypotheses are created by generating
conformations for a set of study molecules, then using these conformations to find and
align chemically important functional groups common to the molecules in the study set.
Each hypothesis can also incorporate data on the biological activities of the study
molecules.
Steps involved in generating a pharmacophore hypothesis:
1. Generate conformations
The Confirm interface is used to generate conformations for a single molecule or a set
of molecules. The number of conformations needed to represent a compound's
conformational space well depends on the molecule. Both conformation-generating
algorithms available in Confirm (Best and Fast) are tuned to produce a diverse set of
conformations, avoiding repeated groups of conformations that all represent the same
local minimum.
The conformations generated by Confirm can be used as input to HipHop and
HypoGen to align common molecular features and generate a hypothesis.
2. Align common features to generate a hypothesis
This procedure involves:
1. Aligning common molecular features.
2. Setting preferences using the control panel.
3. Incorporating activity data into a hypothesis.
4. Using aligned structures to generate receptor models.
HipHop and HypoGen use conformations generated in Confirm to align
chemically important functional groups common to the molecules in the study set. A
pharmacophore hypothesis can then be generated from these aligned structures.
Incorporating biological activity data into a hypothesis
HypoGen can also incorporate biological activity data into the hypothesis generation
process. Each hypothesis is tested by regression techniques that compare estimated
activity with actual activity data, and the software uses the results of these tests to select
the hypotheses that best predict activity for the set of study molecules.
4.4.i Common feature pharmacophore generation (HipHop)
HipHop generates pharmacophore models based on the alignment of multiple common
features, and these aligned structures can also be used to generate receptor models. The
objective is to identify and enumerate all possible pharmacophore configurations that are
common to the training set. The Model Receptor menu card is included in the Hypothesis
Models card deck so that structures that have been aligned in HipHop can be used to
generate a receptor surface model.
Since structures used in HipHop are aligned by common chemical features, the receptor
surface model generated for them can be significantly different from a receptor
surface model generated from template-aligned structures.
The ideal HipHop training set has the following characteristics:
2 to 30 compounds, ideally about 6
A structurally diverse set of input molecules
Feature-rich compounds
The most active compounds are included
Spreadsheet setup for HipHop
Molecules from the hypothesis generation workbench are imported into a spreadsheet.
Principal: specifies the reference molecules, whose configurations are potential centres
for the hypothesis.
If 0, do not consider this molecule.
If 1, consider the configurations of this molecule.
If 2, use this compound as a reference molecule (used only for HipHop hypothesis
generation).
MaxOmitFeat (maximum omitted features): shows how many features of each compound
may be omitted.
If 0, all features must map to generate a hypothesis.
If 1, all but one feature must map to generate a hypothesis.
If 2, no features need to map (used only for HipHop hypothesis generation).
When compound data appear in the spreadsheet, you are ready to add values in the
Principal and MaxOmitFeat columns. Common feature hypothesis generation uses
values in these columns to determine which molecules should be considered when
building hypothesis space and which molecules should map to all or some of the
features in the final hypotheses.
In the Principal column, a value of 2 means that all the chemical features in the
compound will be considered in building hypothesis space. A value of 1 means that
features will be considered when generating hypotheses and that at least one mapping for
each generated hypothesis will be found unless the Misses or Complete Misses options
are used. A value of 0 means the compound will be ignored.
The MaxOmitFeat column specifies how many hypothesis features must map to
the chemical features in each compound: a 0 in this column forces mapping of all
features, a 1 means that all but one feature must map, and a 2 allows hypotheses to which
no compound features map.
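The Principal and MaxOmitFeat rules can be summarized as a check on a single compound. This is a hypothetical helper written for illustration (the function name and arguments are not part of Catalyst); it assumes the number of hypothesis features and the number of those features mapped by the compound are already known.

```python
def compound_accepts_hypothesis(principal: int, max_omit_feat: int,
                                n_hypo_features: int, n_mapped: int) -> bool:
    """Return True if a compound does not rule out a common-feature hypothesis.

    principal: 0 = ignore compound, 1 = consider it, 2 = reference compound.
    max_omit_feat: 0 = all features must map, 1 = all but one must map,
                   2 = no features need to map.
    """
    if principal not in (0, 1, 2) or max_omit_feat not in (0, 1, 2):
        raise ValueError("Principal and MaxOmitFeat must be 0, 1 or 2")
    if principal == 0:
        return True   # ignored compounds place no constraint on the hypothesis
    if max_omit_feat == 2:
        return True   # hypotheses with no mapped features are allowed
    # 0 -> every feature must map; 1 -> at most one feature may be missed
    return n_mapped >= n_hypo_features - max_omit_feat
```

For a five-feature hypothesis, a compound with MaxOmitFeat = 1 that maps four features is accepted, while the same compound with MaxOmitFeat = 0 is not.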
4.4.ii 3D QSAR Pharmacophore generation (HypoGen)
HypoGen attempts to derive SAR models for a set of molecules for which activity
values (IC50 or Ki) on a given biological target are available. HypoGen optimizes
hypotheses that are present in the highly active compounds in the training set but missing
among the least active (or inactive) ones. It attempts to construct the simplest hypotheses
that best correlate estimated and measured activities. The models are created in three
stages:
Constructive
Subtractive
Optimization
Fig 14: HypoGen process flow (constructive phase → pharmacophore domain; subtractive phase → feasible models; optimization phase → top-scoring models)
1. Constructive Phase:
The constructive phase identifies hypotheses that are common to the most active
set of compounds. The process flow of this phase is depicted in Fig 15.
Fig 15: Constructive phase process flow
2. Subtractive Phase: The objective of this phase is to identify pharmacophore
configurations developed in the constructive phase that are also present in the least
active set of molecules, and to remove them. The process is depicted in Fig 16.
Fig 15 (constructive phase steps): from the training set, identify the most active compounds ((Most Active Cmpd x Unc) - (CmpdX / Unc) > 0); enumerate all possible pharmacophore configurations for the most active and second most active compounds; check for duplicates; ensure that the rest of the most active compounds fit MinSubsetPoint features. Output: pharmacophore domain.
Fig 16: Subtractive phase process flow: from the training set, identify the least active compounds (log(CmpdX) - log(Most Active Cmpd) > 3.5); enumerate all possible pharmacophore configurations; check for configurations shared with the most active compounds; eliminate a configuration if it is shared by more than half of the least active compounds. Output: feasible pharmacophores.
3. Optimization Phase:
This phase improves the hypothesis scores. HypoGen reports the 10 top-scoring unique
pharmacophores. The process flow is depicted in Fig 17.
Fig 17: Optimization phase process flow (starting from the feasible pharmacophores: features and/or locations are varied to optimize activity prediction by a simulated annealing approach; geometric fits are calculated; a linear regression of -log(Activity) against geometric fit is performed; the total cost of each new hypothesis is calculated as Total cost = [Cost(Err) x CC(Err)] + [Cost(Wt) x CC(Wt)] + [Cost(Cnfg) x CC(Cnfg)], where the CCs are the cost coefficients contained in CATALYST_CONF/hypo.data; the process stops when optimization no longer improves the score. Following "Occam’s Razor", the simplest hypothesis that accurately estimates the activity is considered the best.)
The constructive phase identifies hypotheses that are common to the most active
set of compounds.
The most active set is determined by the following equation:
$\mathrm{MA} \times \mathrm{Unc}_{\mathrm{MA}} - A/\mathrm{Unc}_{A} > 0.0$
where MA is the activity of the most active compound, A is the activity of the compound
being evaluated, and $\mathrm{Unc}_{\mathrm{MA}}$ and $\mathrm{Unc}_{A}$ are the uncertainties in their measured activities.
The most active set of compounds is limited to a maximum of 8. Once the set is
determined, HypoGen enumerates all possible pharmacophore configurations for each of
the conformations of the two most active compounds. Furthermore, the hypothesis must fit a
minimum subset of features of the remaining most active compounds in order to be
considered. At the end of the constructive phase, a database of all candidate
pharmacophore configurations has been generated. The objective of the subtractive phase
is to identify the pharmacophore configurations developed in the constructive phase that
are also present in the least active set of molecules, and to remove them. The first step is
the identification of the least active compounds. This is accomplished by the equation
$\log(A) - \log(\mathrm{MA}) > 3.5$
where A is the activity of the current compound and MA is the activity of the most
active compound.
In simple terms, all compounds that are more than 3.5 orders of magnitude less active
than the most active compound are considered to be in the set of least active molecules.
The value 3.5 is a user-adjustable parameter and can be changed if needed (for example,
if the activity range of the data set does not span more than 3.5 orders of magnitude).
The subtractive phase then enumerates all possible pharmacophore configurations for the
least active compounds, checks for configurations shared with the most active
compounds, and eliminates any configuration shared by more than half of the least active
compounds, leaving the feasible pharmacophores.
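The two selection equations can be sketched in Python. This assumes that activities are IC50- or Ki-type values (a lower number means a more active compound) and that each uncertainty is a ratio (for example, 3 meaning the true value lies within a factor of 3); both are assumptions for illustration. The function names are hypothetical; the cap of 8 compounds and the 3.5 threshold are the values given in the text.

```python
import math


def most_active_set(activities, uncertainties, max_size=8):
    """Indices satisfying MA * Unc_MA - A / Unc_A > 0, most active first,
    capped at max_size compounds."""
    i_ma = min(range(len(activities)), key=activities.__getitem__)
    ma, unc_ma = activities[i_ma], uncertainties[i_ma]
    chosen = [i for i, (a, unc) in enumerate(zip(activities, uncertainties))
              if ma * unc_ma - a / unc > 0.0]
    chosen.sort(key=activities.__getitem__)   # lowest IC50 (most active) first
    return chosen[:max_size]


def least_active_set(activities, threshold=3.5):
    """Indices satisfying log10(A) - log10(MA) > threshold."""
    ma = min(activities)
    return [i for i, a in enumerate(activities)
            if math.log10(a) - math.log10(ma) > threshold]


if __name__ == "__main__":
    ic50 = [0.5, 1.0, 2.0, 100.0, 5000.0, 10000.0]   # invented example values (nM)
    unc = [3.0] * len(ic50)
    print(most_active_set(ic50, unc), least_active_set(ic50))
```

With these invented values, compounds 0 to 2 fall in the most active set (their IC50 is below 0.5 x 3 x 3 = 4.5) and compounds 4 and 5 fall in the least active set (more than 3.5 log units above 0.5).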
The optimization phase improves the hypothesis score.
Small perturbations are applied to the pharmacophore configurations that survived the
subtractive phase, and these are scored based on the errors in activity estimates from
regression and on the complexity of the hypothesis. The cost of a hypothesis is a
quantitative extension of Occam's razor (everything else being equal, the simplest model
is preferred).
The cost of each pharmacophore is computed as the sum of three costs:
weight, error and configuration. The weight component increases with the deviation of
the feature weights from the ideal value of 2.0, and the error component increases with
the RMS difference between the measured and estimated activities. The configuration
cost is fixed and depends on the complexity of the hypothesis space. Upon completion of
this phase, the top-scoring hypotheses are reported.
HipHop and HypoGen use conformations generated in Confirm to align
chemically important functional groups common to the molecules in a study set.
Biological activity data can be incorporated into these hypotheses so that the best
hypotheses for predicting activity are generated and selected. Additionally, you can use
structures that have been aligned in these programs to generate a receptor surface model.
HypoGen Training and Test set selection
Selection of the training set molecules is one of the most important tasks the
user must perform, for the following reasons:
Catalyst derives the information used in subsequent analysis from those structures;
thus the “garbage in, garbage out” paradigm certainly applies.
The statistical procedures applied during analysis have limits in terms of over and
under fitting the data.
Data sets that are ideal for those analysis procedures and data sets from typical
medicinal chemistry structure activity series are often not the same thing.
The ideal training set should satisfy the following conditions:
1. At least 16 compounds are necessary to assure statistical power.
2. Activities should span 4 orders of magnitude.
3. Each order of magnitude should be represented by at least 3 compounds.
4. No redundant information.
5. No excluded volume problems.
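A quick pre-check of the numerical conditions above (number of compounds, activity span, compounds per order of magnitude) might look like the following sketch. The function name is hypothetical, and binning orders of magnitude relative to the most active compound is an assumption made for illustration; conditions 4 and 5 (redundancy and excluded volume) need chemical judgment and are not checked.

```python
import math
from collections import Counter


def check_training_set(activities, min_compounds=16, min_span=4.0, per_order=3):
    """Return a list of problems with a HypoGen-style training set (empty if none).

    activities: positive IC50/Ki-type values, all in the same unit.
    """
    problems = []
    if len(activities) < min_compounds:
        problems.append(f"only {len(activities)} compounds (< {min_compounds})")
    logs = [math.log10(a) for a in activities]
    lowest = min(logs)
    span = max(logs) - lowest
    if span < min_span:
        problems.append(f"activities span {span:.1f} orders of magnitude (< {min_span})")
    # Count compounds in each order of magnitude, measured from the most active one
    bins = Counter(int(x - lowest) for x in logs)
    for order in range(int(span) + 1):
        if bins[order] < per_order:
            problems.append(f"order {order} has {bins[order]} compounds (< {per_order})")
    return problems
```

An empty list means only that the numerical conditions are met; it says nothing about structural redundancy.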
Methodology
To build a better pharmacophore, the following steps were employed:
1. Building a set of molecules
2. Conformer generation
3. Hypothesis generation
4. Database generation
5. Database search
6. Compare / fit to estimate activity
Criteria to generate a successful hypothesis are:
1. Cost factor: the difference between the fixed and null costs should be large; a larger
difference gives better prediction.
2. The fixed cost represents the simplest model that fits all data perfectly, and the null
cost represents the highest cost, that of a pharmacophore with no features, which
estimates every activity as the average activity of the training set molecules.
3. The configuration value, which is a measure of the size of the hypothesis space for a
given training set, should be less than 18. If it is higher, there are more degrees of
freedom and the result may not be useful.
4. The correlation between the estimated and actual activity data should be around 1.0.
5. The RMS deviation, which represents the quality of the correlation between the
estimated and actual activity data, should be as low as possible, close to 0.
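Criteria 4 and 5 can be computed from the estimated and actual activities of the training set. This minimal sketch works on log10-transformed activities, which is an assumption about the scale used; it is not Catalyst's internal computation, and the function name is hypothetical.

```python
import math


def fit_statistics(actual, estimated):
    """Return (rms, r) for actual vs estimated activities on a log10 scale."""
    if len(actual) != len(estimated) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x = [math.log10(a) for a in actual]
    y = [math.log10(e) for e in estimated]
    n = len(x)
    # RMS deviation between estimated and actual log activities
    rms = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n)
    # Pearson correlation coefficient between the two series
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("activities must not all be identical")
    return rms, sxy / math.sqrt(sxx * syy)
```

Note that the two numbers measure different things: estimates that are all off by a constant factor of 2 give a correlation of 1.0 but an RMS of log10(2), about 0.30.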
Method
1. Building a set of molecules
All molecules were built using the Catalyst View Compound workbench. They were
cleaned using the 2D Beautify option and minimized using a CHARMm-like force field.
2. Conformer generation
A conformer model is a representation of the possible conformational space of a
ligand. It is assumed that the biologically active conformation of a ligand (or a close
approximation thereof) is contained within this model. Conformers were generated for
all molecules with an energy cut-off of 20 kcal/mol and up to a maximum of 255
conformers.
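The energy-window selection described above can be sketched as a filter over precomputed conformer energies. The sketch assumes the energies (in kcal/mol) are already available; it does not generate conformers or implement Catalyst's poling algorithm. The function name is hypothetical; the 20 kcal/mol window and the 255-conformer cap are the values used in this work.

```python
def select_conformers(energies, window=20.0, max_conformers=255):
    """Return indices of conformers within `window` kcal/mol of the lowest
    energy, ordered from lowest to highest energy and capped at max_conformers."""
    if not energies:
        return []
    e_min = min(energies)
    kept = [i for i, e in enumerate(energies) if e - e_min <= window]
    kept.sort(key=lambda i: energies[i])
    return kept[:max_conformers]


if __name__ == "__main__":
    print(select_conformers([5.0, 30.0, 12.0, 24.9, 25.1]))
```

Here the conformers at 30.0 and 25.1 kcal/mol are dropped because they lie more than 20 kcal/mol above the 5.0 kcal/mol minimum.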
Hypothesis cost:
The lowest-cost hypothesis is considered to be the best. However, hypotheses with
costs within 10-15 bits of the lowest-cost hypothesis are also considered good candidates.
The units of cost are binary bits: hypothesis costs are calculated according to the number
of bits required to completely describe a hypothesis. Simpler hypotheses require fewer
bits for a complete description, and the assumption is made that simpler hypotheses are
better.
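Selecting candidate hypotheses by cost, as described above, reduces to a simple filter. The function name and the hypothesis names and cost values in the example are invented for illustration; the default 15-bit window is the upper end of the 10-15 bit range quoted in the text.

```python
def candidate_hypotheses(costs, window=15.0):
    """Names of hypotheses whose cost (bits) is within `window` of the lowest,
    ordered from lowest to highest cost."""
    if not costs:
        return []
    best = min(costs.values())
    keep = [name for name, cost in costs.items() if cost - best <= window]
    return sorted(keep, key=costs.get)


if __name__ == "__main__":
    # Invented costs: Hypo3 is 23 bits above the best and is dropped
    print(candidate_hypotheses({"Hypo1": 95.0, "Hypo2": 104.0, "Hypo3": 118.0}))
```

Using a window rather than keeping only the single best hypothesis reflects the point that costs within a few bits of each other are not meaningfully different.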
Hypothesis generation / pharmacophore search
A pharmacophore model consists of a collection of features necessary for the
biological activity of the ligand arranged in 3D space, the common ones being hydrogen
bond acceptor, hydrogen bond donor and hydrophobic features. Hydrogen bond donors
are defined as vectors from the donor atom of the ligand to the corresponding acceptor
atom in the receptor. Hydrogen bond acceptors are analogously defined. Hydrophobic
features are located at the centroids of hydrophobic atoms.
Conformations for all molecules were generated in the View Compound workbench
using the poling algorithm and the Best quality conformer generation method. Best
conformer generation considers the arrangement of atoms and accepts a maximum of 255
conformers. Catalyst generated conformers that provided the most comprehensive
treatment of flexible ring systems. All the conformers were saved automatically, together
with the number of conformers generated for each molecule and its lowest conformer
energy in kcal/mol. Conformers were selected that fell within a 20 kcal/mol range above
the lowest-energy conformation found.
Hypothesis generation
The pharmacophore hypotheses were generated in the Generate Hypothesis workbench.
The training set molecules were selected based on the order of magnitude of their activity.
Hypothesis generation was carried out under the following assumptions:
1. The most active and the most inactive molecules should be represented in the training
set.
2. At least 3 molecules from each order of magnitude should be selected for
pharmacophore generation.
3. A training set should contain at least 15 molecules.
4. The selected molecules should represent diversity in chemical features.
Hypothesis considerations
In order to achieve a better pharmacophore, the following limits or considerations
should be met by generated hypothesis:
Configuration value should be around 17.
RMS should be as low as possible, preferably close to zero.
Correlation should be around 1.0.
The cost difference between the fixed cost and the null cost should be between 40 and 80
bits.
Factors that determine the quality of pharmacophore
The overall cost of a hypothesis is calculated by summing three cost factors, a
weight cost, an error cost and a configuration cost. These are qualitatively defined.
1. Weight cost
A value that increases in a Gaussian form as the feature weights in a model deviate
from an idealized value of 2.0. This cost factor is designed to favour hypotheses where the
feature weights are close to 2.
2. Error cost
A value that increases as the RMS difference between estimated and measured
activities for the training set molecules increases. This cost factor is designed to favour
models where the correlation between estimated and measured activities is better.
3. Configuration cost
This is a fixed cost which depends on the complexity of the hypothesis space
being optimized. It is equal to the entropy of the hypothesis space.
Of the three, the error cost factor has the major effect in establishing hypothesis
cost. At the beginning of an automated hypothesis generation run, Catalyst calculates the
cost of two theoretical hypotheses: one in which the error cost is minimal (all compounds
fall along a line of slope 1), and one in which the error cost is high (all compounds fall
along a line of slope 0). These models can be considered upper and lower bounds for the
training set. Their cost values are useful guides for estimating the chances of a successful
experiment, and they are available within about 15 minutes of the start of the run, which
matters because these experiments can easily require days of run time.
The ideal hypothesis cost (fixed cost) is reported in the full file found in the hypothesis
generation directory. This value tends to be 70-100 bits. The null hypothesis cost is
reported in the log file found in the same directory and is usually higher than the fixed
cost. What is important is the difference between these two costs: the greater the
difference, the higher the probability of finding a useful model. In terms of hypothesis
significance, what really matters is the magnitude of the difference between the cost of
any returned hypothesis and the cost of the null hypothesis. In general, if this difference is
greater than 60 bits, there is an excellent chance that the model represents a true
correlation. Since most returned hypotheses will be higher in cost than the fixed-cost
model, a difference between fixed cost and null cost of 70 or more will be necessary to
achieve the 60-bit difference. If a returned hypothesis has a cost that differs from the null
hypothesis by 40-60 bits, it has a 75-90% chance of representing a true correlation in the
data. As the difference becomes less than 40 bits, the likelihood of the hypothesis
representing a true correlation in the data rapidly drops below 50%. Under these
conditions, it may be difficult to find a model that can be shown to be predictive. In the
extreme situation where the fixed and null cost differential is small (<20 bits), there is
little chance of succeeding and it is advisable to reconsider the training set before
proceeding. Another useful number is the entropy of the hypothesis space. This value is
calculated early in the run and appears in the full file near the value for the fixed cost.
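The cost rules of thumb above can be collected into a small helper. The function names are hypothetical. The thresholds (20, 40 and 60 bits) are taken from the text, and total_cost follows the weighted-sum formula of the optimization phase (Fig 17); its default coefficients of 1.0 are placeholders, not the values stored in CATALYST_CONF/hypo.data.

```python
def total_cost(error_cost, weight_cost, config_cost,
               cc_err=1.0, cc_wt=1.0, cc_cnfg=1.0):
    """Weighted sum of the three cost components (all in bits).

    The coefficient defaults of 1.0 are placeholders for illustration only.
    """
    return error_cost * cc_err + weight_cost * cc_wt + config_cost * cc_cnfg


def interpret_costs(hypothesis_cost, fixed_cost, null_cost):
    """Apply the rules of thumb quoted in the text (all costs in bits)."""
    if null_cost - fixed_cost < 20:
        return "differential too small: reconsider the training set"
    delta = null_cost - hypothesis_cost
    if delta > 60:
        return "excellent chance of a true correlation"
    if delta >= 40:
        return "75-90% chance of a true correlation"
    return "likely below 50% chance of a true correlation"
```

For example, with a fixed cost of 90 bits and a null cost of 170 bits, a hypothesis costing 100 bits (70 bits below the null cost) falls in the "excellent" band.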
Training set
1. Training set should contain the most active compounds.
2. Each compound must provide a unique feature to Catalyst.
3. If two compounds have similar structures (collections of features), they must differ in
activity by an order of magnitude to both be included; otherwise, pick only the more
active of the two.
4. If two compounds have similar activities (within one order of magnitude), they must be
structurally distinct (from a chemical feature point of view) to both be included;
otherwise, pick only the more active of the two.
The pharmacophore features were perceived from the HipHop data. The features present
in the training set molecules are hydrogen bond acceptor, hydrogen bond donor,
hydrophobic and ring aromatic. Nineteen molecules were selected for the training set. The
training set molecules and their activity values were loaded into a spreadsheet, and all the
preferences and uncertainty values were set. The HypoGen algorithm was then used to
generate the hypotheses.
4.4.iii Quantitative Structure Activity Relationship (QSAR)
The idea of quantitative structure-activity (or structure-property) relationships
(QSAR/QSPR) was introduced by Hansch et al. in 1963 and was first applied to analyze
the importance of lipophilicity for biological potency. This concept is based on the
assumption that the difference in the structural properties of molecules, whether
experimentally measured or computed, accounts for the difference in their observed
biological or chemical properties. In general, QSAR methods deal with identifying and
describing the structural features of molecules that are relevant to explaining variation in
biological or chemical properties. QSAR started as a simple comparison of properties for
two or more molecules using a single number and has become a complex multivariable
treatment of properties versus structure, based on statistical analysis and relying on the
extraordinary power of modern computers.
QSAR is a technique that quantifies the relationship between structure and
biological data and is useful for optimizing the groups that modulate the potency of a
molecule. QSAR has been useful both for rationalizing compound activity and for the
rational design of new compounds.
Most QSAR methods developed over the years have dealt with descriptors derived from
2D representations of molecular structures, i.e., based on molecular connectivity.
Numerous 2D structural descriptors have been reported, including hydrophobicity
constants, molar refractivities, Hammett electronic constants, Verloop STERIMOL
parameters, and topological indices developed by Kier and Hall. Traditional QSAR
methods have used several of these parameters and multiple regression methods to
develop equations relating structure and biological activity.
Fundamental quantitative structure-activity relationship studies show that structures can
easily be compared, overlaid and displayed. The QSAR is obtained by providing more
parameters to optimize a series of bioactive molecules. QSAR based on physicochemical
properties describes the structural, electronic and physicochemical characteristics of a
drug. Data sets are produced using all available descriptors.
Knowledge of the three-dimensional (3D) structure of the target
(receptor/enzyme/DNA) is applied to rationally design drug molecules that bind to the
target for the following reasons:
1. To understand the atomic details of the binding strength and specificity of a drug
(drug-receptor interactions).
2. To develop novel drugs (unique chemical structures) for a selected target via de novo
drug design or database searching techniques.
3. To optimize the therapeutic index of an already available drug or lead compound by
determining the structural requirements for activity while testing a minimum number of
compounds.
A QSAR equation numerically relates biological activity to chemical and
physicochemical properties. Biological activity is defined as a pharmacological response,
usually expressed as a quantity such as the effective dose in 50% of the subjects (ED50),
the lethal dose in 50% of the subjects (LD50), the minimum inhibitory concentration
(MIC) or the half-maximal inhibitory concentration (IC50). It is common to express the
biological activity as a reciprocal. A QSAR equation is similar to the equation for a
straight line:
y = mx + c
or
log (biological activity) = a (physicochemical property) + c
where a = regression coefficient (slope of the straight line)
and c = intercept on the y-axis (the value when the physicochemical property equals zero).
Fig 18: Concept of QSAR
Biological activity is expressed as a reciprocal to produce a positive slope, because the
dose or concentration needed for a response (such as ED50 or IC50) is inversely related
to biological potency. There is therefore a positive relationship between the reciprocal of
the biological activity (1/BA) and the physicochemical property, because 1/BA increases
as potency increases. Since QSAR studies are based on the relationship between
descriptors and biological activity, errors in the biological activity data must be minimal
and the choice of descriptors must be accurate and appropriate.
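A one-descriptor QSAR equation of the straight-line form shown above can be fitted by ordinary least squares. This sketch is illustrative only: the function names are hypothetical, it assumes the activity is already on a log scale (for example log(1/C), computed with the helper below from a molar concentration), and real QSAR work would normally use several descriptors and a statistics package.

```python
import math


def fit_qsar_line(descriptor, potency):
    """Least-squares fit of  potency = a * descriptor + c  (one variable).

    Returns (a, c): the regression coefficient (slope) and the intercept.
    """
    n = len(descriptor)
    if n != len(potency) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx = sum(descriptor) / n
    my = sum(potency) / n
    sxx = sum((x - mx) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(descriptor, potency))
    a = sxy / sxx
    c = my - a * mx
    return a, c


def log_reciprocal(conc_molar):
    """Express a molar concentration C (e.g. an IC50) as log(1/C)."""
    return math.log10(1.0 / conc_molar)
```

Because log(1/C) grows as the required concentration falls, a more potent compound gets a larger value, which is what gives the fitted line its positive slope.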
Objective of QSAR:
1. Drug transport / mechanism.
2. Prediction of activity.
3. Classification of molecules as highly active, moderately active and inactive.
4. Optimization of activity through steric, electrostatic and hydrophobic properties.
5. Refinement of synthetic targets.
6. Reduction and replacement of animal testing of drug action.
Basic requirement in QSAR studies:
1. All analogues should belong to congeneric series.
2. All analogues should exert same mechanisms of actions.
3. All analogues should bind in a comparable manner.
4. Effect of isosteric replacement can be predicted.
5. Binding affinity can be correlated to interaction energies.
6. Biological activities can be correlated to binding activity.
QSAR studies involve the following steps:
CSD database.
Choice of descriptors.
Statistical methods to evolve the QSAR equation.
Validation.
CSD database
Experimental information about the structures of molecules can often be extremely useful
for forming theories of conformational analysis and for predicting the structures of
molecules for which no experimental information is available. The most important
technique currently available for determining the three-dimensional structure of
molecules is X-ray crystallography. The crystallographic community has distributed in
electronic form two databases of particular practical importance to the molecular
modeller: the Cambridge Structural Database (CSD), which contains crystal structures of
organic and organometallic molecules, and the Protein Data Bank (PDB), which contains
structures of proteins and some DNA fragments.
A database is of little use without software tools to search, extract and manipulate the
data. The simplest use of a database is to extract information about a particular molecule
or group of molecules. The relevant entries may also be identified by drawing a
two-dimensional representation of the molecule and using a substructure search program to
search the database. Crystallographic databases have also been used to develop an
understanding of the factors that influence the conformations of molecules, and of the ways
in which molecules interact with each other. For example, the CSD has been comprehensively
analyzed to characterize how the lengths of chemical bonds depend upon the atomic numbers,
hybridization and environment of the atoms involved, and analyses of intermolecular
hydrogen bonding have revealed distinct distance and angular preferences. A major use of
the CSD is substructure searching for molecules that contain a particular fragment, in
order to investigate the conformations that the fragment adopts.
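The idea behind substructure searching can be illustrated with a toy matcher. This is a minimal sketch, not the CSD search software: it assumes a hypothetical representation of each molecule as a hydrogen-suppressed graph (a list of element symbols plus bonded index pairs), ignores bond orders, charges and stereochemistry, and uses plain backtracking to find one mapping of fragment atoms onto molecule atoms.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical graph format: element symbols plus undirected bonds as index pairs
# (hydrogens suppressed, bond orders ignored).
Graph = Tuple[List[str], Set[Tuple[int, int]]]


def _adjacency(n: int, bonds: Set[Tuple[int, int]]) -> List[Set[int]]:
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def find_substructure(pattern: Graph, molecule: Graph) -> Optional[Dict[int, int]]:
    """Return one mapping pattern atom -> molecule atom, or None if the fragment is absent."""
    p_atoms, p_bonds = pattern
    m_atoms, m_bonds = molecule
    p_adj = _adjacency(len(p_atoms), p_bonds)
    m_adj = _adjacency(len(m_atoms), m_bonds)
    mapping: Dict[int, int] = {}

    def extend(p: int) -> bool:
        if p == len(p_atoms):
            return True  # every fragment atom has been placed
        for m in range(len(m_atoms)):
            if m in mapping.values() or m_atoms[m] != p_atoms[p]:
                continue  # atom already used or wrong element
            # every already-mapped neighbour of p must be bonded to m in the molecule
            if all(mapping[q] in m_adj[m] for q in p_adj[p] if q in mapping):
                mapping[p] = m
                if extend(p + 1):
                    return True
                del mapping[p]  # backtrack
        return False

    return dict(mapping) if extend(0) else None
```

For example, searching ethanol (`(["C", "C", "O"], {(0, 1), (1, 2)})`) for a C-O fragment maps the fragment carbon onto the second carbon, the one actually bonded to oxygen. Real database tools use far more efficient, chemistry-aware algorithms.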
A crystallographic database can only provide information about the crystalline state
of matter, so the possible influence of crystal packing forces should always be taken
into account. This is less of a concern for proteins than for small molecules, as protein
crystals contain a large amount of water, and NMR studies have established that
proteins have approximately the same structure in solution as in the crystal.
A second, more subtle, bias is that crystallographic databases contain only
molecules that can be crystallized, and indeed only those molecules whose X-ray
structures were considered interesting enough to be published. The structures in a
crystallographic database may therefore not be a wholly representative set.
Molecular descriptors
The study of steric requirements for interaction between ligands and their
corresponding biological acceptor sites is often of decisive importance in understanding
the role played by structural features in promoting activity. In its most general form,
drug-receptor theory requires that a ligand exerts its biological action as a consequence of
binding to, or otherwise interacting with, a specific biological acceptor site such as a
membrane protein or an enzyme, which may be generally termed the receptor. Underlying
modern drug-receptor theory is the old principle that a ligand fits its receptor much as a
key fits a lock. This concept, although somewhat arbitrary since a high degree of
flexibility is present in the structures of biomacromolecules, governs the principles of
molecular recognition and molecular discrimination. Although stereochemistry often plays a
major role in drug bioactivity, care must be taken when considering structure-activity
relationships to explore whether other differences in physicochemical properties exist
before making significant correlations with the steric properties of the structures under
study.
In early studies, organic chemists defined a number of steric parameters in
order to explain the steric effects of substituents on the reaction centers of organic molecules.
The same types of steric effects observed in studies of how physical properties and
chemical reactivity vary with structure may be assumed to be involved in biological
activity, which, at least as a first approximation, may be treated in a similar fashion. In
the past 35 years, owing to the development of drug design and the Hansch approach, many
other parameters and methods have been developed that have the merit of trying to
avoid a simple empirical correlation with given ligand properties and of trying to
propose possible geometric features of the receptor.
Steric descriptors are classified into the following groups:
1. Topological indices, based on characterizing chemical structures by means of graph
theory.
2. Geometric descriptors, resulting from the view of organic molecules as three-dimensional
objects from which standard dimensions can be calculated.
3. Chemical descriptors, derived from the steric influence upon a standard reaction.
4. Physical descriptors, derived when an organic molecule is considered as a
three-dimensional object whose size determines physical properties, and difference
descriptors, which result when an organic molecule is considered as a three-dimensional
object compared with a reference structure.
The available molecular descriptors are described below.
Molecular Descriptors
1. Fragment constant descriptors
These are constants that relate the effect of substituents on a “reaction center”
from one type of process to another. The basic idea is that similar changes in
structure are likely to produce similar changes in reactivity, ionization or
binding. There are different constants corresponding to different effects. These
are typically used to parameterize the Hammett equation for some series of
analogs.
$\log k_X = \rho\sigma + \log k_H$
where $k_X$ and $k_H$ are the reaction rate constants for the compound bearing substituent X
and for the unsubstituted parent (X = H), respectively; $\sigma$ is an electronic substituent
constant obtained from an ionization constant, and $\rho$ is a reaction constant fitted to the
data set. Constants for different properties (electronic, steric, etc.) at different R-group
positions are used. In this way, measurements of ionization constants can be used to
predict rate constants once the scaling factor $\rho$ has been determined. The default
database currently contains the following types of constants. These come from Table VI-I
of Hansch, except for the Sterimol constant, which is calculated.
Sm, Sp - Electronic effect sigma meta and sigma para
F, R - Inductive polar part (F) and resonance part (R)
pi – Hydrophobic character
HA, HB – Hydrogen bond acceptor (HA) and donor (HB)
MR - Molar refractivity = $\frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$
[n - refractive index, MW - molecular weight and d - compound density]
Sterimol-L – Steric length parameter
Sterimol-B1 through B4 – Steric distances perpendicular to bond axis
Sterimol-BS – Overall maximum steric distance perpendicular to bond
axis
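The Hammett relationship and the molar refractivity formula above can be made concrete with a short Python sketch. It assumes hypothetical inputs: substituent σ values and measured log k values for a congeneric series with a known parent value log k_H. It fits ρ by least squares through the origin of log(k_X/k_H) versus σ and evaluates MR with the Lorentz-Lorenz expression; it illustrates the equations and is not the procedure used by the QSAR software.

```python
from typing import Sequence


def fit_hammett_rho(sigmas: Sequence[float], log_k_x: Sequence[float], log_k_h: float) -> float:
    """Least-squares rho for log kX - log kH = rho * sigma (a line through the origin)."""
    num = sum(s * (lk - log_k_h) for s, lk in zip(sigmas, log_k_x))
    den = sum(s * s for s in sigmas)
    return num / den


def predict_log_k(rho: float, sigma: float, log_k_h: float) -> float:
    """Hammett equation: log kX = rho * sigma + log kH."""
    return rho * sigma + log_k_h


def molar_refractivity(n: float, mw: float, density: float) -> float:
    """Lorentz-Lorenz molar refractivity: MR = (n^2 - 1) / (n^2 + 2) * MW / d."""
    return (n ** 2 - 1) / (n ** 2 + 2) * mw / density
```

Once ρ has been fitted from a few substituents, `predict_log_k` gives the rate constant expected for a new substituent from its σ value alone, which is the scaling idea described in the text.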
2. Conformational descriptors
Energy – Descriptor energy is the energy of the selected conformation
Low Energy – Energy of the most stable conformation in the set of
conformations belonging to each molecular model
E penalty – Difference between Energy and Low Energy
3. Electronic descriptors
Charge – Sum of partial charges
F charge – Sum of formal charges
A pol – Sum of atomic polarizabilities
Dipole – Dipole moment
HOMO – Highest occupied molecular orbital energy
LUMO – Lowest unoccupied molecular orbital energy
Sr – Super delocalizability
4. Graph theoretic descriptors
All these descriptors ultimately base their calculation on representation of
molecular structures as graphs, where atoms are represented by vertices and
covalent chemical bonds by edges. These descriptors fall into 2 categories:
a.) Topological descriptors: These view molecular graphs as connectivity
structures to which numerical invariants can be assigned. There are 20
descriptors based on graph-theory concepts. They help to differentiate
molecules mostly according to their size, degree of branching, flexibility and
overall shape. Examples are the Wiener index, Zagreb index, Hosoya index,
Kier and Hall molecular connectivity indices and Balaban indices.
b.) Information content descriptors: These view molecular graphs as the source of
a probability distribution to which the tools of Shannon's information
theory can be applied. In this approach, molecules are viewed as
structures that can be partitioned into subsets of elements that are in some
sense equivalent. The notion of equivalence depends on the particular
descriptor.
All of these descriptors perform their evaluations on hydrogen-suppressed
graphs, i.e., there are no vertices corresponding to hydrogens and no edges
corresponding to bonds connecting a hydrogen to another atom.
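Several of the topological indices named above can be computed directly from a hydrogen-suppressed graph. The sketch below assumes a hypothetical input format (an atom count plus a list of bonded index pairs) and implements the Wiener index, the first Zagreb index and the first-order (Randić) connectivity index, the simplest member of the Kier and Hall connectivity family. It is an illustration, not the implementation used by the QSAR software.

```python
from collections import deque
from math import sqrt
from typing import List, Set, Tuple

Bonds = List[Tuple[int, int]]  # hypothetical format: bonded atom index pairs, no hydrogens


def _adjacency(n_atoms: int, bonds: Bonds) -> List[Set[int]]:
    adj: List[Set[int]] = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def wiener_index(n_atoms: int, bonds: Bonds) -> int:
    """Sum of shortest-path (bond-count) distances over all pairs of atoms."""
    adj = _adjacency(n_atoms, bonds)
    total = 0
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


def first_zagreb_index(n_atoms: int, bonds: Bonds) -> int:
    """Sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in _adjacency(n_atoms, bonds))


def randic_index(n_atoms: int, bonds: Bonds) -> float:
    """First-order connectivity index: sum over bonds of (d_i * d_j) ** -0.5."""
    adj = _adjacency(n_atoms, bonds)
    return sum(1.0 / sqrt(len(adj[i]) * len(adj[j])) for i, j in bonds)
```

For the carbon skeletons of n-butane (`4, [(0, 1), (1, 2), (2, 3)]`) and isobutane (`4, [(0, 1), (0, 2), (0, 3)]`) the Wiener index is 10 and 9, respectively, showing how branching lowers it, while the Zagreb index rises from 10 to 12.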
5. Molecular Shape Analysis (MSA) descriptor
DIFFV – Difference volume
Fo – Common overlap volume (ratio)
NCOSV - Non common overlap steric volume
Shape RMS – RMS to shape reference
COSV – Common overlap steric volume
SRVol – Volume of shape reference compound
6. Spatial descriptors
RadofGyration – Radius of gyration
Jurs descriptors – Jurs charged partial surface area descriptors
Shadow indices – Surface area projections
Area – Molecular surface area
Density – Density
PMI – Principal moment of inertia
Vm – Molecular volume
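As an example of a spatial descriptor, the radius of gyration is the mass-weighted root-mean-square distance of the atoms from the center of mass. The sketch below assumes hypothetical Cartesian coordinates and atomic masses supplied by the caller (unit masses if none are given); the QSAR software may weight or define it differently.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: unweighted, purely geometric variant
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    squared = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords)
    )
    return sqrt(squared / total)
```

Compact, globular molecules give smaller values than extended molecules of the same mass, which is why this descriptor captures overall molecular shape.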
7. Structural descriptors
MW – Molecular weight
Rotlbonds – Number of rotatable bonds
Hbond acceptors – Number of Hydrogen bond acceptors
Hbond donor - Number of Hydrogen bond donors
8. Thermodynamic descriptors
AlogP – Log of partition coefficient
Fh2o – Desolvation free energy for water
Foct - Desolvation free energy for octanol
Hf – Heat of formation
MolRef – Molar refractivity
9. Molecular Field Analysis (MFA) descriptors:
Molecular field analysis (MFA) evaluates the interaction energy between a probe and a
molecular model at a series of points defined by a rectangular or spherical grid. In QSAR,
this method quantifies the interaction energy between a probe molecule and a set of aligned
target molecules. These energies may be added to the study table as new columns headed
according to the probe type, and the new columns may be used as independent X variables
in the generation of QSAR equations.
Six probe types are available in this family.
1. H+ probe: This uses a proton as the probe, with a +1 charge and zero van der Waals
radius. Only electrostatic interactions are computed; non-bonded interactions are not
considered.
2. CH3 probe: This probe has the van der Waals radius of a united-atom CH3 group but
zero charge. The energy of interaction of this probe with a study molecule
includes only non-bonded interactions.
3. Donor/acceptor probe: This is a two-atom probe consisting of an oxygen bonded to a
hydrogen. The van der Waals radii of the atoms are exactly as defined in the force
field currently loaded. The probe is neutral. Depending on its orientation, it can
behave as either a hydrogen bond donor or a hydrogen bond acceptor.
4. CH3- probe: This is a single-atom probe with the van der Waals radius of a united-atom
CH3 group and a charge of -1. Its interaction energy includes both non-bonded
and electrostatic interactions.
5. Generic probe: This is a generic single-atom probe with a user-specified van der Waals
radius and charge.
6. Other probes: Any multi-atom model may be employed as a probe by specifying it in the
MSI file format.
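As a rough illustration of how MFA turns aligned molecules into QSAR columns, the sketch below evaluates only an H+-style probe: a +1 point charge placed at each point of a rectangular grid, interacting with the atomic partial charges through a plain Coulomb term. Assumptions: vacuum electrostatics with a constant of about 332.06 kcal·Å/(mol·e²), a hypothetical minimum-distance cap, and hand-supplied charges; the real MFA uses force-field terms and its own settings.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[Point, float]  # (xyz in angstrom, partial charge in e)

COULOMB_KCAL = 332.06  # approx. kcal*angstrom/(mol*e^2); vacuum, dielectric = 1 assumed


def h_plus_probe_energy(atoms: Sequence[Atom], point: Point, min_r: float = 0.5) -> float:
    """Coulomb energy (kcal/mol) of a +1 point charge placed at `point`."""
    energy = 0.0
    for xyz, q in atoms:
        r = max(dist(xyz, point), min_r)  # hypothetical cap to avoid blow-up as r -> 0
        energy += COULOMB_KCAL * q * 1.0 / r  # probe charge is +1
    return energy


def rectangular_grid(lo: Point, hi: Point, step: float) -> List[Point]:
    """Evenly spaced grid from lo to hi (assumes hi - lo is a multiple of step)."""
    axes = []
    for a, b in zip(lo, hi):
        count = int(round((b - a) / step)) + 1
        axes.append([a + k * step for k in range(count)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]


def mfa_columns(atoms: Sequence[Atom], grid: Sequence[Point]) -> List[float]:
    """One row of MFA descriptors: the probe energy at every grid point."""
    return [h_plus_probe_energy(atoms, p) for p in grid]
```

Each aligned molecule yields one row of `mfa_columns` over the same grid, so the grid points become the new X columns of the study table.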
Statistical methods to evaluate QSAR equation
QSAR analysis uses statistical methods to study the correlation of biological
activity with the structural and physicochemical properties of candidate molecules.
Several multivariate statistical techniques are used to fit the models, including the
following:
1. PCA (Principal Component Analysis):
It aims to represent a large amount of multidimensional data by
transforming it into a more intuitive, low-dimensional representation. This
method does not create a model, but searches for relationships among the
independent variables. It then creates new variables (the principal components)
that represent most of the information contained in the independent variables.
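To illustrate the idea, the sketch below extracts only the first principal component, by power iteration on the covariance matrix of hypothetical descriptor columns, and then computes the new variable (the PC score) for each compound. It is a minimal illustration; a production PCA would extract all components, typically via a singular value decomposition.

```python
from math import sqrt
from typing import List, Sequence


def first_principal_component(rows: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Unit vector along the direction of greatest variance, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # sample covariance matrix of the independent variables
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # assumed starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc_scores(rows: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Project each mean-centered row onto the component: the new PCA variable."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum((r[j] - means[j]) * component[j] for j in range(p)) for r in rows]
```

With two perfectly correlated descriptors, for example rows `(1, 2), (2, 4), (3, 6), (4, 8)`, the first component points along (1, 2)/√5 and a single score column carries all of the information in the two original columns.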
2. Cluster Analysis:
The goal of cluster analysis is to partition a set of elements (typically a set of
models represented in a molecular descriptor property space) into classes or categories
consisting of elements of comparable similarity. The algorithm assumes that
models are represented by points in a multidimensional property space, with the
Euclidean distance between points representing model dissimilarity. The following
types are available in this category:
1. Jarvis-Patrick clustering
2. Variable-length Jarvis-Patrick clustering
3. Relocation clustering
4. Hierarchical Clustering Analysis (HCA)
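The basic (fixed-length) Jarvis-Patrick method can be sketched compactly: two points share a cluster when each appears in the other's list of nearest neighbors and the two lists have at least a minimum number of members in common. The parameter names and the use of Euclidean distance on raw descriptor vectors are assumptions for illustration; the variable-length variant is not shown.

```python
from math import dist
from typing import Dict, List, Sequence


def jarvis_patrick(points: Sequence[Sequence[float]], n_neighbors: int, min_shared: int) -> List[int]:
    """Cluster labels (0, 1, ...) from basic Jarvis-Patrick clustering."""
    n = len(points)
    neighbors = []
    for i in range(n):
        # the n_neighbors closest other points (Euclidean; ties broken by index)
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist(points[i], points[j]), j))
        neighbors.append(set(others[:n_neighbors]))

    parent = list(range(n))  # union-find forest for merging clusters

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= min_shared:
                parent[find(i)] = find(j)

    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Raising `min_shared` makes the merging criterion stricter and produces more, smaller clusters; in the limit every point becomes its own cluster.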
3. Simple Linear Regression:
It performs a standard linear regression calculation to generate a set of
QSAR equations that includes one equation for each independent variable. It is
good for exploring simple relations between structure and activity.
4. Multiple Linear Regressions (MLR):
This method calculates a QSAR equation by performing standard multivariable
regression calculations using multiple variables in a single equation. The method
assumes that the independent variables are not correlated with one another.
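For illustration, multiple linear regression can be sketched as an ordinary least-squares fit through the normal equations, (XᵀX)b = Xᵀy, solved by Gaussian elimination. This assumes a small, well-conditioned descriptor matrix: strongly correlated descriptors make XᵀX nearly singular, which is one reason the independent variables should not be correlated and why PLS is preferred in that situation. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Least-squares fit of y = b0 + b1*x1 + ... ; returns (coefficients, r2)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = _solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot
```

For data generated exactly as y = 1 + 2x₁ - x₂, the fit recovers the coefficients [1, 2, -1] with r² = 1.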
5. Stepwise Multiple Linear Regression:
It calculates QSAR equations by adding one variable at a time and
testing each addition for significance; only the variables that pass are used in the
QSAR equation. It is useful when the number of variables is large and the key
descriptors are not known. This method should not be used if the number of variables
exceeds the number of structures.
6. PLS (Partial Least Squares):
This method carries out regression using latent variables derived from the
independent and dependent data, chosen along their axes of greatest variation so that
they are most highly correlated. It can be used with more than one dependent variable.
It is typically applied when the independent variables are correlated or the number
of independent variables exceeds the number of observations (rows).
7. GFA (Genetic Function Approximation):
GFA is designed to be applied to problems of function approximation. Given a
large number of potential factors influencing a response, including several powers and
other functions of the raw inputs, it should find the subsets of terms that correlate best
with the response.
The central concepts of GFA are simple. The region to be searched is coded into
one or more strings. In GFA, these strings are sets of terms: powers and splines
of the raw inputs. Each string represents a location in the search space. The
algorithm works with a set of these strings, called a population, which is
evolved in a manner that leads it towards the objective of the search. This requires
a measure of the fitness of each string (each string corresponds to a model in GFA).
Three operations are then performed iteratively in succession: selection,
crossover and mutation. Newly added members are screened according to fitness
criteria. In GFA, the scoring criteria for models are related to the quality of the
regression fit to the data. The selection probabilities must be re-evaluated each time
a new member is added to the population.
1. Selection: Two parents are selected from the present population with
probabilities proportional to their fitness.
2. Crossover: A crossover splices and rejoins the characters of the two parent
strings to create a new child string. In a conventional genetic algorithm, this is
accomplished by selecting a crossover point along each parent and combining the
first substring of the first parent with the second substring of the second parent.
Parents: $X_1^2, X_2 \mid X_4^3, X_3^3$ and $X_1, X_3 \mid X_4, X_5^2$
Child: $X_1^2, X_2, X_4, X_5^2$
3. Mutation: In a mutation, a single term in a string (a model) is altered.
This is the mechanism for continuously introducing diversity into the
population, preventing the algorithm from getting stuck in a suboptimal
region of solutions.
In the GFA algorithm, mutations are performed with a user-defined probability
after each crossover. The GFA procedure continues for a specified number of
generations unless convergence occurs in the interim. A generation is a number of
attempted crossovers equal to the size of the population. Convergence is triggered by a
lack of progress in the highest and average scores of the population.
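The three GFA operators described above can be sketched in Python. This is a minimal illustration under stated assumptions: models are lists of term strings such as "X1^2" (a hypothetical encoding), crossover cut points are passed in explicitly rather than drawn at random, and fitness values are supplied by the caller. The regression-based scoring that GFA actually uses is not implemented, so this is not a complete GFA.

```python
import random
from typing import List, Sequence

Model = List[str]  # hypothetical encoding: a model is a list of term strings


def select_parent(population: Sequence[Model], fitness: Sequence[float], rng: random.Random) -> Model:
    """Roulette-wheel selection: pick a model with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def crossover(parent1: Model, parent2: Model, cut1: int, cut2: int) -> Model:
    """Child = terms of parent1 before cut1 + terms of parent2 from cut2 onwards."""
    child: Model = []
    for term in parent1[:cut1] + parent2[cut2:]:
        if term not in child:  # drop duplicate terms
            child.append(term)
    return child


def mutate(model: Model, candidate_terms: Sequence[str], rng: random.Random) -> Model:
    """Replace one randomly chosen term with a randomly chosen candidate term."""
    mutant = list(model)
    mutant[rng.randrange(len(mutant))] = rng.choice(candidate_terms)
    return mutant
```

For example, `crossover(["X1^2", "X2", "X6", "X7"], ["X1", "X3", "X4", "X5^2"], 2, 2)` yields `["X1^2", "X2", "X4", "X5^2"]`. This simple mutation can introduce a duplicate term, which a full implementation would need to handle.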
8. GPLS: (Genetic Partial Least Squares):
GPLS is derived from GFA and PLS, which are valuable analytical tools for
datasets that have more descriptors than samples. The following statistical
methods are useful in combinatorial chemistry and Analog Builder.
9. FA (Factor Analysis):
It addresses one of the main problems of PCA, namely that it is not simple to
relate the principal components to molecular properties. In FA, the common factors
have a close relationship to real molecular properties.
10. RP (Recursive Partitioning):
It identifies the internal representation of classes used by classification
structure-activity relationships (CSAR) for deriving recursive partitioning models.
Validation Methods
Once a regression equation is obtained, it is important to determine its
reliability and significance. Internal validation uses the data set from which the model
was derived and checks for internal consistency. The procedure derives a new model with
one or more molecules omitted and uses it to predict the activities of the molecules that
were left out. This is repeated until all compounds have been deleted and predicted once.
Internal validation is less rigorous than external validation, which evaluates how well
the equation generalizes. In external validation, the original data are divided into two
groups, the training set and the test set. The training set is used to derive a model, and
the model is used to predict the activities of the test set members. The following
procedures are used to check that the size of the model is appropriate for the quantity of
data available and to provide some estimate of how well the model can predict activity for
new molecules:
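1. Cross Validation: This process repeats the regression many times on subsets of the data.
Usually each molecule is left out in turn, and r² is computed using the predicted values of
the missing molecules (the cross-validated r²).
2. Randomization Test: Even with a large number of observations and a small number of
terms, an equation can still have very poor predictive power. This can come about if the
observations are not sufficiently independent of each other. In a randomization test, the
activity values are randomly reassigned among the compounds and the model is refitted;
if the randomized models score about as well as the original, the original correlation is
likely to be due to chance.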
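A randomization test (often called y-randomization or y-scrambling) can be sketched for the simplest case of one descriptor and a simple linear regression. The activities are shuffled repeatedly, r² is recomputed for each shuffle, and the fraction of shuffled models that do at least as well as the real one is reported. The single-descriptor setting, the fixed seed and the "+1" p-value convention are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """r^2 of the least-squares line y = a + b*x (assumes x and y are not constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def randomization_test(
    x: Sequence[float], y: Sequence[float], n_iter: int = 99, seed: int = 0
) -> Tuple[float, List[float], float]:
    """Return (observed r2, r2 of each shuffled model, empirical p-value)."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled_r2 = []
    for _ in range(n_iter):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any real link between structure and activity
        shuffled_r2.append(r_squared(x, y_perm))
    # fraction of shuffled models at least as good as the real one (+1 convention assumed)
    p_value = (1 + sum(r >= observed for r in shuffled_r2)) / (n_iter + 1)
    return observed, shuffled_r2, p_value
```

A small p-value means that shuffled activities rarely reproduce the observed fit, which supports a genuine structure-activity correlation rather than a chance one.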
Interpreting the QSAR equation
QSAR is used for predicting the activities of as yet untested (and possibly not yet
synthesized) molecules. A QSAR is generally more accurate for interpolative predictions
(for compounds whose parameters lie within the range of those considered in the data set)
than for extrapolative predictions (for compounds outside that range).
A QSAR equation can also provide insights into the mechanism of the process being studied.
1. Square of the Correlation Coefficient (r²): If the x (independent) and y (dependent)
variables are highly correlated, there is considerable information in x and y that is
redundant. The degree of correlation is measured by the correlation coefficient r; its
square, r², is the fraction of the variance in y explained by the model.
2. Cross-Validated r² (termed q² or XV r²): r² can be computed using cross-validation
methods (XV r²) or bootstrap methods (BS r²). It is again the fraction of the variance
explained by the model, but based on predictions for compounds left out of the fit. The
cross-validated r² is usually somewhat lower, and often much lower, than r².
3. PRESS (Predictive Error Sum of Squares): The sum, over all compounds, of the squared
differences between the actual and predicted values of the dependent variable,
$PRESS = \sum_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value predicted for compound $i$
by a model built without it (or without its group). The intensity of the cross-validation
process is controlled by selecting the number of groups, or the number of times the
cross-validation step is carried out while predicting all rows (at each stage of model
development).
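The three statistics can be computed together for a one-descriptor model using leave-one-out cross-validation. The sketch below assumes a simple linear QSAR y = a + bx and hypothetical function names; q² is computed as (SD - PRESS)/SD, where SD is the sum of squared deviations of the activities from their mean.

```python
from typing import Sequence, Tuple


def _fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def loo_statistics(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (r2, PRESS, q2) for a one-descriptor linear QSAR, using leave-one-out."""
    a, b = _fit_line(x, y)
    mean_y = sum(y) / len(y)
    sd = sum((yi - mean_y) ** 2 for yi in y)  # squared deviations from the mean
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        ai, bi = _fit_line(xs, ys)  # model rebuilt without compound i
        press += (y[i] - (ai + bi * x[i])) ** 2
    return 1.0 - ss_res / sd, press, (sd - press) / sd
```

Because each left-out compound is predicted by a model that never saw it, PRESS is never smaller than the ordinary residual sum of squares, which is why q² does not exceed r² for this kind of model.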
Procedure
Fig 19: Flowchart of QSAR procedure
Calculate molecular properties
The Calculate Molecular Properties protocol will calculate many properties or
perform basic statistical and correlation analysis of the numeric properties as requested.
To set up a Calculate Molecular Properties protocol:
1. Apply a force field to the molecules, then load the QSAR/Calculate Molecular
Properties protocol from the Protocols Explorer. The parameters are displayed
in the Parameters Explorer.
2. On the Parameters Explorer, click in the cell for the Input Ligands parameter
and click the button to specify the ligand source on the Specify Ligands dialog.
On the dialog, select all ligands from a Table Browser, a 3D Window, or a file.
3. Select the properties to calculate by clicking the button in a cell for the
Molecular Properties, Semi empirical QM descriptors, or Density Functional QM
descriptors, and follow the instructions in the popup dialog window.
The Create Genetic Function Approximation Model protocol builds a Genetic Function
Approximation model for a dependent property using the selected molecular descriptors.
To set up a Create Genetic Function Approximation Model protocol:
1. Load the QSAR/Create Genetic Function Approximation Model protocol from
the Protocols Explorer. The parameters are displayed in the Parameters Explorer.
2. On the Parameters Explorer, click in the cell for the Input Ligands parameter
and click the button to specify the ligand source on the Specify Ligands dialog.
On the dialog, select all ligands from a Table Browser, a 3D Window, or a file.
3. Set the desired model name using the Model Name parameter. Once created,
this model will appear under the other category of the Molecular Properties
parameter in the Calculate Molecular Properties protocol and can be used to
compute the property for future ligands.
4. Set the initial equation length and remaining parameters as desired. Parameters
presented in red are required.
5.1. LIGAND FIT
The docking score is the negative of the non-bonded intermolecular energy. If a
ligand atom carries a partial charge, the electrostatic grid is used to estimate the
electrostatic energy; if it is a hydrogen atom, the hydrogen grid is used for the
van der Waals energy.
Fig 1: Binding site of the protein, as defined for LigandFit.
Fig 2: Interactions of Scafold4 molecule1 (high active) with amino acids of 2ZDZ after
docking with LigandFit.
Fig 3: Interactions of Molecule 2 (low active) with amino acids of 2ZDZ after docking
with LigandFit.
Table showing top 10 Dock scores of high active molecule
Index  Name                 DOCK_SCORE (HA)
1      Scafold4 molecule1   82.634
2      Scafold4 molecule1   80.059
3      Scafold4 molecule1   78.732
4      Scafold4 molecule1   75.629
5      Scafold4 molecule1   75.259
6      Scafold4 molecule1   74.99
7      Scafold4 molecule1   72.326
8      Scafold4 molecule1   72.167
9      Scafold4 molecule1   72.012
10     Scafold4 molecule1   71.776
Table showing top 10 Dock scores of low active molecule
Index  Name         DOCK_SCORE (LA)
1      Molecule 2   67.26
2      Molecule 2   67.133
3      Molecule 2   66.683
4      Molecule 2   66.01
5      Molecule 2   65.634
6      Molecule 2   65.095
7      Molecule 2   64.665
8      Molecule 2   64.45
9      Molecule 2   64.272
10     Molecule 2   64.267
CONCLUSION:
The docking scores of the above molecules are all positive. Thus, the molecules
can be considered potential ligands for the inhibition of β-secretase.
CDOCKER:
CDOCKER uses CHARMm-based molecular dynamics to dock ligands into a receptor
binding site. Random ligand conformations are generated using high-temperature
molecular dynamics, and the conformations are then translated into the binding site.
Candidate poses are then created using random rigid-body rotations followed by simulated
annealing. A final minimization is used to refine the ligand poses.
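The simulated annealing step can be illustrated with a generic Metropolis-style loop. This is a sketch only: it minimizes a hypothetical toy energy function over plain coordinates, whereas CDOCKER anneals ligand poses in the receptor using the CHARMm force field. The geometric cooling schedule, step size and temperatures are arbitrary assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def simulated_annealing(
    energy: Callable[[List[float]], float],
    start: Sequence[float],
    step: float = 0.5,
    t_start: float = 5.0,
    t_end: float = 0.01,
    n_steps: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Metropolis simulated annealing; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = list(start)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for k in range(n_steps):
        # geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        trial = [c + rng.uniform(-step, step) for c in current]  # random perturbation
        e_trial = energy(trial)
        # Metropolis criterion: accept downhill moves, uphill ones with Boltzmann probability
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best
```

At high temperature, uphill moves are often accepted, letting the search escape local minima; as the temperature falls, the search settles into a low-energy state, after which a final minimization (as in CDOCKER) would refine it.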
Fig 5: Interactions of Scafold4 molecule1 (high active) with amino acids of 2ZDZ after
docking with CDOCKER.
Fig 6: Interactions of Molecule 5 (low active) with amino acids of 2ZDZ after docking
with CDOCKER.
Table showing top 10 CDocker energies of high active molecule
Index  Name                 CDOCKER_ENERGY (HA)
1      Scafold4 molecule1   32.361
2      Scafold4 molecule1   29.299
3      Scafold4 molecule1   27.328
4      Scafold4 molecule1   27.122
5      Scafold4 molecule1   26.972
6      Scafold4 molecule1   26.494
7      Scafold4 molecule1   25.681
8      Scafold4 molecule1   25.546
9      Scafold4 molecule1   25.284
10     Scafold4 molecule1   25.257
Table showing top 10 CDocker energies of low active molecule
Index  Name         CDOCKER_ENERGY (LA)
1      Molecule 5   -14.585
2      Molecule 5   -15.382
3      Molecule 5   -17.574
4      Molecule 5   -17.719
5      Molecule 5   -18.203
6      Molecule 5   -19.407
7      Molecule 5   -19.703
8      Molecule 5   -19.800
9      Molecule 5   -19.882
10     Molecule 5   -20.370
CONCLUSION:
The docking energies of the ligands were estimated using the CDOCKER protocol.
LUDI:
Fig 7: Interaction map generated using the LUDI program.
Fig 8: De novo ligand generated by the LUDI program.
Fig 9: De novo ligand occupying the interaction map generated by the LUDI program.
Fig 10: Interactions of the generated de novo ligand with the amino acids Gly96 and
Ser291 of protein 2ZDZ.
CONCLUSION:
The newly designed LUDI molecules were found to satisfy the interaction sites in the
active site of protein 2ZDZ.
LIB DOCK:
LibDock docks ligands into a receptor binding site. Random ligand conformations are
generated, and LibDock uses the physicochemical properties of the ligands to guide
docking to corresponding features (hotspots) in the protein binding site, by matching a
triplet of ligand atoms to a triplet of protein hotspots.
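The triplet-matching idea can be sketched as follows: a ligand atom triplet matches a hotspot triplet when the feature types agree atom by atom and the two triangles have the same side lengths within a tolerance. Assumptions: hypothetical "polar"/"apolar" feature labels, a fixed distance tolerance, and brute-force enumeration. Real LibDock also aligns the ligand pose onto each matched triplet and scores it, which is not shown here.

```python
from itertools import combinations, permutations
from math import dist
from typing import List, Sequence, Tuple

# Hypothetical feature format: (feature type, xyz coordinates in angstrom).
Feature = Tuple[str, Tuple[float, float, float]]
PAIRS = ((0, 1), (0, 2), (1, 2))  # the three sides of a triangle


def matching_triplets(
    ligand: Sequence[Feature], hotspots: Sequence[Feature], tol: float = 0.5
) -> List[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return (ligand triplet, hotspot triplet) index pairs whose types and distances agree."""
    matches = []
    for lt in combinations(range(len(ligand)), 3):
        for ht in permutations(range(len(hotspots)), 3):
            # each ligand atom must land on a hotspot of the same type
            if any(ligand[li][0] != hotspots[hi][0] for li, hi in zip(lt, ht)):
                continue
            # the two triangles must have the same side lengths within `tol`
            if all(
                abs(dist(ligand[lt[a]][1], ligand[lt[b]][1])
                    - dist(hotspots[ht[a]][1], hotspots[ht[b]][1])) <= tol
                for a, b in PAIRS
            ):
                matches.append((lt, ht))
    return matches
```

Because only internal distances are compared, a match is independent of where the ligand currently sits; the matched triplet then defines the rigid-body transformation that places the ligand in the site.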
Fig 11: Interactions of Scafold4 molecule1 (high active) with amino acids of 2ZDZ after
docking with LibDock.
Fig 12: Interactions of Molecule 5 (low active) with amino acids of 2ZDZ after docking
with LibDock.
CONCLUSION:
LibDock studies show that compound Scafold4 molecule1 (high active) has a LibDock
score of 86.666, whereas Molecule 5 (low active) has a LibDock score of 110.87.
STRUCTURE BASED PHARMACOPHORE:
The structure-based pharmacophore approach was used to find the essential features
of the active site that can contribute to ligand binding.
Interaction generation:
The Interaction Generation protocol enumerates pharmacophore features from a protein
active site. It uses the site-finding algorithm from LUDI to identify points in the active
site at which a ligand could interact with the receptor, and creates a pharmacophore query
containing hydrogen bond acceptor, hydrogen bond donor and hydrophobic features from
these points.
The interaction generation run on the minimized 2ZDZ structure found 484 features:
112 lipophilic features
162 H-acceptor features
210 H-donor features
Figure 13: Clustered features from interaction generation.
Figure 14: Center points of the clustered features.
Figure 15: Mapping of active-site amino acids onto the structure-based pharmacophore
features.
These structure-based pharmacophore features are useful for virtual screening of large
databases.
6. ANALOG BASED DRUG DESIGNING
The work in Discovery Studio shows how the chemical features (hydrogen bond acceptor,
hydrogen bond donor and hydrophobic aliphatic) of a set of compounds, together with their
activities spanning several orders of magnitude, can be used to generate a pharmacophore
hypothesis that successfully predicts activity. The models were predictive not only within
the same series of compounds; diverse compounds from different classes were also
effectively mapped onto most of the features important for activity. The pharmacophore
generated can be used to discover structurally diverse potential BACE1 inhibitors, and to
evaluate how well any novel compound maps onto the pharmacophore developed in this
study, whose distinct features may be responsible for the activity of the inhibitors.
Analogue Based Pharmacophore Generation:
i. Common Feature Pharmacophore Generation (HipHop):
The 10 most active molecules were used to derive common-feature-based alignments, and
all 10 were considered as reference molecules to obtain the best features. The best
features obtained from the HipHop run are:
1. Hydrogen bond acceptor
2. Hydrogen bond acceptor lipid
3. Hydrogen bond donor
4. Hydrophobic
5. Ring aromatic
Table showing Summary of feature definition hits by molecule
Molecule A D H Z Y N X P W R
Molecule_1 18.07 7.15 3.79 0.79 3.00 0.00 0.00 0.00 2.00 8.00
A: hydrogen bond acceptor; H: hydrogen bond acceptor lipid; D: hydrogen bond donor;
Z: hydrophobic; Y: hydrophobic aliphatic; X: hydrophobic aromatic;
N: negative ionizable; P: positive with exclusions; W: positive ionizable;
R: ring aromatic.
Table showing Common Feature Pharmacophore Generation Rank File
Hypo. No.  Pharmacophore features  Rank score  Direct hit  Partial hit  Max fit
1 YDAA 14.643 1 0 4
2 YDAA 14.643 1 0 4
3 YDAA 14.627 1 0 4
4 YDAA 14.563 1 0 4
5 YDAA 14.563 1 0 4
6 YDAA 14.563 1 0 4
7 YDAA 14.563 1 0 4
8 YDAA 14.561 1 0 4
9 YDAA 14.561 1 0 4
10 YDAA 14.561 1 0 4
ii. HYPOGEN (Training set):
A set of 10 hypotheses was generated using data from 25 training set compounds.
The cost values, correlation coefficients, RMS deviations and pharmacophore features
are listed in the table below.
The best pharmacophore was taken to be Hypothesis 1, which has the highest cost
difference, the lowest error cost, the lowest RMS deviation and the best correlation
coefficient. It has two hydrogen bond acceptor features, one hydrophobic feature and one
hydrogen bond donor feature, with a cost difference of 53.410. For the highly active
compound, one feature mapped to a carbon of the five-membered pyrrole ring and another
feature mapped to the oxygen of the pyrrole side chain. The HBD feature mapped to one of
the nitrogens of the trinitro carbon, and the HBA feature mapped to an oxygen of the
centroid.
Table showing the 10 pharmacophore hypotheses generated by the HypoGen algorithm
Hypothesis  Total cost  Difference  RMS     Correlation  Features
1           76.334      53.409      1.0798  0.936        YDAA
2           77.126      53.617      1.1201  0.9313       YDAA
3           79.650      50.093      1.2381  0.9154       YDAA
4           81.281      48.462      1.3080  0.9051       YDAA
5           81.875      47.868      1.3444  0.8994       YDAA
6           84.156      45.587      1.4453  0.8826       YDAA
7           85.321      44.222      1.4922  0.8743       YDAA
8           86.179      43.564      1.5260  0.8682       YDAA
9           87.308      42.435      1.5677  0.8603       YDAA
10          88.177      41.566      1.6007  0.8538       YDAA
Note: Difference= Null cost – Total cost
Null cost=129.743
RMS=3.07536
Features:
Y = Hydrophobic aliphatic  D = Hydrogen bond donor
A=Hydrogen bond acceptor
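The Note's definition can be applied to the total costs in the hypothesis table with a short Python sketch. The cost values are copied from the table; the helper name `cost_difference` is hypothetical, and the ranking simply picks the hypothesis with the largest difference.

```python
NULL_COST = 129.743  # null cost given in the Note

# total costs of hypotheses 1-10, copied from the hypothesis table
TOTAL_COSTS = [76.334, 77.126, 79.650, 81.281, 81.875, 84.156, 85.321, 86.179, 87.308, 88.177]


def cost_difference(null_cost: float, total_cost: float) -> float:
    """Difference (in bits) = null cost - total cost, as defined in the Note."""
    return null_cost - total_cost


differences = [round(cost_difference(NULL_COST, tc), 3) for tc in TOTAL_COSTS]
# the hypothesis furthest from the null hypothesis (1-based numbering, as in the table)
best_hypothesis = 1 + max(range(len(differences)), key=differences.__getitem__)
```

A larger difference means the hypothesis departs further from the null hypothesis, which assumes no relationship between features and activity.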
Figure 16: Distances between the pharmacophore features.
Figure 17: Overlay of the most active inhibitor molecules of the training set on the
best pharmacophore.
Figure 18: Overlay of the least active inhibitor molecule of the training set on the
best pharmacophore.
Table showing the results of the pharmacophore hypothesis for the test set
Name                  Activity (uM)  Estimate (uM)  Fit value
Scafold3 molecule13   0.56           0.2195         7.1448
Scafold4 molecule6    0.12           0.2575         7.0755
Scafold4 molecule13   0.75           0.2833         7.0341
Scafold3 molecule12   0.67           0.3044         7.0028
Scafold4 molecule11   0.27           0.4449         6.8380
Scafold3 molecule13   0.28           0.5833         6.7204
Scafold4 molecule4    0.28           0.6382         6.6814
Scafold4 molecule5    0.17           1.2128         6.4026
Scafold4 molecule16   0.65           2.1886         6.1462
Scafold4 molecule8    0.14           3.7547         5.9118
Scafold3 molecule3    0.68           6.9683         5.6432
Molecule 1            28.6           7.8400         5.5920
Scafold4 molecule14   0.56           12.387         5.3934
Scafold6 molecule5    3.23           14.790         5.3164
Scafold6 molecule9    2.68           28.131         5.0372
Scafold4 molecule12   0.24           61.707         4.6960
Scafold3 molecule16   0.57           64.150         4.6792
Scafold6 molecule7    2.39           77.852         4.5951
Scafold5 molecule3    1.34           87.245         4.5456
Scafold6 molecule6    0.22           96.207         4.5
Scafold3 molecule1    0.26           113.68         4.4307
Scafold3 molecule9    1.71           170.29         4.2552
Scafold5 molecule4    1.54           215.63         4.1526
Scafold5 molecule1    1.5            242.95         4.1008
Scafold5 molecule5    0.46           257.70         4.0752
Scafold6 molecule4    0.77           345.57         3.9478
Scafold6 molecule3    0.57           455.23         3.8281
Scafold5 molecule2    0.62           21220          2.1596
Molecule 11           2.8            24666          2.0942
Scafold3 molecule10   0.28           24782          2.0922
Molecule 5            100            45076          1.8324
Discussion
Pharmacophore models of BACE1 inhibitors were generated with the HypoGen module of
the DS software. HypoGen attempts to construct the simplest hypotheses that best
correlate the experimental and predicted activities.
The dataset was divided into a training set (16 compounds) and a test set (31
compounds), considering both structural diversity and wide coverage of the activity
range. Compounds with an activity below 1 uM were considered highly active (+++),
compounds with an activity in the range 1-100 uM moderately active (++), and compounds
with an activity above 100 uM least active (+). At the end of the run, HypoGen generated
10 pharmacophore models. The null cost for the ten hypotheses was 129.743, the fixed cost
of the run was 76.333 and the configuration cost was 15.9021. The difference of 53.410 bits
between the fixed and null costs indicates the highly predictive nature of the hypotheses.
All 10 hypotheses showed high correlation coefficients between experimental and
predicted IC50 values, indicating a true correlation of 80-95% for all the hypotheses.
The cost values, correlation coefficients (r), RMSD and pharmacophore features are listed
in Table 12. The best pharmacophore (Hypothesis 1) consisted of two H-bond acceptor (A),
one H-bond donor (D) and one hydrophobic aliphatic (Y) feature, with a correlation
coefficient (r) of 0.9363, a total cost of 76.3334 and the lowest RMSD value (1.07988); it
was chosen to further validate its predictive power by estimating the activities of the
test set.
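The activity classes defined above can be applied to the test-set table with a short Python sketch that checks whether the estimated activity falls in the same class as the experimental one. The rows used here are a small subset copied from that table; treating exactly 100 uM as moderately active (the upper end of the 1-100 uM range) is an assumption, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def activity_class(ic50_um: float) -> str:
    """Activity class from the discussion (IC50 in uM)."""
    if ic50_um < 1.0:
        return "+++"  # highly active
    if ic50_um <= 100.0:
        return "++"  # moderately active (1-100 uM, upper bound assumed inclusive)
    return "+"  # least active


def class_agreement(rows: Sequence[Tuple[str, float, float]]) -> float:
    """Fraction of compounds whose estimated and experimental classes agree."""
    hits = sum(activity_class(act) == activity_class(est) for _, act, est in rows)
    return hits / len(rows)


# (name, experimental activity in uM, HypoGen estimate in uM): a subset of the test-set table
TEST_ROWS = [
    ("Scafold3 molecule13", 0.56, 0.2195),
    ("Scafold4 molecule5", 0.17, 1.2128),
    ("Molecule 1", 28.6, 7.8400),
    ("Molecule 5", 100, 45076),
]
```

For these four rows, the classes agree for two compounds, so `class_agreement(TEST_ROWS)` returns 0.5; running it over the full table gives a simple, class-level summary of the test-set predictions.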
Graph showing Point plot representation of test set
QSAR:
In the present study, quantitative structure-activity relationship (QSAR) studies were carried out on BACE1 inhibitors in order to design selective and potent inhibitors. QSAR models were developed with 1D and 2D descriptors using Discovery Studio software. QSAR attempts to model the activity of a series of compounds using measured or computed properties of those compounds. In the equation, N is the number of data points and r2 is the square of the correlation coefficient, which describes how well the compounds fit the QSAR model. XV r2 is a squared correlation coefficient generated during a validation procedure using the equation

$$ XV\,r^2 = \frac{SD - PRESS}{SD} $$

Here SD is the sum of squared deviations of the dependent-variable values from their mean. PRESS (predicted residual error sum of squares) is the sum, over all compounds, of the squared differences between the actual and predicted values of the dependent variable. The PRESS value is computed during a validation procedure for the entire training set. The smaller the PRESS value, the more reliable the equation. XV r2 is usually smaller than the overall r2 for a QSAR equation. It is used as a diagnostic tool to evaluate the predictive power of an equation generated with the multiple linear regression method.
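The relationship between r2, PRESS and XV r2 can be illustrated on a toy one-descriptor regression. This is a minimal sketch under stated assumptions. The text does not specify which validation procedure Discovery Studio uses, so leave-one-out cross-validation is assumed here. The descriptor values and activities are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x (a one-descriptor MLR model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_press_xv(x, y):
    """Return (r2, PRESS, XV r2) using leave-one-out cross-validation (assumed)."""
    my = mean(y)
    sd = sum((yi - my) ** 2 for yi in y)  # SD: squared deviations from the mean
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        # refit without compound i, then predict compound i
        ai, bi = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (ai + bi * x[i])) ** 2
    return 1 - rss / sd, press, (sd - press) / sd


# Hypothetical descriptor values and activities, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
r2, press, xv_r2 = r2_press_xv(x, y)
print(f"r2={r2:.4f} PRESS={press:.4f} XV r2={xv_r2:.4f}")
```

On these toy values, XV r2 (about 0.984) comes out below r2 (about 0.993). This is expected, because each left-out prediction is made by a model that never saw that compound, so PRESS can never be smaller than the ordinary residual sum of squares.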
Genetic function approximation (GFA) works by generating a random population of solutions to a problem and scoring the relative quality of each solution. It then carries forward the fittest solutions, or analogues of other solutions generated through mutation and crossover, to iteratively generate (and finally converge on) new, fitter solutions. In this study, GFA analysis was carried out with the following parameters:
Population size
Initial equation length
Final equation length
Number of generations
Bootstrap r2: correlation coefficient calculated during the validation procedure.
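The GFA loop described above can be sketched as follows. This is a minimal, self-contained illustration, not Discovery Studio's implementation, and all function names and the synthetic data are hypothetical. Under the assumptions made here:

- Each candidate model is a set of descriptor columns fitted by multiple linear regression.
- Fitness is a Friedman-style lack-of-fit score with an assumed smoothing value d = 0.5.
- Crossover mixes terms from two parents.
- Mutation adds or drops one descriptor with probability 0.1, the value in the GFA parameter table.
- A child replaces the worst model in the population only if it scores better.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mlr_lse(X, y, terms):
    """Mean squared error of an MLR fit of y on the chosen descriptor columns plus an intercept."""
    rows = [[1.0] + [row[t] for t in terms] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    return sse / len(y)


def friedman_lof(lse, c, p, m, d=0.5):
    """LSE / (1 - (c + d*p)/M)^2; d = 0.5 is an assumed smoothing value."""
    denom = 1.0 - (c + d * p) / m
    return lse / denom ** 2 if denom > 0 else float("inf")


def gfa(X, y, n_desc, pop_size=40, generations=300, max_terms=4, p_mut=0.1, seed=0):
    """Evolve descriptor subsets; return (best subset, its LOF score)."""
    rng = random.Random(seed)
    cache = {}

    def score(model):
        if model not in cache:
            try:
                # linear terms only, so c (terms) and p (features) are both len(model)
                lse = mlr_lse(X, y, sorted(model))
                cache[model] = friedman_lof(lse, len(model), len(model), len(y))
            except ValueError:
                cache[model] = float("inf")
        return cache[model]

    # random initial population: each model is a frozenset of descriptor indices
    pop = [frozenset(rng.sample(range(n_desc), rng.randint(1, max_terms))) for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(pop, 2)
        # crossover: the child takes a random subset of each parent's terms
        child = {t for t in a if rng.random() < 0.5} | {t for t in b if rng.random() < 0.5}
        if rng.random() < p_mut:
            child ^= {rng.randrange(n_desc)}  # mutation: add or drop one descriptor
        child = frozenset(child)
        if not child or len(child) > max_terms or child in pop:
            continue
        worst = max(pop, key=score)
        if score(child) < score(worst):
            pop[pop.index(worst)] = child  # carry the fitter model forward
    best = min(pop, key=score)
    return sorted(best), score(best)


# Hypothetical data: 20 compounds, 5 descriptors; activity depends on descriptors 0 and 2
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
y = [2.0 * r[0] - 1.5 * r[2] + data_rng.gauss(0.0, 0.05) for r in X]
best, lof = gfa(X, y, n_desc=5)
print("selected descriptors:", best, "LOF:", round(lof, 4))
```

The replace-the-worst rule means the best score in the population can only improve from one generation to the next. The LOF penalty discourages the search from keeping descriptors that add terms without reducing the error enough.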
Thirty compounds, covering the widest available range of IC50 values (0.078 to 118 nM), were included in the training set to generate the initial QSAR model. The predictive ability of the QSAR model was further assessed using test-set molecules. To judge the model's predictive ability for new drug candidates, the IC50 values of both the training and test sets were predicted and compared with the experimental values.
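The IC50 range of the training set follows from its pIC50 extremes. This assumes the standard definition pIC50 = -log10(IC50 in mol/L). The sketch below converts the highest and lowest experimental pIC50 values in the training-set table back to nanomolar IC50. The function names are illustrative.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); the input is given in nanomolar."""
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50):
    """Inverse conversion: IC50 in nanomolar from pIC50."""
    return 10 ** (-pic50) * 1e9


# Highest and lowest experimental pIC50 values in the training-set table
for name, p in [("Scafold4 molecule1", 10.11), ("Molecule 2", 6.93)]:
    print(f"{name}: pIC50 {p} -> IC50 {ic50_nm_from_pic50(p):.3f} nM")
```

These two values give IC50 values of about 0.078 nM and 117.5 nM.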
GFA parameters
Number of rows in model 30
Population 40
Maximum generations 50000
Initial terms per equation 20
Scoring function Friedman LOF
Mutation probability 0.1
Table showing GFA parameters
The GFA method searches the space of possible QSAR models, using Friedman lack-of-fit (LOF) scores to estimate the fitness of each model. This search leads to the discovery of predictive QSAR equations.
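For reference, Friedman's lack-of-fit score, in the form usually given for genetic function approximation, penalizes the least-squares error according to model size. The text does not state whether Discovery Studio applies exactly this form or which smoothing value d was used in this run, so treat the expression as a sketch:

$$ \mathrm{LOF} = \frac{\mathrm{LSE}}{\left(1 - \dfrac{c + d\,p}{M}\right)^{2}} $$

The symbols are defined as follows:

- LSE is the least-squares error of the model.
- c is the number of terms in the equation.
- p is the number of descriptors used.
- d is the smoothing parameter.
- M is the number of training compounds (30 here).

Adding a term lowers LSE but also shrinks the denominator. A longer equation therefore scores better only if it reduces the error enough to pay for its extra terms.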
qtr2_1 = 8.5055 - 1.5055 * Count<ECFP_6:672362763> - 1.5755 * Count<ECFP_6:65758642>
+ 3.9675 * Count<ECFP_6:12965448167> + 0.20089 * Count<ECFP_6:18844118037>
+ 2.667 * Count<ECFP_6:1.5633445 59> O_Count
In this equation, a positive coefficient indicates that the presence of the corresponding group (ECFP_6 fragment) increases the activity of the molecule. A negative coefficient indicates the presence of a group, here an ionic group, that reduces the activity.
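To make the sign convention concrete, the sketch below evaluates part of qtr2_1 for hypothetical fragment counts. It uses only the intercept and the four ECFP_6 terms whose coefficients and keys read cleanly. The final term is garbled in the source and is deliberately left out, so these values are illustrations of the sign effect, not full model predictions.

```python
# Intercept and the four legible ECFP_6 terms of qtr2_1.
# The fifth term is garbled in the source and deliberately omitted.
INTERCEPT = 8.5055
COEFFICIENTS = {
    "ECFP_6:672362763": -1.5055,
    "ECFP_6:65758642": -1.5755,
    "ECFP_6:12965448167": 3.9675,
    "ECFP_6:18844118037": 0.20089,
}


def partial_prediction(counts):
    """Intercept plus sum of coefficient * fragment count; missing fragments count as 0."""
    return INTERCEPT + sum(coef * counts.get(key, 0) for key, coef in COEFFICIENTS.items())


# Hypothetical molecules with one fragment each, to show the effect of the coefficient sign
print(partial_prediction({}))                         # none of the listed fragments
print(partial_prediction({"ECFP_6:672362763": 1}))    # a negative term lowers the value
print(partial_prediction({"ECFP_6:12965448167": 1}))  # a positive term raises the value
```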
Table showing the validation statistics for the model.
Friedman LOF 0.03102
R-squared 0.9771
adjusted R-squared 0.9719
r2(predicted) -3.3864
RMS Residual Error 0.132
significance of regression P value 2.842e-17
Friedman L.O.F. is the Friedman lack-of-fit score;
S.O.R. p-value is the p-value for significance of regression
Table showing experimental and predicted values of training set compounds using GFA
Name    Exp pIC50 (Tr)    Predicted pIC50, GFAT Model_1 (Tr)    Prediction error (Tr)
Scafold3 molecule1 9.59 9.35007 0.298943
Scafold3 molecule3 9.17 9.35007 0.298943
Scafold3 molecule4 9.51 9.36227 0.298943
Scafold3 molecule5 9.59 9.56468 0.313657
Scafold3 molecule6 9.36 9.35667 0.298943
Scafold3 molecule8 9.27 9.35007 0.298943
Scafold3 molecule10 9.55 9.56468 0.313657
Scafold3 molecule12 9.18 9.35007 0.298943
Scafold3 molecule14 9.25 9.35007 0.298943
Scafold3 molecule16 9.25 9.35007 0.298943
Scafold4 molecule2 9.9 9.87359 0.301231
Scafold4 molecule3 9.89 9.87359 0.301231
Scafold4 molecule5 9.76 9.87359 0.301231
Scafold4 molecule6 9.92 9.87359 0.301231
Scafold4 molecule7 9.91 9.87359 0.301231
Scafold4 molecule8 9.86 9.65898 0.305125
Scafold4 molecule9 9.69 9.87359 0.301231
Scafold4 molecule10 9.49 9.65898 0.305125
Scafold4 molecule11 9.57 9.65898 0.305125
Scafold4 molecule12 9.62 9.65898 0.305125
Scafold4 molecule15 9.69 9.65898 0.305125
Molecule 2 6.93 6.93 0.403709
Molecule 6 8.52 8.4972 0.320761
Molecule 7 7 7 0.403709
Molecule 11 8.55 8.49723 0.320761
Scafold5 molecule3 8.87 8.71184 0.316314
Scafold6 molecule7 8.62 8.71184 0.316314
Scafold6 molecule9 8.57 8.71184 0.316314
Scafold4 molecule1 10.11 9.87359 0.301231
Scafold3 molecule13 9.56 9.35007 0.298943
Graph Showing correlation between experimental and predicted activities by QSAR
equation using GFA method
Test Set
The purpose of QSAR is not only to reproduce the biological activities of the training set but also to predict those of the test-set molecules. The equation obtained from the training-set molecules of known activity was applied to a further series of molecules, known as the test set, which were introduced into the study table so that their biological activities could be predicted. The prediction accuracy for the test-set molecules exceeded 80%.
Table showing experimental and predicted values of test set compounds using GFA
Name    Exp pIC50 (Ts)    Predicted pIC50, GFAT Model_1 (Ts)    Prediction error (Ts)
Molecule 8 7.05 7 0.387026
Molecule12 7.99 8.50547 0.307614
Scafold6 molecule4 8.4 8.70636 0.303292
Scafold6 molecule2 8.4 8.70636 0.303292
Scafold4 molecule13 9.12 9.66698 0.292624
Scafold4 molecule16 9.19 9.86787 0.288839
Scafold4 molecule4 9.56 9.86787 0.288839
Scafold4 molecule14 9.25 9.66698 0.292624
Scafold6 molecule5 8.49 8.70636 0.303292
Scafold3 molecule15 9.13 9.39202 0.2896
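The text does not say which statistic the 80% figure refers to. As a sketch, the snippet below computes two common test-set measures directly from the experimental and predicted pIC50 values tabulated above:

- the squared Pearson correlation;
- a predictive R2 taken relative to the test-set mean. This second definition is an assumption, since other external-validation formulas exist.

```python
# Experimental and GFAT Model_1 predicted pIC50 for the ten test-set compounds in the table above
EXP = [7.05, 7.99, 8.4, 8.4, 9.12, 9.19, 9.56, 9.25, 8.49, 9.13]
PRED = [7, 8.50547, 8.70636, 8.70636, 9.66698, 9.86787, 9.86787, 9.66698, 8.70636, 9.39202]


def mean(v):
    return sum(v) / len(v)


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)


def predictive_r2(obs, pred):
    """1 - (sum of squared prediction errors) / (sum of squared deviations of obs from its mean)."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


print(f"squared Pearson r = {pearson_r2(EXP, PRED):.3f}")
print(f"predictive R2     = {predictive_r2(EXP, PRED):.3f}")
```

On these ten compounds, the squared Pearson r is about 0.97 and the predictive R2 about 0.69. The two differ because nine of the ten predictions lie above the experimental values. A correlation coefficient ignores such a uniform offset, whereas a squared-error measure does not.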
Graph Showing correlation between experimental and predicted activities by QSAR
equation using GFA method for test set.
For the QSAR equation generated with the GFA method, the observed r2 and XV r2 values fall within the expected range, and there is good correlation between the experimental and GFA-predicted activities listed above. Because the predictive ability of the equation is good, it can be used to develop new analogues.
7. CONCLUSION:
The in silico studies of beta-secretase 1 (BACE1) used QSAR, pharmacophore and docking methods, and these methods gave good results.
The QSAR study on the training-set compounds gave a good r2 of 0.9771, with four outliers. The GFA graph shows a fit line representing good correlation between the compounds and their activities. In terms of predictive value, the best quantitative pharmacophore model generated by HypoGen consisted of three feature types: hydrogen bond acceptor, hydrogen bond donor and hydrophobic aromatic. This model, which was further validated with a set of BACE1 inhibitors, gave a correlation value of 0.9363.
The in silico modelling helped guide lead optimization. It led to the generation of a highly potent series of BACE1 inhibitors with good drug-like properties, which is the subject of another communication. Further fine-tuning and optimization of this potent class of BACE1 inhibitors could lead to the generation of new therapeutic agents.
The combined approach of analogue- and structure-based drug design methods gave insight into predicting enhanced activity. It also allowed us to explore the docking interactions between the amino acid residues of BACE1 and the ligand. Good ligands may not necessarily act as good drugs. The prime objective of this project, to use computer-aided drug design to verify the techniques reported in various journals, has been met. The results obtained were used to design new ligand molecules, predict their activities in silico and check these predictions against the experimental values. The reported results can therefore be employed in the rational design of novel and potent BACE1 inhibitors.
8. ABBREVIATIONS
BACE beta-site of APP-cleaving enzyme
GLY Glycine
HIS Histidine
LYS Lysine
MET Methionine
ASN Asparagine
CADD Computer Aided drug design
CNS Central Nervous system
HDL High density lipoprotein
ASP Aspartic acid
LF Ligand fit
CHARMm Chemistry at Harvard macromolecular mechanics
QM Quantum mechanics
HYPO Hypothesis
MD Molecular dynamics
SD FILE Structural data file
uM Micromolar
nM Nanomolar
% Percent
IC50 Half maximal inhibitory concentration
R² Squared correlation coefficient
XVR2 Cross-validated squared correlation coefficient
PRESS Predicted residual error sum of squares
LOF Lack of fit
CSD Cambridge Structural Database
MLR Multiple linear regression
HBD Hydrogen bond donor
HBA Hydrogen bond acceptor
HY Hydrophobic
PDB Protein data bank
SBDD Structure based drug designing
ABDD Analogue based drug designing
RMS Root mean square
HTS High throughput screening
DNA Deoxyribonucleic acid
NMR Nuclear magnetic resonance
QSAR Quantitative structure activity relationship
SAR Structure activity relationship
ADMET Absorption distribution metabolism excretion toxicity
Table showing Legends used