In-silico Structure and Analogue Based Studies on BACE1 Inhibitors for Alzheimer's Disease

INDEX

1. Abstract
2. Aim of the Study
3. Introduction
3.1. Drug Designing
3.2. Protein
4. Materials and Methods
4.1. Structure Based Drug Design
4.2. De novo Ligand Design
4.3. Structure Based Pharmacophore Generation
4.4. Analogue Based Drug Design
5. Results and Discussion
5.1. Structure Based Drug Design
6. Analogue Based Drug Design
7. Conclusion
8. Abbreviations
9. References


1. ABSTRACT:

β-Secretase, also called BACE1 (β-site APP-cleaving enzyme 1) or memapsin-2, is an aspartic-acid protease that is important in the pathogenesis of Alzheimer's disease and in the formation of myelin sheaths in peripheral nerve cells. This transmembrane protein contains two active-site aspartate residues in its extracellular domain and may function as a dimer. BACE1 produces the amyloid β peptide (the primary constituent of the amyloid plaques implicated in Alzheimer's disease) by cleaving the amyloid precursor protein.

Potent BACE1 inhibitors have been proposed as useful drugs. In this work, QSAR, pharmacophore and docking studies on BACE1 inhibitors proved useful for finding new and potent compounds against Alzheimer's disease (AD), a neurodegenerative disorder. In these studies, the highly active compound had a dock score of 82.634 with the LigandFit protocol, and the molecules formed hydrogen bonds with the residues ASP 290 and GLY 96, while the low-activity compound scored 67.26. With the CDOCKER protocol the molecules formed hydrogen bonds with THR 293, and the high- and low-activity compounds had CDOCKER energies of 36.36 and -14.58 respectively. With the LibDock protocol the highly active compound had a LibDock score of 86.88 and formed a hydrogen bond with THR 294, whereas the low-activity compound had a LibDock score of 110.87 and formed a hydrogen bond with ASP 290, the same interaction as that of the crystal ligand when compared using LigPlot.


A novel ligand found with LUDI formed hydrogen bonds with the active-site residues GLY 96 and SER 291.

Analogue-based studies using pharmacophore generation on BACE1 inhibitors identified the important features from the HipHop run as hydrogen bond acceptor, hydrogen bond donor and hydrophobic aromatic. HypoGen, using these features on the training set, gave a cost difference of 53.41 and an RMS value of 1.07. The test set gave an r² value of 0.66 when plotted against the estimated activity. The QSAR model generated with the training set had an r² value of 0.972, while the test set gave an r² value of 0.923.


2. AIM OF THE STUDY

In structure-based drug design, biologists seek to achieve several major goals:

To obtain the correct structure of the protein and, if no X-ray crystal structure of the protein is available, to derive it through homology modeling.

Given the structures of an inhibitor and its target, to correctly predict the binding site on the target, the orientation of the ligand and the conformations of both.

Given the structure of a target macromolecule and a set of ligands, to rank the compounds in the same order as their experimentally measured activities.

The immediate practical applications of these studies are, first, to improve the binding of existing inhibitors and, second, to suggest lead compounds that focus the experimental screening effort, either by searching chemical databases or by de novo drug design.

The main objectives of the present study are:

I. To dock the ligand molecules (BACE1 inhibitors) correctly in the active site of the receptor.

II. To carry out QSAR studies that establish the structure-activity relationship of the ligands against the receptor.

III. To identify molecules from databases and to suggest new molecules by structure-based or analogue-based drug design.


3. INTRODUCTION

3.1 DRUG DESIGNING

Drug design, sometimes referred to as rational drug design, is the inventive process of finding new medications based on knowledge of the biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of small molecules that are complementary in shape and charge to the biomolecular target with which they interact, and that will therefore bind to it. Drug design frequently, but not necessarily, relies on computer modeling techniques; this type of modeling is often referred to as Computer Aided Drug Design (CADD).

The phrase “Drug Design” is to some extent a misnomer: what is really meant is ligand design. Modeling techniques for predicting binding affinity are reasonably successful. However, many other properties, such as bioavailability, metabolic half-life and lack of side effects, must first be optimized before a ligand can become a safe and efficacious drug, and these characteristics are often difficult to optimize using rational drug design techniques.

3.1.1 Background

Typically, a drug target is a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology, or to the infectivity or survival of a microbial pathogen. Some approaches attempt to inhibit the pathway in the diseased state by causing a key molecule to stop functioning; drugs may be designed that bind to its active region and inhibit it. Another approach is to enhance the normal pathway by promoting specific molecules in normal pathways that may have been affected in the diseased state. In addition, drugs should be designed so that they do not affect other important "off-target" molecules that resemble the target molecule, since interactions with off-target molecules may lead to undesirable side effects. Sequence homology is often used to identify such risks.

Most commonly, drugs are organic small molecules, but protein-based drugs (also known as biologics) are becoming increasingly common. In addition, mRNA-based gene silencing technologies may have therapeutic applications.

3.1.2 Types

There are two major types of drug design: ligand-based drug design and structure-based drug design.

Ligand Based Drug Design

Ligand Based Drug Design (or Indirect Drug Design) relies on knowledge of other molecules that bind to the biological target of interest. These other molecules may be used to derive a pharmacophore, which defines the minimum structural characteristics a molecule must possess in order to bind to the target. In other words, a model of the biological target may be built from knowledge of what binds to it, and this model in turn may be used to design new molecular entities that interact with the target.
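To illustrate how a pharmacophore can be used to test a new molecule, the Python sketch below checks whether a ligand's features can be assigned to the model's features so that every pairwise inter-feature distance agrees within a tolerance. This is a simplified, hypothetical check, not the algorithm of any particular program: it assumes the 3D positions of the pharmacophoric features of one ligand conformation are already known, it skips conformer generation and alignment, and the function name `matches_pharmacophore`, the feature labels and the 1.0 Å tolerance are illustrative choices.

```python
import math
from itertools import permutations

# A feature is (feature_type, (x, y, z)); coordinates in angstroms.
Feature = tuple[str, tuple[float, float, float]]


def matches_pharmacophore(model: list[Feature], ligand: list[Feature],
                          tol: float = 1.0) -> bool:
    """Hypothetical distance-based pharmacophore check.

    Tries every assignment of model features to distinct ligand features of
    the same type and accepts the first one in which all pairwise distances
    agree with the model within `tol`.
    """
    n = len(model)
    for combo in permutations(range(len(ligand)), n):
        # Each model feature must be matched by a ligand feature of the same type.
        if any(ligand[j][0] != model[i][0] for i, j in enumerate(combo)):
            continue
        if all(
            abs(math.dist(model[a][1], model[b][1])
                - math.dist(ligand[combo[a]][1], ligand[combo[b]][1])) <= tol
            for a in range(n) for b in range(a + 1, n)
        ):
            return True
    return False


model = [("HBA", (0.0, 0.0, 0.0)), ("HBD", (3.0, 0.0, 0.0)), ("HYD", (0.0, 4.0, 0.0))]
# The same three features translated by (1, 1, 1): all distances are preserved.
shifted = [("HBD", (4.0, 1.0, 1.0)), ("HYD", (1.0, 5.0, 1.0)), ("HBA", (1.0, 1.0, 1.0))]
# The donor moved 5 angstroms further away, so one distance no longer fits.
distorted = [("HBA", (0.0, 0.0, 0.0)), ("HBD", (8.0, 0.0, 0.0)), ("HYD", (0.0, 4.0, 0.0))]
print(matches_pharmacophore(model, shifted))
print(matches_pharmacophore(model, distorted))
```

Because only distances are compared, this check cannot tell a feature arrangement from its mirror image; a fuller treatment would also sample ligand conformations and give each feature its own spatial tolerance.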

Structure Based Drug Design

Structure Based Drug Design (or Direct Drug Design) relies on knowledge of the three-dimensional structure of the biological target, obtained through methods such as X-ray crystallography or NMR spectroscopy. If an experimental structure of a target is not available, it may be possible to create a homology model of the target based on the experimental structure of a related protein. Using the structure of the biological target, candidate drugs that are predicted to bind with high affinity and selectivity to the target may be designed using interactive graphics and the intuition of a medicinal chemist.

Fig 1: Flow charts of two strategies of Structure Based Drug Design

Alternatively, various automated computational procedures may be used to suggest new drug candidates. As experimental methods such as X-ray crystallography and NMR have developed, the amount of information on the 3D structures of biomolecular targets has increased dramatically, as has structural, dynamic and electronic information about ligands. This has encouraged the rapid development of structure-based drug design. Current methods for structure-based drug design can be divided roughly into two categories.

The first category is about “finding” ligands for a given receptor, which is usually referred to as database searching. In this case, a large number of potential ligand molecules are screened to find those that fit the binding pocket of the receptor. This method is usually referred to as ligand-based drug design. The key advantage of database searching is that it saves the synthetic effort needed to obtain new lead compounds.


Another category of structure-based drug design methods is about “building” ligands, which is usually referred to as receptor-based drug design. In this case, ligand molecules are built up within the constraints of the binding pocket by assembling small pieces in a stepwise manner; these pieces can be either atoms or fragments. The key advantage of such a method is that novel structures, not contained in any database, can be suggested. These techniques have generated much excitement in the drug design community.

Active site identification

Active site identification is the first step in this process. It analyzes the protein to find the binding pocket, derives the key interaction sites within the pocket, and then prepares the data needed for ligand fragment linking. The basic inputs for this step are the 3D structure of the protein and a pre-docked ligand in PDB format, together with their atomic properties. Both ligand and protein atoms need to be classified and their atomic properties defined, basically in terms of four atom types:

Hydrophobic atom: all carbons in hydrocarbon chains or in aromatic groups.

H-bond donor: oxygen and nitrogen atoms bonded to hydrogen atom(s).

H-bond acceptor: oxygen atoms, and sp2- or sp-hybridized nitrogen atoms, with lone electron pair(s).

Polar atom: oxygen and nitrogen atoms that are neither H-bond donors nor H-bond acceptors; sulfur, phosphorus, halogen and metal atoms; and carbon atoms bonded to hetero-atom(s).
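The four atom types above amount to a small set of rules that can be written down directly. The Python sketch below is a hypothetical encoding of those rules; the `Atom` record, its fields and the `METALS` subset are assumptions made for illustration. It checks the donor rule before the acceptor rule, so an oxygen carrying a hydrogen (which could also accept a hydrogen bond) is labelled a donor.

```python
from dataclasses import dataclass

HALOGENS = {"F", "Cl", "Br", "I"}
# Illustrative subset of metal ions; extend as needed.
METALS = {"Na", "K", "Mg", "Ca", "Mn", "Fe", "Cu", "Zn"}


@dataclass
class Atom:
    element: str            # element symbol, e.g. "C", "N", "O"
    hybridization: str      # "sp", "sp2" or "sp3"
    n_hydrogens: int        # number of attached hydrogen atoms
    has_lone_pair: bool     # True if the atom carries a lone electron pair
    bonded_to_hetero: bool  # True if bonded to any atom other than C or H


def atom_type(atom: Atom) -> str:
    """Assign one of the four interaction types described in the text."""
    if atom.element == "C":
        # Carbons in hydrocarbon chains or aromatic rings are hydrophobic;
        # carbons bonded to a hetero-atom are treated as polar.
        return "polar" if atom.bonded_to_hetero else "hydrophobic"
    if atom.element in ("O", "N"):
        if atom.n_hydrogens > 0:
            return "donor"
        if atom.has_lone_pair and (atom.element == "O"
                                   or atom.hybridization in ("sp", "sp2")):
            return "acceptor"
        return "polar"  # e.g. an sp3 nitrogen without hydrogens
    if atom.element in {"S", "P"} | HALOGENS | METALS:
        return "polar"
    return "unclassified"


examples = {
    "carbonyl O": Atom("O", "sp2", 0, True, False),
    "hydroxyl O": Atom("O", "sp3", 1, True, False),
    "pyridine N": Atom("N", "sp2", 0, True, False),
    "tertiary amine N": Atom("N", "sp3", 0, True, False),
    "methyl C": Atom("C", "sp3", 3, False, False),
    "carbonyl C": Atom("C", "sp2", 0, False, True),
    "chlorine": Atom("Cl", "sp3", 0, True, True),
}
for name, atom in examples.items():
    print(f"{name:17s} -> {atom_type(atom)}")
```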

The space inside the ligand-binding region is then studied with virtual probe atoms of the four types above, so that the chemical environment of every spot in the region is known. This shows what kind of chemical fragment can be placed at each corresponding spot in the ligand-binding region of the receptor.
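The probing step can be pictured as labelling each point of a grid laid over the pocket with the probe type that best complements the nearby receptor atoms. The Python sketch below uses a deliberately simplified, hypothetical rule: it counts typed receptor atoms within a distance cutoff and picks the complement of the most frequent type. Real programs evaluate interaction energies instead, and the 4.0 Å cutoff, the `COMPLEMENT` table and the function name `map_probes` are illustrative assumptions.

```python
import math

# Probe type that complements each receptor atom type (illustrative rule):
# a donor probe suits a receptor acceptor and vice versa; hydrophobic and
# polar receptor atoms are paired with probes of the same kind.
COMPLEMENT = {"acceptor": "donor", "donor": "acceptor",
              "hydrophobic": "hydrophobic", "polar": "polar"}


def map_probes(grid_points, receptor_atoms, cutoff=4.0):
    """Return {grid_point: preferred probe type or None}.

    grid_points    -- iterable of (x, y, z) tuples inside the binding region
    receptor_atoms -- list of ((x, y, z), atom_type) pairs
    """
    labels = {}
    for point in grid_points:
        counts = {}
        for coord, atype in receptor_atoms:
            if math.dist(point, coord) <= cutoff:
                counts[atype] = counts.get(atype, 0) + 1
        # Points with no receptor atom in range get no preferred probe.
        labels[point] = COMPLEMENT[max(counts, key=counts.get)] if counts else None
    return labels


receptor = [((0.0, 0.0, 0.0), "acceptor"),
            ((10.0, 0.0, 0.0), "hydrophobic"),
            ((10.0, 1.0, 0.0), "hydrophobic")]
grid = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (5.0, 20.0, 0.0)]
for point, probe in map_probes(grid, receptor).items():
    print(point, probe)
```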

Ligand fragment link

The term “fragment” is used here to describe the building blocks used in the construction process. The rationale of this algorithm lies in the fact that organic structures can be decomposed into basic chemical fragments. Although the diversity of organic structures is infinite, the number of basic fragments is rather limited.

Before the first fragment (the seed) is placed in the binding pocket and other fragments are added one by one, several problems must be considered. First, the number of possible fragment combinations is huge, and a small perturbation of an earlier fragment's conformation can greatly change the rest of the construction process. At the same time, to find the lowest binding energy on the Potential Energy Surface (PES) between the placed fragments and the receptor pocket, the scoring function would have to be evaluated at every conformational step for every possible combination of fragments. Because this requires a large amount of computation, other strategies are used to make the program work more efficiently. When a ligand is inserted into the pocket of a receptor, priority should be given to the conformations of those groups on the ligand that bind tightly to the receptor. Several seeds can therefore be placed at the same time in the regions where they interact strongly with the receptor, their preferred conformations adjusted first, and the seeds then connected into a continuous ligand in the way that gives the rest of the ligand the lowest energy. The conformations of the pre-placed seeds, which secure the binding affinity, determine how the ligand is grown. This strategy efficiently reduces the computational burden of fragment construction; on the other hand, it also reduces the number of fragment combinations explored, and hence the number of possible ligands the program can derive. These two strategies are widely used in structure-based drug design programs, where they are described as “Grow” and “Link”, and they are usually combined to make the construction result more reliable.
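The trade-off described above, where pre-placed seeds cut the computation but also shrink the set of ligands that can be reached, can be shown with a toy model. In the hypothetical Python sketch below a ligand is a chain of four sites in the pocket, each filled by one of three fragments; the per-site energies and the penalty for identical neighbouring fragments are made-up numbers chosen only for illustration. The exhaustive search scores every combination, while the seeded search fixes fragments at chosen sites first and enumerates only the remaining ones.

```python
from itertools import product

FRAGMENTS = ("phenyl", "amide", "hydroxyl")
# Made-up interaction energies (arbitrary units, lower is better) for each
# fragment at each of four sites in a hypothetical pocket.
SITE_ENERGY = [
    {"phenyl": -3.0, "amide": -1.0, "hydroxyl": -0.5},
    {"phenyl": -1.0, "amide": -2.5, "hydroxyl": -2.0},
    {"phenyl": -2.0, "amide": -1.5, "hydroxyl": -2.2},
    {"phenyl": -0.5, "amide": -1.0, "hydroxyl": -2.8},
]
SAME_NEIGHBOUR_PENALTY = 1.0  # toy penalty for two identical adjacent fragments


def energy(ligand):
    """Toy score: site energies plus penalties for identical neighbours."""
    total = sum(SITE_ENERGY[i][frag] for i, frag in enumerate(ligand))
    total += sum(SAME_NEIGHBOUR_PENALTY
                 for a, b in zip(ligand, ligand[1:]) if a == b)
    return total


def exhaustive():
    """Score every fragment combination; return (best ligand, count scored)."""
    candidates = list(product(FRAGMENTS, repeat=len(SITE_ENERGY)))
    return min(candidates, key=energy), len(candidates)


def seeded(seeds):
    """Fix seed fragments at the given sites; enumerate only the free sites."""
    free = [i for i in range(len(SITE_ENERGY)) if i not in seeds]
    candidates = []
    for choice in product(FRAGMENTS, repeat=len(free)):
        ligand = [None] * len(SITE_ENERGY)
        for site, frag in seeds.items():
            ligand[site] = frag
        for site, frag in zip(free, choice):
            ligand[site] = frag
        candidates.append(tuple(ligand))
    return min(candidates, key=energy), len(candidates)


best, n = exhaustive()
print(n, best, round(energy(best), 2))
# Seeds placed where each fragment individually binds best (sites 0 and 2).
best_s, n_s = seeded({0: "phenyl", 2: "hydroxyl"})
print(n_s, best_s, round(energy(best_s), 2))
```

With these toy numbers the seeded search scores 9 combinations instead of 81, but because its seeds were chosen site by site it settles at -9.5 rather than the exhaustive optimum of -10.3, which is the loss of reachable ligands the text warns about.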


Fig2: Flow chart for structure based drug design

Scoring method

Scoring functions for docking

Structure-based drug design attempts to use the structure of proteins as a basis for designing new ligands by applying accepted principles of molecular recognition. The basic assumption underlying structure-based drug design is that a good ligand molecule should bind tightly to its target. Thus, one of the most important principles for designing or obtaining potential new ligands is to predict the binding affinity of a given ligand for its target and to use it as a criterion for selection.


Böhm made a breakthrough by developing a general-purpose empirical function to describe the binding energy, introducing the concept of the “Master Equation”. The basic idea is that the overall binding free energy can be decomposed into independent components that are known to be important for the binding process. Each component reflects a certain kind of free energy change during binding between a ligand and its target receptor, and the Master Equation is a linear combination of these components. Through the Gibbs free energy relation ΔG = RT ln Kd, the summed binding free energy is related to the dissociation equilibrium constant, Kd.
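A Master Equation of this kind can be sketched as a sum of weighted terms, ΔG_bind = ΔG_0 + ΔG_hb Σ f(hb) + ΔG_ionic Σ f(ionic) + ΔG_lipo A_lipo + ΔG_rot N_rot, followed by conversion to Kd through ΔG = RT ln Kd. The Python sketch below follows that general shape, which is the form associated with Böhm's function; the coefficient values are placeholders for illustration and should not be taken as any published parameter set, and the geometry factors f (from 0 for a poor to 1 for an ideal interaction) are assumed to be supplied by the caller.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)

# Placeholder coefficients (kJ/mol) for illustration only; they are not
# guaranteed to match any published parameterization.
DG0 = 5.4        # constant offset term
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per square angstrom of lipophilic contact surface
DG_ROT = 1.4     # per rotatable bond frozen on binding


def binding_free_energy(hbond_quality, ionic_quality, lipo_area, n_rot):
    """Master-equation style sum of independent free-energy terms (kJ/mol).

    hbond_quality / ionic_quality -- per-interaction geometry factors f in [0, 1]
    lipo_area -- lipophilic contact area in square angstroms
    n_rot     -- number of rotatable bonds in the ligand
    """
    return (DG0
            + DG_HB * sum(hbond_quality)
            + DG_IONIC * sum(ionic_quality)
            + DG_LIPO * lipo_area
            + DG_ROT * n_rot)


def dg_to_kd(dg, temperature=298.15):
    """Invert dG = RT ln Kd to get the dissociation constant (mol/L)."""
    return math.exp(dg / (R * temperature))


def kd_to_dg(kd, temperature=298.15):
    """dG = RT ln Kd, in kJ/mol."""
    return R * temperature * math.log(kd)


dg = binding_free_energy([1.0, 1.0], [], lipo_area=100.0, n_rot=3)
print(f"dG = {dg:.1f} kJ/mol, Kd = {dg_to_kd(dg):.2e} M")
print(f"Kd = 1 nM corresponds to dG = {kd_to_dg(1e-9):.1f} kJ/mol")
```

With these placeholder values, two ideal hydrogen bonds, 100 Å² of lipophilic contact and three rotatable bonds give -16.8 kJ/mol, and a Kd of 1 nM corresponds to about -51.4 kJ/mol at 298.15 K.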

The sub-models of empirical functions differ according to the considerations of different researchers, and designing these sub-models has long been a scientific challenge. As they are modified, empirical scoring functions are continuously improved and refined.

3.1.3 Rational drug discovery

In contrast to traditional methods of drug discovery, which rely on trial-and-error testing of chemical substances on cultured cells or animals and on matching the apparent effects to treatments, rational drug design begins with a hypothesis that modulation of a specific biological target may have therapeutic value. For a biomolecule to be selected as a drug target, two essential pieces of information are required. The first is evidence that modulation of the target will have therapeutic value; this knowledge may come, for example, from disease linkage studies that show an association between mutations in the biological target and certain disease states. The second is that the target is "druggable", meaning that it is capable of binding a small molecule and that its activity can be modulated by that small molecule.

Once a suitable target has been identified, it is normally cloned and expressed, and the expressed target is used to establish a screening assay. In addition, the three-dimensional structure of the target may be determined. The search for small molecules that bind to the target begins with screening libraries of potential drug compounds. This may be done using the screening assay (a "wet screen"). In addition, if the structure of the target is available, a virtual screen of candidate drugs may be performed. Ideally, the candidate drug compounds should be "drug-like", that is, they should possess properties that are predicted to lead to oral bioavailability, adequate chemical and metabolic stability, and minimal toxic effects. One way of estimating drug-likeness is Lipinski's Rule of Five. Several methods for predicting drug metabolism have been proposed in the scientific literature; a recent example is SPORCalc. Because of the complexity of the drug design process, serendipity and bounded rationality remain relevant concepts, and these challenges arise from the large chemical space of potential new drugs without side effects.
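Lipinski's Rule of Five flags compounds likely to have poor oral absorption when the molecular weight exceeds 500, the calculated log P exceeds 5, there are more than 5 hydrogen-bond donors, or there are more than 10 hydrogen-bond acceptors; a compound is commonly regarded as drug-like if it violates no more than one of these rules. The Python sketch below applies that filter to precomputed descriptors. The two compounds and their descriptor values are hypothetical, and computing the descriptors themselves (for example with a cheminformatics toolkit) is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float  # molecular weight, Da
    clogp: float       # calculated octanol/water log P
    h_donors: int      # number of H-bond donors (OH + NH)
    h_acceptors: int   # number of H-bond acceptors (N + O)


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """List which of Lipinski's four criteria a compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500")
    if d.clogp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("HBD > 5")
    if d.h_acceptors > 10:
        violations.append("HBA > 10")
    return violations


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    # Poor absorption is more likely when more than one rule is broken.
    return len(rule_of_five_violations(d)) <= max_violations


# Hypothetical screening candidates with made-up descriptor values.
library = [
    Descriptors("compound_A", 420.5, 3.1, 2, 6),
    Descriptors("compound_B", 612.7, 5.8, 4, 9),
]
for d in library:
    print(d.name, rule_of_five_violations(d), is_drug_like(d))
```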

3.1.4 Computer Assisted Drug Design

Computer Assisted Drug Design uses computational chemistry to discover, enhance or study drugs and related biologically active molecules. The most fundamental goal is to predict whether a given molecule will bind to a target and, if so, how strongly. Molecular mechanics or molecular dynamics are most often used to predict the conformation of the small molecule and to model conformational changes in the biological target that may occur when the small molecule binds to it. Semi-empirical methods, ab initio quantum chemistry methods or density functional theory are often used to provide optimized parameters for the molecular mechanics calculations, and also to estimate the electronic properties (electrostatic potential, polarizability, etc.) of the drug candidate that influence binding affinity. Molecular mechanics methods may also be used to give semi-quantitative predictions of binding affinity. Alternatively, knowledge-based scoring functions may be used to provide binding affinity estimates. These methods use linear regression, machine learning, neural nets or other statistical techniques to derive predictive binding affinity equations by fitting experimental affinities to computationally derived interaction energies between the small molecule and the target.
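The regression idea can be made concrete with the simplest possible case: one computed interaction energy per compound, fitted against measured affinity by ordinary least squares. The Python sketch below uses made-up numbers (interaction energies in kcal/mol and pKd values for five hypothetical compounds) purely for illustration; practical scoring functions fit several energy terms at once, often with multiple linear regression or machine learning. The r² it reports is the coefficient of determination, the kind of statistic quoted for the QSAR models in the Abstract.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared = 1 - SS_res / SS_tot.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up training data for five hypothetical compounds:
# computed interaction energy (kcal/mol) and measured affinity (pKd).
energies = [-30.0, -40.0, -50.0, -60.0, -70.0]
pkd = [5.1, 5.9, 7.0, 8.1, 8.9]

slope, intercept, r2 = fit_line(energies, pkd)
print(f"pKd = {slope:.3f} * E + {intercept:.2f}  (r^2 = {r2:.3f})")
# Use the fitted equation to estimate the affinity of a new compound.
print(f"predicted pKd at E = -55 kcal/mol: {slope * -55.0 + intercept:.2f}")
```

With these numbers the fit gives a slope of about -0.098 pKd units per kcal/mol, an intercept of 2.1 and r² of about 0.996; the fitted line is then used to estimate the affinity of compounds that have not yet been measured.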


Ideally, the computational method should be able to predict affinity before a compound is synthesized, so that in theory only one compound needs to be synthesized. In reality, however, present computational methods provide at best only qualitatively accurate estimates of affinity, so in practice several iterations of design, synthesis and testing are still needed before an optimal molecule is discovered. On the other hand, computational methods have accelerated discovery by reducing the number of iterations required, and they have often provided more novel small-molecule structures.

Drug design with the help of computers may be used at any of the following stages of drug discovery:

o Hit identification using virtual screening (structure- or ligand-based design)

o Hit-to-lead optimization of affinity and selectivity (structure-based design, QSAR, etc.)

o Lead optimization of other pharmaceutical properties while maintaining affinity

Fig3: Role of computer aided drug designing


Benefits of CADD

CADD methods and bioinformatics tools offer significant benefits for drug discovery programs.

1. Cost Savings. The Tufts Report suggests that the cost of drug discovery and development has reached $800 million for each drug successfully brought to market. Many biopharmaceutical companies now use computational methods and bioinformatics tools to reduce this cost burden. Virtual screening, lead optimization and predictions of bioavailability and bioactivity can help guide experimental research, so that only the most promising lines of inquiry are followed and experimental dead ends are avoided early, based on the results of CADD simulations.

2. Time-to-Market. The predictive power of CADD can help drug research programs choose only the most promising drug candidates. By focusing research on specific lead candidates and avoiding potential “dead-end” compounds, biopharmaceutical companies can bring drugs to market more quickly.

3. Insight. One of the non-quantifiable benefits of CADD and bioinformatics tools is the deep insight that researchers acquire about drug-receptor interactions. Molecular models of drug compounds can reveal intricate, atomic-scale binding properties that are difficult to envision in any other way. When researchers are shown molecular models of their putative drug compounds, their protein targets and how the two bind together, they often come up with new ideas on how to modify the compounds for a better fit. This intangible benefit can help in designing research programs.

CADD and bioinformatics together are a powerful combination in drug research and development.


3.1.5 Software

In silico studies described in this project were carried out using the tools available in Discovery Studio by Accelrys.

Discovery Studio is a complete modeling and simulation environment for life science researchers. It provides a single, easy-to-use graphical interface for drug design and protein modeling research. Discovery Studio 2.5 combines established applications with years of proven results, such as Catalyst, Modeler and CHARMm, with newer methods to address current drug discovery challenges. Discovery Studio 2.5 is built on the Pipeline Pilot open platform to integrate protein modeling, pharmacophore analysis, virtual screening and third-party applications. It offers:

o Interactive, visual and integrated software.

o A consistent, contemporary user interface for added ease of use.

o Tools for visualization, protein modeling, simulation, docking, pharmacophore analysis, QSAR and library design.

o Access to computational servers and tools, data sharing, job monitoring, and preparation and communication of project progress.

Fig 4: Features available in Discovery Studio 2.5


3.2 PROTEIN

3.2.1 Introduction

Classification: Hydrolase
Molecule: Beta-secretase 1
Structure Weight: 46928.46 Da
Polymer: 1
Type: polypeptide(L)
Length: 415
Chains: A
EC#: 3.4.23.46
Fragment: UNP residues 46-454
Protein ID: 2ZDZ

Beta Secretase:

β-Secretase, also called BACE1 (β-site APP-cleaving enzyme 1) or memapsin-2, is an aspartic-acid protease important in the pathogenesis of Alzheimer's disease and in the formation of myelin sheaths in peripheral nerve cells. This transmembrane protein contains two active-site aspartate residues in its extracellular domain and may function as a dimer. BACE1 produces the amyloid β (Aβ) peptide (the primary constituent of the amyloid plaques implicated in Alzheimer's disease) by cleaving the amyloid precursor protein.


Fig5: Secondary structure of BACE1

Cerebral deposition of amyloid beta peptide (A-beta) is an early and critical feature of Alzheimer's disease. A-beta generation depends on proteolytic cleavage of the amyloid precursor protein (APP) by two proteases that were, at the time, unidentified: beta-secretase and gamma-secretase. These proteases are prime therapeutic targets. A transmembrane aspartic protease with all the known characteristics of beta-secretase was cloned and characterized. Overexpression of this protease, termed BACE (beta-site APP-cleaving enzyme), increased the amount of beta-secretase cleavage products, and cleavage occurred exactly and only at the known beta-secretase positions. Antisense inhibition of endogenous BACE messenger RNA decreased the amount of beta-secretase cleavage products, and purified BACE protein cleaved APP-derived substrates with the same sequence specificity as beta-secretase. Finally, the expression pattern and subcellular localization of BACE were consistent with those expected for beta-secretase. Future development of BACE inhibitors may prove beneficial for the treatment of Alzheimer's disease.

Beta-secretase (BACE) is a membrane protein that contains two essential Asp residues in its ectodomain (extracellular domain), which carry out the first cleavage of the N-terminal domain of the beta-amyloid precursor protein to release a soluble N-terminal fragment of about 100,000 MW. γ-Secretase, which is necessary for the second cleavage that frees the Aβ peptide, is a heterotetramer composed of presenilin-1, nicastrin, APH-1 and PEN-2, and is located in neural plasma membranes and the endoplasmic reticulum. The Aβ peptide moves to the extracellular side of the neural membrane, where it aggregates. The remaining cytoplasmic part of the beta-amyloid precursor protein may regulate transcription. The presenilin subunit carries the protease activity. γ-Secretase also cleaves another cell-surface receptor protein, Notch: when this receptor has bound an extracellular ligand, γ-secretase cleaves Notch within its transmembrane region, and the released cytoplasmic fragment modifies gene transcription. The APH-1 subunit appears to inhibit presenilin protease activity, while PEN-2 promotes it. Inhibiting γ-secretase could be an effective treatment for Alzheimer's disease, but it might have serious side effects because Notch processing would also be affected.

Pathway:

The gamma-secretase protein quartet and its roles in brain development and Alzheimer's disease. Presenilin-1, nicastrin, APH-1 and PEN-2 form a functional gamma-secretase complex located in the plasma membrane and endoplasmic reticulum (ER) of neurons. The complex cleaves Notch (left) to generate a fragment (NICD) that moves to the nucleus and regulates the expression of genes involved in brain development and adult neuronal plasticity. The complex also helps generate the amyloid beta-peptide (Abeta; centre). This involves an initial cleavage of the amyloid precursor protein (APP) by an enzyme called BACE (or beta-secretase); gamma-secretase then liberates Abeta, as well as an APP cytoplasmic fragment, which may move to the nucleus and regulate gene expression. Mutations in presenilin-1 that cause early-onset Alzheimer's disease enhance gamma-secretase activity and Abeta production, and also perturb ER calcium balance. The consequent neuronal degeneration may result from membrane-associated oxidative stress induced by aggregating forms of Abeta (which form Abeta plaques), and from the perturbed calcium balance.


Figure: Cleavage of beta amyloid precursor protein: protease and cofactors

Beta Secretase Processing:

APP processing in cholesterol-enriched membranes (CEMs). The amyloid protein precursor (APP) is a type I transmembrane protein that is processed through several different pathways. Generation of the amyloid β protein (Aβ) in the β-secretase pathway (A and B) requires two proteolytic events: a cleavage at the amino terminus of the Aβ sequence, referred to as β-secretase cleavage, and a cleavage at the carboxyl terminus, known as γ-secretase cleavage. Cleavage by β-secretase results in secretion of sAPPβ and production of the membrane-bound carboxyl-terminal fragment β (CTFβ). γ-Secretase cleavage of CTFβ produces the secreted Aβ peptide and CTFγ. In the α-secretase pathway (C), APP is cleaved within Aβ to generate a large secreted derivative, sAPPα, and a membrane-associated CTFα. Aβ production in the β-secretase pathway appears to occur in CEMs, which are indicated by high levels of cholesterol (light gray squares) and GM1 ganglioside (dark gray squares) in the membrane. It is not certain whether the CEMs that contain β- and γ-secretase activity are contiguous (A) or spatially distinct (B). Local production of Aβ in CEMs (A or B) could result in local aggregation because of the high concentrations of Aβ and the fibril-promoting factors present in CEMs. In non-CEM membranes, the α-secretase pathway is favored (C).

Two proteases produce Aβ from the amyloid β protein precursor (APP) through sequential cleavages (reviewed in ref. 11). APP is first cleaved by β-secretase (BACE1, Asp2, memapsin-2), a transmembrane aspartyl protease, at the amino terminus of Aβ to generate a large secreted derivative (sAPPβ) and a membrane-bound APP carboxyl-terminal fragment (CTFβ). Subsequent cleavage of CTFβ by γ-secretase produces the Aβ peptide and CTFγ. In a second pathway, APP is cleaved within the Aβ sequence by α-secretase, which generates another large secreted derivative (sAPPα) and a CTF (CTFα).

Recent evidence indicates that the first cleavage step in Aβ generation (Fig. 1), β-secretase cleavage, may occur in CEMs. β-Secretase is enriched in CEMs that are distinct from caveolae-containing CEMs (ref. 12). Although β-secretase activity was not measured, the concentration of mature β-secretase in these membranes provides initial evidence that this cleavage may occur at this site. This localization would also be consistent with the observation that lowering cholesterol reduces β-secretase cleavage, described in detail below. In addition, there is evidence that alterations in caveolin-3 expression can alter β-secretase cleavage of APP (ref. 13). How this relates to the presence of β-secretase in non-caveolar CEMs is not clear.


http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=eurekah&part=A15416#A15438


http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=eurekah&part=A15416&rendertype=figure&id=A15420


Disease:

Alzheimer's disease (AD), also called Alzheimer disease, Senile Dementia of the Alzheimer Type (SDAT) or simply Alzheimer's, is the most common form of dementia.

This incurable, degenerative, and terminal disease was first described by German

psychiatrist and neuropathologist Alois Alzheimer in 1906 and was named after him.


Generally it is diagnosed in people over 65 years of age, although the less-prevalent

early-onset Alzheimer's can occur much earlier. An estimated 26.6 million people

worldwide had Alzheimer's in 2006; this number may quadruple by 2050.

Although the course of Alzheimer's disease is unique for every individual, there

are many common symptoms. The earliest observable symptoms are often mistakenly

thought to be 'age-related' concerns, or manifestations of stress. In the early stages, the

most commonly recognised symptom is memory loss, such as difficulty in remembering

recently learned facts.

As the disease advances, symptoms include confusion, irritability and aggression,

mood swings, language breakdown, long-term memory loss, and the general withdrawal

of the sufferer as their senses decline. Gradually, bodily functions are lost, ultimately

leading to death.

Biochemistry:

Alzheimer's disease has been identified as a protein misfolding disease

(proteopathy), caused by accumulation of abnormally folded A-beta and tau proteins in

the brain. Plaques are made up of small peptides, 39–43 amino acids in length, called

beta-amyloid (also written as A-beta or Aβ). Beta-amyloid is a fragment from a larger

protein called amyloid precursor protein (APP), a transmembrane protein that penetrates

through the neuron's membrane. APP is critical to neuron growth, survival and post-

injury repair. In Alzheimer's disease, an unknown process causes APP to be divided into

smaller fragments by enzymes through proteolysis. One of these fragments gives rise to

fibrils of beta-amyloid, which form clumps that deposit outside neurons in dense

formations known as senile plaques.


Enzymes act on the APP (amyloid precursor protein) and cut it into fragments. The beta-

amyloid fragment is crucial in the formation of senile plaques in AD.


http://upload.wikimedia.org/wikipedia/commons/f/fb/Amyloid-plaque_formation-big.jpg

4. MATERIALS AND METHODS:

In the last few years the role of computational methods in both pharmaceutical and academic research has grown dramatically. The emphasis placed on high-throughput methods in the pharmaceutical industry has increased the number of compounds in the discovery pipeline. Characterizing the position and orientation of small molecules bound to a protein surface can be an important step in drug design. Computational methods have developed rapidly as groups seek high-throughput, low-cost approaches to accelerate the drug discovery process. Such approaches will be necessary as scientists attempt to characterize the large number of candidate compounds currently being generated. Structural information on biological macromolecules and their complexes with ligands is increasingly used in modern medicinal chemistry. There is a pressing need for novel computational methods that can evaluate the structural information about ligand-receptor complexes in a more quantitative way, both to improve existing leads and to design de novo compounds with accurately predicted binding affinities. The methods used in this work are divided into three parts:

Structure based drug designing

Docking studies

a) Ligand Fit

b) CDOCKER

c) Lib Dock

1. Structure based pharmacophore generation

2. Ludi

Analogue based drug designing

1. Common feature pharmacophore generation (HipHop)

2. 3D pharmacophore generation (HypoGen)

3. Quantitative structure activity relationships (QSAR)


Preparation of Molecular System

Macromolecule (protein 2ZDZ) Preparation:

Load the protein and apply the force field

For the QSAR, pharmacophore and docking studies, the protein 2ZDZ was loaded from the RCSB Protein Data Bank (www.rcsb.org/pdb/) and a force field was applied. A force field is the functional form and set of parameters used to calculate the potential energy of a system. Its parameters are obtained from experimental work and quantum-mechanical calculations. In this model, the energy of a molecular system is built up from a number of components. Bonded terms describe covalently bonded atoms through parameters such as bond lengths, bond angles and dihedral (torsion) angles; non-bonded terms describe van der Waals and electrostatic interactions. The total potential energy of the system is therefore calculated as

$$E_{\text{total}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{torsion}} + E_{\text{vdW}} + E_{\text{elec}}$$

This summation, written out in explicit form, represents the force field used to evaluate the potential energy of the system.
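To make the summation concrete, the sketch below evaluates a simplified additive force field term by term. It is a minimal illustration under stated assumptions: it uses common textbook functional forms (harmonic bond and angle terms, a periodic torsion, a 12-6 Lennard-Jones term written with the well minimum r_min, and a Coulomb term), and every parameter value is a made-up placeholder, not a CHARMm parameter.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2); value assumed for this illustration
COULOMB_K = 332.0636


def bond_energy(r, k_b, r0):
    """Harmonic bond stretch: k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2


def angle_energy(theta, k_theta, theta0):
    """Harmonic angle bend: k_theta * (theta - theta0)^2, angles in radians."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(phi, k_phi, n, delta):
    """Periodic torsion: k_phi * [1 + cos(n*phi - delta)]."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def vdw_energy(r, epsilon, r_min):
    """12-6 Lennard-Jones with well depth epsilon at separation r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio ** 2 - 2.0 * ratio)


def elec_energy(r, q_i, q_j, dielectric=1.0):
    """Coulomb interaction between two point charges."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


def total_energy(bonds, angles, torsions, pairs):
    """E_total = E_bond + E_angle + E_torsion + E_vdW + E_elec.

    pairs holds non-bonded pairs as (r, epsilon, r_min, q_i, q_j).
    """
    e_bond = sum(bond_energy(*b) for b in bonds)
    e_angle = sum(angle_energy(*a) for a in angles)
    e_torsion = sum(torsion_energy(*t) for t in torsions)
    e_vdw = sum(vdw_energy(r, eps, rmin) for r, eps, rmin, _, _ in pairs)
    e_elec = sum(elec_energy(r, qi, qj) for r, _, _, qi, qj in pairs)
    return e_bond + e_angle + e_torsion + e_vdw + e_elec


# Toy example: every bonded term sits at its reference value and the single
# non-bonded pair sits at r_min with zero charges, so only -epsilon remains.
e = total_energy(
    bonds=[(1.53, 300.0, 1.53)],
    angles=[(math.radians(109.5), 50.0, math.radians(109.5))],
    torsions=[(math.pi, 1.0, 1, 0.0)],
    pairs=[(4.0, 0.2, 4.0, 0.0, 0.0)],
)
print(round(e, 6))  # -0.2
```

Keeping each term in its own function mirrors the additive form of the equation: adding a new interaction type means adding one more function and one more sum.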

Minimization:

The minimizer uses an algorithm to identify the geometries of the molecule that correspond to minima on the potential energy surface. It relieves the unfavorable forces (strain) present in the molecule and lowers its energy. Many minimization algorithms are available; those used in the Smart Minimizer include the steepest descent, conjugate gradient, Newton-Raphson and quasi-Newton methods. From the Discovery Studio (DS) protocols, select the Minimization option, run the protocol for the protein with fixed constraints, and then save the minimized protein for further studies.
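Steepest descent, the simplest of the methods listed above, repeatedly moves the coordinates against the energy gradient. The sketch below applies it to a one-dimensional toy problem, the separation of two atoms interacting through a Lennard-Jones term, and halves the step whenever a move would raise the energy. It is not the Discovery Studio Smart Minimizer; the parameters (epsilon = 0.2, r_min = 4.0 Å), step size and tolerance are arbitrary assumptions.

```python
def lj_energy(r, epsilon=0.2, r_min=4.0):
    """12-6 Lennard-Jones energy with its minimum (-epsilon) at r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio ** 2 - 2.0 * ratio)


def lj_gradient(r, epsilon=0.2, r_min=4.0):
    """Analytical dE/dr of lj_energy."""
    ratio = (r_min / r) ** 6
    return epsilon * (-12.0 * ratio ** 2 + 12.0 * ratio) / r


def steepest_descent(r, step=0.5, grad_tol=1e-6, max_steps=10000):
    """Move along -gradient until the gradient is tiny; halve the step on uphill moves."""
    energy = lj_energy(r)
    for _ in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < grad_tol:          # converged: (near) zero force
            break
        trial = r - step * g
        trial_energy = lj_energy(trial) if trial > 0 else float("inf")
        if trial_energy < energy:      # accept downhill moves only
            r, energy = trial, trial_energy
        else:                          # overshoot: shrink the step and retry
            step *= 0.5
    return r, energy


# Start from a compressed contact (3.6 Å); the search relaxes towards r_min.
r_opt, e_opt = steepest_descent(3.6)
print(r_opt, e_opt)
```

Conjugate gradient and Newton-type methods differ mainly in how the step direction and length are chosen; the stop-when-the-gradient-is-small logic is the same.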


Fig 10: Minimized 2ZDZ

Fig 11: Representation of important amino acids


http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2zdz&template=ligplotbox.html&param1=1&param2=1&param3=ligand

Based on the LIGPLOT information and on the articles listed below, the important amino acids were identified as GLY96, ASP290, ASP94, THR293 and SER291.

Preparation of bioactive molecules:

Sixty-five bioactive compounds with activities ranging from 0.078 µM to >118 µM were collected from the following four articles:

1. Acylguanidine inhibitors of β-secretase: optimization of the pyrrole ring substituents extending into the S1 substrate binding pocket. Bioorganic & Medicinal Chemistry Letters 18 (2008) 767-771. Lee D. Jennings, Derek C. Cole, Joseph R. Stock, Mohani N. Sukhdeo, John W. Ellingboe, Rebecca Cowling, Guixian Jin, Eric S. Manas, Kristi Y. Fan, Michael S. Malamas, Boyd L. Harrison, Steve Jacobsen, Rajiv Chopra, Peter A. Lohse, William J. Moore, Mary-Margaret O'Donnell, Yun Hu, Albert J. Robichaud, M. James Turner, Erik Wagner and Jonathan Bard.

2. Design and synthesis of potent β-secretase (BACE-1) inhibitors with P1 carboxylic acid bioisosteres. Bioorganic & Medicinal Chemistry Letters 16 (2006) 2380-2386. Tooru Kimura, Yoshio Hamada, Monika Stochaj, Hayato Ikari, Ayaka Nagamine, Hamdy Abdel-Rahman, Naoto Igawa, Koushi Hidaka, Jeffrey-Tri Nguyen, Kazuki Saito, Yoshio Hayashi and Yoshiaki Kiso.

3. Novel non-peptide β-secretase inhibitors derived from structure-based virtual screening and bioassay. Bioorganic & Medicinal Chemistry Letters 19 (2009) 3188-3192. Weijun Xu, Gang Chen, Oi Wah Liew, Zhili Zuo, Hualiang Jiang and Weiliang Zhu.

4. Design, synthesis and biological evaluation of novel dual inhibitors of acetylcholinesterase and β-secretase. Bioorganic & Medicinal Chemistry 17 (2009) 1600-1613. Yiping Zhu, Kun Xiao, Lanping Ma, Bin Xiong, Yan Fu, Haiping Yu, Wei Wang, Xin Wang, Dingyu Hu, Hongli Peng, Jingya Li, Qi Gong, Qian Chai, Xican Tang, Haiyan Zhang, Jia Li and Jingkang Shen.


Procedure:

1. A basic scaffold of the molecules was sketched using the sketching tools available in DS. The scaffold was then modified to sketch all 65 molecules, which were saved as .mol files.

2. The sketched molecules were typed with the CHARMm force field.

3. The typed molecules were energy-minimized using the Smart Minimizer, which minimizes a series of ligand poses using CHARMm.

4. The minimized molecules were saved in .sd and .mol format for further study.

4.1. Structure or Target Based Drug Design

In structure-based drug design (SBDD), the three-dimensional structure of the drug target interacting with small molecules (drugs) is used to guide drug discovery. Drug targets are typically key molecules involved in a specific metabolic or cell-signaling pathway that is known, or believed, to be related to a particular disease state. Drug targets are most often proteins and enzymes in these pathways. Drug compounds are designed to inhibit, restore or otherwise modify the structure and behavior of disease-related proteins and enzymes.

SBDD uses the known 3D geometrical shape or structure of proteins to assist in the development of new drug compounds. The 3D structure of protein targets is most often derived from X-ray crystallography or nuclear magnetic resonance (NMR) techniques, which resolve structures at the level of a few ångströms (about 500,000 times smaller than the diameter of a human hair). At this level of resolution, researchers can precisely examine the interactions between atoms in protein targets and atoms in potential drug compounds that bind to the proteins. This ability to work at high resolution with both proteins and drug compounds makes SBDD one of the most powerful methods in drug design.

Once bound at the receptor site, drugs may act either to initiate a response (agonist or stimulant action) or to decrease the activity of that receptor (antagonist or depressant action) by blocking access to it by active molecules. Thus, any drug may have structural features that contribute independently to its affinity for the receptor and to the efficiency with which the drug-receptor combination initiates the response (intrinsic activity or efficacy). The response is related to the formation of drug-receptor complexes. The affinity of a drug may be estimated by comparing the dose required to produce a pharmacological response with the dose required by a reference standard drug or by the natural ligand for that receptor. In this study, structure-based drug design is carried out in the following parts:

1. Structure based pharmacophore generation

2. Ludi

Molecular Docking

In the field of molecular modeling, docking is a method that predicts the preferred orientation of one molecule relative to a second when the two are bound to each other to form a stable complex. Knowledge of the preferred orientation may in turn be used to predict the strength of association, or binding affinity, between the two molecules, for example by using scoring functions. Molecular docking may be defined as an optimization problem that describes the "best-fit" orientation of a ligand that binds to a particular protein of interest. Docking is useful for predicting both the strength and the type of signal produced.

The focus of molecular docking is to computationally simulate the molecular recognition process. The aim of molecular docking is to achieve an optimized conformation for both the protein and the ligand, and a relative orientation between them, such that the free energy of the overall system is minimized.

Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets and, in turn, the affinity and activity of the small molecule. Hence docking plays an important role in the rational design of drugs. Given the biological and pharmaceutical significance of molecular docking, considerable efforts have been directed towards improving the methods used to predict docking.

Two popular docking approaches exist. The first, a shape-matching approach, describes the protein and the ligand as complementary surfaces. The second approach simulates the actual docking process, in which the pairwise ligand-protein interaction energies are calculated. The scoring functions used to evaluate such interactions are of three types: force-field-based, empirical and knowledge-based. Both approaches have significant advantages as well as some limitations.

Scoring is the process of evaluating a particular pose (candidate binding mode), for example by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts.
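A minimal sketch of the counting idea follows. It assumes each atom is given as (element, x, y, z) in Å and uses illustrative cut-offs (an N/O to N/O distance of at most 3.5 Å as a hydrogen-bond proxy, a C to C distance of at most 4.0 Å as a hydrophobic contact). Real scoring functions weight and combine many such terms; the cut-offs and the unweighted count are assumptions, not the scoring function of any Discovery Studio protocol.

```python
import math

HBOND_CUTOFF = 3.5        # Å, N/O ... N/O heavy-atom distance (assumed H-bond proxy)
HYDROPHOBIC_CUTOFF = 4.0  # Å, C ... C distance (assumed hydrophobic contact)
POLAR = {"N", "O"}


def distance(a, b):
    """Euclidean distance between two atoms given as (element, x, y, z)."""
    return math.dist(a[1:], b[1:])


def count_contacts(ligand, protein):
    """Return (n_hbond, n_hydrophobic) over all ligand-protein atom pairs."""
    n_hbond = n_hydrophobic = 0
    for la in ligand:
        for pa in protein:
            d = distance(la, pa)
            if la[0] in POLAR and pa[0] in POLAR and d <= HBOND_CUTOFF:
                n_hbond += 1
            elif la[0] == "C" and pa[0] == "C" and d <= HYDROPHOBIC_CUTOFF:
                n_hydrophobic += 1
    return n_hbond, n_hydrophobic


def contact_score(ligand, protein):
    """Unweighted score: one point per favorable contact (higher is better)."""
    return sum(count_contacts(ligand, protein))


# Hypothetical coordinates: one polar contact and one C...C contact.
ligand = [("O", 0.0, 0.0, 0.0), ("C", 1.4, 0.0, 0.0)]
protein = [("N", 2.9, 0.0, 0.0), ("C", 5.0, 0.0, 0.0), ("C", 10.0, 0.0, 0.0)]
print(count_contacts(ligand, protein), contact_score(ligand, protein))
```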

Several docking methods exist, each based on a different search algorithm and each with its own advantages and disadvantages. The docking methods available in Discovery Studio (Accelrys) and used in the present study are LigandFit, CDOCKER and LibDock; LigandFit and CDOCKER are summarized below.


Fig 12: Docking work flow

4.1. i. Ligand Fit

LigandFit is a shape-based method for accurately docking ligands into protein

active sites. The method employs a cavity detection algorithm for detecting invaginations

in the protein as candidate active site regions. A shape comparison filter is combined with

a Monte Carlo conformational search for generating ligand poses consistent with the

active site shape. Candidate poses are minimized in the context of the active site using a

grid-based method for evaluating protein-ligand interaction energies. Errors arising from

grid interpolation are dramatically reduced using a new non-linear interpolation scheme.

Protein shape:

Sites are defined based on the shape of the protein. An "eraser" algorithm is used to clear all the grid points outside the protein. The boundary between inside and outside is determined by the opening size parameter. Within the boundary, a flood-fill algorithm is employed to find the unoccupied grid points that form the cavities (sites). All detected sites can be browsed according to their size, and a user-defined size cut-off eliminates sites smaller than the specified size.
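The eraser and flood-fill steps can be sketched on a small two-dimensional grid. In this simplified version, which is an assumption rather than LigandFit's implementation, '#' marks grid points occupied by protein, '.' marks empty points, and any empty point connected to the edge of the grid is treated as outside the protein (standing in for the opening size parameter). LigandFit works on a 3D grid; the same breadth-first flood fill extends directly to six neighbors in 3D.

```python
from collections import deque


def find_cavities(grid, min_size=1):
    """Return cavities (lists of (row, col)) sorted largest first.

    grid: list of equal-length strings, '#' = occupied by protein, '.' = empty.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()

    def flood(start):
        """Breadth-first flood fill over empty points connected to start."""
        region, queue = [], deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] == "." and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return region

    # "Eraser": empty points reachable from the grid edge lie outside the protein.
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and grid[r][c] == "." and (r, c) not in seen:
                flood((r, c))

    # Flood fill the remaining interior empty points; each region is one site.
    cavities = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "." and (r, c) not in seen:
                cavities.append(flood((r, c)))

    # User-defined size cut-off removes small sites.
    cavities = [cav for cav in cavities if len(cav) >= min_size]
    return sorted(cavities, key=len, reverse=True)


grid = [
    ".......",
    ".#####.",
    ".#..##.",
    ".#..#.#",
    ".#####.",
    ".......",
]
print([len(c) for c in find_cavities(grid)])           # a 4-point and a 1-point site
print([len(c) for c in find_cavities(grid, min_size=2)])
```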


Dock ligand:

Sites are defined based on a docked ligand. If a docked ligand is present, the unoccupied grid points within a user-definable distance of the ligand atoms are collected to form the site. The site can be edited (enlarged, contracted or deleted), saved and later restored for further studies.
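Site definition from a docked ligand can be sketched as a distance filter over grid points. The sketch assumes integer grid indices (i, j, k) located at (i*spacing, j*spacing, k*spacing), a set of indices already occupied by protein atoms, and illustrative defaults of 1.0 Å spacing and a 2.0 Å cut-off; none of these values come from LigandFit.

```python
import math
from itertools import product


def site_from_ligand(ligand_atoms, occupied, spacing=1.0, cutoff=2.0):
    """Return sorted grid indices that are unoccupied and within cutoff of a ligand atom.

    ligand_atoms: iterable of (x, y, z) in Å.
    occupied: set of (i, j, k) grid indices occupied by the protein.
    """
    site = set()
    reach = math.ceil(cutoff / spacing) + 1   # index window large enough to cover the cutoff
    for x, y, z in ligand_atoms:
        ci, cj, ck = (round(v / spacing) for v in (x, y, z))
        for di, dj, dk in product(range(-reach, reach + 1), repeat=3):
            idx = (ci + di, cj + dj, ck + dk)
            point = tuple(n * spacing for n in idx)
            if idx not in occupied and math.dist(point, (x, y, z)) <= cutoff:
                site.add(idx)
    return sorted(site)


# One ligand atom at the origin, one protein-occupied neighbor.
site = site_from_ligand([(0.0, 0.0, 0.0)], occupied={(1, 0, 0)}, cutoff=1.0)
print(len(site), site)
```

Enlarging or contracting the site, as described above, corresponds to re-running the filter with a larger or smaller cut-off.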

LigandFit is designed to search the binding site of a protein and dock a series of potential ligands into it. During docking the protein is held rigid while the ligand remains flexible, allowing its conformations to be searched and docked within the binding site. The three-dimensional structures of both the protein and the ligand are required. There are three key steps in this process.

a. Site search

The position and shape of the protein binding site are defined on a grid. In the protein-shape method, the active site shape is derived from the shape of the protein, from which all sites are detected. The docked-ligand method is used to define the active site: the unoccupied grid points within a user-definable distance of the ligand atoms are collected to form the site.

b. Conformational search

A Monte Carlo simulation is employed in the conformational search of the ligand. During the search, bond lengths and bond angles are left unchanged; only torsion angles (except those in rings) are randomized. The ligand molecules should therefore be energy-minimized to ensure correct bond lengths and bond angles before LigandFit is used.

c. Ligand fitting

After a new conformer is generated, ligand fitting is carried out in two steps. First, the non-mass-weighted principal moments of inertia (PMI) of the binding site are compared with those of the ligand. If this comparison (the fit value) is above the threshold, or is not better than the fitting results saved previously, no further docking is performed for that conformer. If the fit value is better than the previously saved results, the ligand is positioned in the binding site by aligning its principal axes with those of the site. Because this alignment does not fix the direction of each axis, there are four possible orientations of the ligand in the binding site. For each orientation, the corresponding docking score is computed.
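The PMI comparison can be sketched as follows. The code builds the non-mass-weighted inertia tensor of a point set (site grid points or ligand atoms) about its centroid, obtains the three principal moments with the closed-form eigenvalue formula for symmetric 3x3 matrices, and divides them by the number of points so that sets of different size can be compared. That normalization and the summed-difference fit value are assumptions for illustration, not LigandFit's exact definitions.

```python
import math


def inertia_tensor(points):
    """Non-mass-weighted inertia tensor of (x, y, z) points about their centroid."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for p in points:
        v = [p[k] - c[k] for k in range(3)]
        r2 = v[0] ** 2 + v[1] ** 2 + v[2] ** 2
        for i in range(3):
            for j in range(3):
                t[i][j] += (r2 if i == j else 0.0) - v[i] * v[j]
    return t


def symmetric_eigenvalues(a):
    """Eigenvalues (ascending) of a symmetric 3x3 matrix, closed-form method."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))            # guard against rounding
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [smallest, middle, largest]


def principal_moments(points):
    """Principal moments divided by the number of points (assumed normalization)."""
    n = len(points)
    return [m / n for m in symmetric_eigenvalues(inertia_tensor(points))]


def pmi_fit_value(site_points, ligand_points):
    """Smaller is better: summed difference between sorted principal moments."""
    return sum(abs(s - l) for s, l in zip(principal_moments(site_points),
                                          principal_moments(ligand_points)))


pts = [(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 3), (0, 0, -3)]
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
moved = [(x * c30 - y * s30 + 2.0, x * s30 + y * c30 - 1.0, z + 0.5) for x, y, z in pts]
print(principal_moments(pts), pmi_fit_value(pts, moved))
```

Because principal moments do not change under rotation or translation, a conformer whose moments are far from those of the site can be rejected before any orientation is tried.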

The docking score is the negative of the non-bonded intermolecular energy between the ligand and the protein. After the docking score has been calculated for each orientation, it is compared with the results saved previously; if the new result is better, it is saved. The process of conformational search and ligand fitting is then iterated until the specified number of trials is reached. Finally, rigid-body minimization is applied to the saved ligand conformations to optimize their positions and docking scores.
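The iterate-and-keep-the-best loop can be sketched as a Monte Carlo search in which only torsion angles are randomized and the docking score is the negative of the interaction energy, as described above. The function `interaction_energy` is a hypothetical stand-in for the grid-based protein-ligand energy, and the trial count and seed are arbitrary; the PMI filter and the final rigid-body minimization are omitted.

```python
import math
import random


def interaction_energy(torsions):
    """Hypothetical stand-in for the non-bonded ligand-protein energy (kcal/mol).

    Its minimum (-10) lies at torsions of 60, 180 and 300 degrees.
    """
    targets = (60.0, 180.0, 300.0)
    return -10.0 + sum(1.0 - math.cos(math.radians(t - t0))
                       for t, t0 in zip(torsions, targets))


def monte_carlo_dock(n_torsions=3, n_trials=5000, seed=1):
    """Random torsion search that keeps only the best-scoring pose."""
    rng = random.Random(seed)
    best_pose, best_score = None, -math.inf
    for _ in range(n_trials):
        # Randomize every rotatable (non-ring) torsion; bonds and angles stay fixed.
        pose = [rng.uniform(0.0, 360.0) for _ in range(n_torsions)]
        score = -interaction_energy(pose)   # docking score = -(non-bonded energy)
        if score > best_score:              # save the new pose only if it is better
            best_pose, best_score = pose, score
    return best_pose, best_score


pose, score = monte_carlo_dock()
print([round(t, 1) for t in pose], round(score, 3))
```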

Procedure

Steps followed for Ligand Fit:

1. Potent inhibitor molecules that can inhibit the action of BACE1 were taken.

2. Molecules with diverse structures and pharmacophore features were selected from the literature.

3. The molecules to be docked into the receptor site were saved in a single .sd file so that all of them could be processed for docking scores at that site.

4. The active site of the protein was identified with the find-sites-from-receptor-cavities option, which uses the flood-fill algorithm.

5. The active site was then located using the already docked ligand.

6. The protein molecule was selected, the set of molecules in the .sd file was chosen and the docking scores were calculated.

7. In this way, the docking scores for the set of molecules were calculated with LigandFit.
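Step 3 relies on the multi-molecule SD format, in which each record ends with a `$$$$` line and optional data items follow header lines such as `> <Activity>`. The sketch below is a minimal standard-library reader that assumes well-formed records with the molecule name on the first line; the `Activity` field and the two tiny records in the example are hypothetical.

```python
def read_sd_records(text):
    """Split SD-file text into records of the form {'name': str, 'data': {field: value}}."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":                 # end-of-record delimiter
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):      # tolerate a missing final $$$$
        records.append(_parse_record(current))
    return records


def _parse_record(lines):
    """Read the molecule name (first line) and any '> <FIELD>' data items."""
    data, i = {}, 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line:
            start = line.index("<") + 1
            field = line[start:line.index(">", start)]
            values = []
            i += 1
            while i < len(lines) and lines[i].strip():   # value runs to the next blank line
                values.append(lines[i].strip())
                i += 1
            data[field] = "\n".join(values)
        i += 1
    return {"name": lines[0].strip() if lines else "", "data": data}


sd_text = """mol_1
  sketched

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <Activity>
0.078

$$$$
mol_2
  sketched

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <Activity>
118

$$$$
"""

records = read_sd_records(sd_text)
for rec in records:
    print(rec["name"], rec["data"].get("Activity"))
```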

4.1. ii. CDOCKER

Docking a ligand to a receptor consists of two phases. The first phase is simply the positioning of the ligand in the binding site, typically referred to as finding the poses of the ligand. The second phase, known as scoring, involves the evaluation of the individual poses. It is imperative that true hits and correct poses be distinguished from incorrect ones, so scoring can be regarded as the most important phase. Scoring methods generally include empirical scoring functions, knowledge-based potentials and force-field-derived methods. Many force-field-based methods rely on the following simple relationship:

$$E_{\text{binding}} = E_{\text{complex}} - (E_{\text{receptor}} + E_{\text{ligand}})$$

The binding energy is what is left of the energy of the complex after subtracting the internal energies of the individual components (the receptor and the ligand).
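The relationship can be checked on a toy pairwise-additive model. If the receptor and ligand keep the same internal geometry in the complex as when isolated, E_complex - (E_receptor + E_ligand) reduces to the sum of the intermolecular pair terms. The sketch uses only a Coulomb term with made-up charges and coordinates; a real force field also includes van der Waals terms, and if the geometries relax on binding, strain energies contribute as well.

```python
import math
from itertools import combinations

COULOMB_K = 332.0636  # kcal*Å/(mol*e^2), assumed value


def coulomb_energy(atoms):
    """Sum of pairwise Coulomb terms over atoms given as (q, x, y, z)."""
    return sum(COULOMB_K * a[0] * b[0] / math.dist(a[1:], b[1:])
               for a, b in combinations(atoms, 2))


def binding_energy(receptor, ligand):
    """E_binding = E_complex - (E_receptor + E_ligand)."""
    return coulomb_energy(receptor + ligand) - (
        coulomb_energy(receptor) + coulomb_energy(ligand))


def intermolecular_energy(receptor, ligand):
    """Only the receptor-ligand pair terms."""
    return sum(COULOMB_K * a[0] * b[0] / math.dist(a[1:], b[1:])
               for a in receptor for b in ligand)


# Hypothetical two-atom "receptor" and two-atom "ligand" (charge, x, y, z).
receptor = [(-0.5, 0.0, 0.0, 0.0), (0.5, 1.2, 0.0, 0.0)]
ligand = [(0.4, 0.0, 3.0, 0.0), (-0.4, 1.0, 3.0, 0.0)]
print(binding_energy(receptor, ligand), intermolecular_energy(receptor, ligand))
```

For these charges the binding energy comes out negative, that is, favorable, and equal to the intermolecular sum up to rounding.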

CDOCKER is a molecular dynamics (MD) simulated-annealing-based docking method that employs CHARMm. CDOCKER (CHARMm-based DOCKER) is a grid-based MD docking algorithm that offers all the advantages of full ligand flexibility (including bonds, angles and dihedrals), the CHARMm19 family of force fields, the flexibility of the CHARMm engine, and reasonable computation times.

It is available in Discovery Studio as the Dock Ligands (CDOCKER) protocol. In CDOCKER the receptor is held rigid while the ligands are allowed to flex during the refinement. Random ligand conformations are generated from the initial ligand structure through high-temperature molecular dynamics, followed by random rotations. To explore conformational space adequately, many different optimization methods and search strategies have been developed, including distance geometry, Monte Carlo (MC) simulated annealing, genetic algorithms (GAs) and molecular dynamics. The random conformations are refined by grid-based simulated annealing and a final grid-based or full force field minimization. Soft-core potentials have been found to be effective in exploring the conformational space of small organic molecules and macromolecules, and are used in various applications, including docking and the prediction of protein loop conformations. During the docking process, the non-bonded interactions (van der Waals (vdW) and electrostatic) are softened to different levels, but this softening is removed for the final minimization.
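The softening of non-bonded terms can be illustrated with one commonly used soft-core form of the Lennard-Jones potential, in which r^6 is replaced by r^6 + alpha*sigma^6 so that the energy stays finite when atoms overlap. This functional form and the parameter values are assumptions for illustration; the description above does not state how CDOCKER softens its potentials.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """Standard 12-6 form 4*eps*[(sigma/r)^12 - (sigma/r)^6]; diverges as r -> 0."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 ** 2 - s6)


def soft_core_lj(r, epsilon=0.2, sigma=3.5, alpha=0.5):
    """Soft-core variant: r^6 replaced by r^6 + alpha*sigma^6 (alpha = 0 recovers LJ)."""
    s6 = sigma ** 6 / (r ** 6 + alpha * sigma ** 6)
    return 4.0 * epsilon * (s6 ** 2 - s6)


for r in (1.0, 2.0, 3.5, 5.0):
    print(r, lennard_jones(r), soft_core_lj(r))
print(soft_core_lj(0.0))  # finite at full overlap: 4 * 0.2 * (2**2 - 2) = 1.6
```

Setting alpha to zero recovers the standard potential, which mirrors removing the softening before the final minimization.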

CDOCKER is especially useful for very flexible ligands having more than 30

rotatable bonds.

Details of the CDOCKER Docking Protocol

In the standard protocol, 50 replicas of each ligand are generated and randomly distributed around the center of the active site. The internal coordinates of each replica are kept the same as those originally generated by CORINA (a program that generates 3D coordinates of the ligand molecules from their 2D representations). The MD simulated-annealing process is performed using a rigid protein and a flexible ligand. The ligand-protein interactions are computed from GRID I, GRID II or the full force field. A final minimization step is applied to each of the ligand's docking poses; it consists of 50 steps of steepest descent followed by up to 200 steps of conjugate gradient, using an energy tolerance of 0.001 kcal/mol. The minimized docking poses are then clustered by heavy-atom RMSD using a 1.5 Å tolerance. The final ranking of the ligand's docking poses is based on the total docking energy (including the intramolecular energy of the ligand and the ligand-protein interactions). A ligand-protein docking is considered a success if the RMSD between the top-ranking (lowest-energy) docking pose and the ligand's X-ray position is less than 2.0 Å. The docking accuracy is then computed as the percentage of successfully docked ligands in a test set.
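The clustering and success criteria described above can be sketched as follows. Poses are lists of heavy-atom coordinates in the same atom order and the same receptor frame, so RMSD is computed without superposition. The greedy clustering (the lowest-energy unassigned pose seeds a cluster and absorbs all poses within 1.5 Å) is an assumption; the text does not specify the clustering algorithm.

```python
import math


def rmsd(a, b):
    """Heavy-atom RMSD between two poses with identical atom ordering (no fitting)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_poses(poses, energies, tolerance=1.5):
    """Greedy clustering by ascending energy; returns lists of pose indices."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters, assigned = [], set()
    for i in order:
        if i in assigned:
            continue
        members = [j for j in order
                   if j not in assigned and rmsd(poses[i], poses[j]) <= tolerance]
        assigned.update(members)
        clusters.append(members)
    return clusters


def is_docking_success(top_pose, xray_pose, cutoff=2.0):
    """Success if the top-ranked pose lies within cutoff Å RMSD of the X-ray pose."""
    return rmsd(top_pose, xray_pose) < cutoff


def docking_accuracy(results):
    """results: list of (top_ranked_pose, xray_pose); returns percent successes."""
    successes = sum(is_docking_success(top, ref) for top, ref in results)
    return 100.0 * successes / len(results)


# Hypothetical two-atom poses.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose_a = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)]
pose_b = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
pose_c = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(cluster_poses([pose_a, pose_b, pose_c], [-30.0, -10.0, -25.0]))
print(docking_accuracy([(pose_a, xray), (pose_b, xray)]))
```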

CDOCKER steps:

1. Define the receptor and search for binding sites.

2. Prepare and run the Dock Ligands (CDOCKER) protocol.

Procedure:

1. Open the receptor protein and apply the CHARMm force field.

2. Define the selected molecule as the receptor; then select the ligand and define the site sphere from the selection.

3. Open the CDOCKER protocol and set the parameters.

4. Run the protocol.

4.2. DE NOVO LIGAND DESIGN

De novo ligand design identifies potential novel ligands by screening a library of small molecules to find those that are complementary to a target receptor. Complementarity is defined as an appropriate spatial orientation of hydrogen-bonding and hydrophobic functional groups. Molecules that cannot be fitted without incurring van der Waals clashes or electrostatic repulsions are screened out during the search process.

Ludi

De novo design uses the Ludi algorithm, which works in three steps:

1. Interaction sites within a defined search sphere inside the target receptor are calculated. Typically, the search sphere is defined from the location of a set of known ligands that bind within the receptor cavity.

2. Ludi-formatted libraries are searched for fragments that can fit inside the sphere while forming favorable interactions with the interaction sites.

3. An alignment or linking of the fragments is proposed.

To generate the interaction sites, Ludi uses a set of rules intended to cover the complete range of energetically favorable orientations for hydrogen bonds and hydrophobic contacts. The fitting of fragments into the interaction sites, and the subsequent alignment (linking) of fragments to a partially built ligand, is controlled by several options.

Steps and parameters used in hypothesis generation:

1. Import the molecules into the View Compound workbench and clean the constructed molecules.

2. Apply the Catalyst force field, and then perform the 3D minimization.

Conformation search

The aim of the conformational search is to obtain diverse conformations. Conformation generation methods are of two types: the best method and the fast method. Both methods emphasize broad coverage of conformational space. Fast conformer generation uses a systematic or a random search depending on the size of the molecule: systematic search is used for small molecules and random search for larger molecules. For larger molecules, the conformers are minimized using the poling algorithm.

Conformational analysis stops when one of three conditions is met:

1. The maximum number of conformers has been generated.

2. The energy of a newly generated conformer exceeds the predefined energy threshold.

3. No new conformer can be generated after a certain number of trials.
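The stopping conditions can be expressed as a generic random conformer-generation loop. In this sketch a conformer is represented only by its torsion angles, `torsion_energy` is a hypothetical threefold torsional potential, and a trial conformer counts as a duplicate if every torsion lies within 30° of an accepted conformer. Condition 2 is interpreted here as rejecting a high-energy conformer and counting it as a failed trial; this interpretation and all default values are assumptions, not the Catalyst fast or best algorithms.

```python
import math
import random


def torsion_energy(torsions):
    """Hypothetical threefold torsional energy (kcal/mol), 0 to 3 per torsion."""
    return sum(1.5 * (1.0 + math.cos(math.radians(3.0 * t))) for t in torsions)


def angular_gap(a, b):
    """Smallest difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def generate_conformers(n_torsions, max_conformers=50, energy_threshold=4.0,
                        max_failed_trials=200, tolerance=30.0, seed=0):
    rng = random.Random(seed)
    accepted, failed = [], 0
    # Condition 1: stop at max_conformers. Condition 3: stop after too many failed trials.
    while len(accepted) < max_conformers and failed < max_failed_trials:
        trial = [rng.uniform(0.0, 360.0) for _ in range(n_torsions)]
        too_high = torsion_energy(trial) > energy_threshold      # condition 2
        duplicate = any(all(angular_gap(t, u) < tolerance for t, u in zip(trial, conf))
                        for conf in accepted)
        if too_high or duplicate:
            failed += 1
        else:
            accepted.append(trial)
            failed = 0      # count consecutive failures only
    return accepted


confs = generate_conformers(1)
print(len(confs), sorted(round(c[0]) for c in confs))
```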

Ligand design

New ligands for a protein (for example, enzyme inhibitors) can be designed when the structure of the protein is known. If the structures of one or more protein-inhibitor complexes are known, the design may be aided by studies that identify the essential ligand-protein interactions. There are two approaches to finding a compound that can fit into the active site:

The known structure approach:

Searching a database such as the Cambridge Structural Database (CSD) identifies structures that fit the active site. The advantage of this approach is that the molecules retrieved from the database do exist, and their structures represent low-energy conformations. This approach does not, however, address the issue of conformational flexibility.

The fragment approach:

This approach uses a library of fragments. The idea is to position molecular fragments in the active site in such a way that hydrogen bonds can be formed with the protein and hydrophobic pockets are filled with hydrophobic groups. The fragments are then connected by suitable spacer fragments to form a single molecule.

Ludi can also suggest modifications of a known ligand that may enhance its activity against the target protein. The following chart shows the Ludi workflow.

Fig 13: Ludi work flow

Ludi method

Ludi is based on the fragment approach. It suggests how suitable small fragments can be positioned in clefts of protein structures. This positioning is the strength of Ludi, because it immediately provides ideas about how putative binding sites on the protein can be saturated by fragments and how those fragments might be linked together. Ludi works in three steps:

1. It calculates interaction sites within the protein active site or from the active analogs.

2. It searches libraries for fragments and fits them onto the interaction sites.

3. It proposes an alignment or linking of the fragments.

Ludi distinguishes four types of interaction sites.

1. H-donor

2. H acceptor

3. Lipophilic aliphatic

4. Lipophilic aromatic


The aromatic and aliphatic sites are suitable for hydrophobic interactions, and the H-donor and H-acceptor sites are suitable for hydrogen-bond formation. Ludi is capable of fitting fragments onto the interaction sites and simultaneously aligning (i.e. linking) them to an existing ligand.

Method:

1. Identification of chemical nature of active site amino acids

2. Fragments identification and analysis of Ludi score

3. Searching for link

4. Linking the fragments

5. Fusing the fragment and linking

6. Docking validation.

Fragment fitting

The next step is to fit fragments onto the interaction sites. Ludi searches the list of interaction sites, using distance criteria, for suitable sets of at least two sites that match the fragments. Required interactions are specified using targeted mode, in which fragments are required to interact with the protein atom or atoms specified by the user; any fragment fit that does not interact with the entire set of specified target atoms is rejected.

To fit a fragment, Ludi performs a root-mean-square (RMS) superposition using the algorithm of Kabsch (1978). A fragment fit is accepted if the RMS value is less than a user-defined threshold (typically 0.2 Å to 0.6 Å), if no van der Waals overlap of the fitted fragment with the protein occurs, and, when the electrostatic check parameter on the Ludi runtime parameters control panel is checked, if no unacceptable electrostatic repulsions are found. When the receptor structure is not known, a fragment fit is rejected if the fragment extends outside the volume defined by the set of active analogs.
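The RMS acceptance test can be sketched as below. To stay within the Python standard library, the minimum RMS deviation after optimal superposition is computed with Horn's quaternion method (the largest eigenvalue of a 4x4 matrix, found by power iteration), which gives the same minimum RMSD as the Kabsch algorithm named above but is not Ludi's implementation. The 0.4 Å default is an assumed value inside the quoted 0.2 to 0.6 Å range, and the van der Waals and electrostatic checks are not shown.

```python
import math


def _centered(points):
    """Translate (x, y, z) points so their centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def superposition_rmsd(a, b, iterations=500):
    """Minimum RMSD between paired point sets a and b after optimal rotation and translation."""
    a, b = _centered(a), _centered(b)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g_a = sum(x * x for p in a for x in p)
    g_b = sum(x * x for p in b for x in p)
    shift = 0.5 * (g_a + g_b) + 1.0   # makes all eigenvalues positive for power iteration
    v = [1.0, 0.3, 0.2, 0.1]
    for _ in range(iterations):
        w = [sum(n_mat[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * n_mat[i][j] * v[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, (g_a + g_b - 2.0 * lam) / len(a)))


def accept_fragment_fit(fragment, sites, threshold=0.4):
    """Accept the fit if the superposition RMSD is below the threshold (Å)."""
    return superposition_rmsd(fragment, sites) < threshold


frag = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0), (0.2, 1.0, 1.1)]
moved = [(-y + 5.0, x - 2.0, z + 3.0) for x, y, z in frag]   # rotated 90° and shifted
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in frag]          # a different shape
print(superposition_rmsd(frag, moved), accept_fragment_fit(frag, moved))
print(superposition_rmsd(frag, scaled), accept_fragment_fit(frag, scaled))
```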

Link sites: Aligning fragments with partially built ligands

Ludi is capable of fitting fragments onto the interaction sites and simultaneously aligning (i.e. linking) them to an existing ligand. For this purpose, link sites are defined on the ligand. A link site is a hydrogen atom; all hydrogen atoms of the positioned ligand within a user-specified cutoff radius are link sites.

Ludi fragment libraries

The Ludi fragment library is divided into two parts. The de novo library is used when

Ludi is run in no-link mode. The link library is used when Ludi is run in link mode. The

de novo library and the link library each consist of two files, a file that specifies the

fragment topologies and a file that specifies the interaction types of fragment functional

groups.

Procedure

1. It calculates interaction sites within the protein 1SNU active site or from the active analogs.

2. It searches libraries for fragments and fits them onto the five interaction sites present at the active site.

3. It proposes an alignment or linking of the fragments, and the new ligand is designed.

The most active compound, which has the best dock score, fits the site better than the others. Ludi uses a knowledge-based approach to suggest possible binding positions. The present studies were carried out using the Ludi program, which docks small molecular fragments into protein binding sites using rules for hydrogen-bond geometry: the distance between the donor hydrogen and its acceptor is close to 1.8 Å, and the angle subtended at the hydrogen is rarely less than 120°. Information about the preferred geometries of such interactions can be obtained from analysis of X-ray crystallographic databases. Klebe has performed a very careful analysis of nonbonded contacts observed in the Cambridge Structural Database (CSD).
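The two geometric criteria quoted above (H···A distance near 1.8 Å, angle at the hydrogen rarely below 120°) can be checked with a few lines of Python. This is a minimal sketch, not Ludi's scoring: the function names are hypothetical and the 2.5 Å upper distance cutoff is an assumed tolerance around the 1.8 Å ideal.

```python
import math

Vec = tuple[float, float, float]


def hbond_geometry(donor: Vec, hydrogen: Vec, acceptor: Vec) -> tuple[float, float]:
    """Return (H...A distance in Å, D-H...A angle at the hydrogen in degrees)."""
    h_to_d = [d - h for d, h in zip(donor, hydrogen)]
    h_to_a = [a - h for a, h in zip(acceptor, hydrogen)]
    dist = math.hypot(*h_to_a)
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (math.hypot(*h_to_d) * dist)
    # Clamp against rounding error before taking the arccosine.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return dist, angle


def is_plausible_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
                       max_dist: float = 2.5, min_angle: float = 120.0) -> bool:
    """Accept if H...A is short (near 1.8 Å) and the angle at H is at least 120 degrees."""
    dist, angle = hbond_geometry(donor, hydrogen, acceptor)
    return dist <= max_dist and angle >= min_angle


if __name__ == "__main__":
    # Linear N-H...O contact with H...O = 1.8 Å.
    print(is_plausible_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.0, 0.0)))
```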

4.3. STRUCTURE BASED PHARMACOPHORE GENERATION

A structure-based pharmacophore approach was used to identify the essential features of the active site that contribute to ligand binding.

The interaction generation protocol takes an input receptor and a defined active site and analyzes the active site for hydrogen-bond donors, acceptors, and hydrophobes. The result of the calculation is an interaction map. The density of polar sites parameter specifies the density of hydrogen-bond vectors in the interaction site, and the density of lipophilic sites parameter specifies the density of points in the interaction site for lipophilic atoms.

Procedure:

1. Load the interaction generation protocol from the Protocols Explorer. Its parameters are displayed in the Parameters Explorer.

2. Ensure that the structure you want to define as the receptor is open in a 3D Window. Use the Binding Site tool panel to define the structure as the receptor.

3. Set the input site sphere parameter to define the active site: select the ligand in the receptor-ligand complex and define the input site sphere.

4. The radius of the site sphere can be changed by selecting the sphere and changing the radius in the Attributes dialog.

5. Select the receptor structure from the input receptor parameter list.

6. Select the sphere as the input site sphere parameter.

7. Set the remaining parameters as desired and run the protocol.
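Steps 3 to 6 amount to deriving a sphere from the bound ligand and restricting the analysis to receptor atoms inside it. The Python sketch below illustrates that idea only; it is not how Discovery Studio sizes its site sphere. The centroid-plus-padding rule, the 3 Å padding and the function and atom names are all hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def site_sphere_from_ligand(ligand: Sequence[Point], padding: float = 3.0) -> tuple[Point, float]:
    """Centre on the ligand centroid; radius reaches the farthest ligand atom plus `padding` Å."""
    n = len(ligand)
    center = tuple(sum(p[k] for p in ligand) / n for k in range(3))
    radius = max(math.dist(center, p) for p in ligand) + padding
    return center, radius


def atoms_in_sphere(atoms: dict[str, Point], center: Point, radius: float) -> list[str]:
    """Names of receptor atoms lying inside the site sphere."""
    return [name for name, xyz in atoms.items() if math.dist(center, xyz) <= radius]


if __name__ == "__main__":
    center, radius = site_sphere_from_ligand([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
    receptor = {"atom_1": (1.0, 3.0, 0.0), "atom_2": (1.0, 5.0, 0.0)}
    print(center, radius, atoms_in_sphere(receptor, center, radius))
```

Enlarging the radius, as in step 4, simply admits more receptor atoms into the site that the protocol analyzes.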

Lib Dock:

LibDock uses protein site features referred to as HotSpots. HotSpots are of two types: polar and apolar. A polar HotSpot is preferred by a polar ligand atom, and an apolar HotSpot is preferred by an apolar atom. The receptor HotSpot file is calculated prior to the docking procedure; however, if desired, a predefined or user-adjusted HotSpot file can be used. The protocol allows the user to specify several modes for generating ligand conformations for docking. If an input ligand file already consists of ligand conformations, conformer generation can be turned off.

The rigid ligand poses are placed into the active site and HotSpots are matched as triplets. The poses are pruned, and a final optimization step is performed before the poses are scored. Ligand hydrogens, which are removed during the docking process, are added back to the ligand poses. These hydrogens are not optimized, so they may require further optimization to ensure that receptor-ligand hydrogen bonds are formed correctly.
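The triplet matching step can be pictured with the simplified Python sketch below: a triplet of ligand atoms is paired with a triplet of HotSpots when the polar/apolar types agree and the three internal distances agree within a tolerance. This is an assumed simplification for illustration, not LibDock's implementation (which also superimposes and prunes poses); the function name and the 0.5 Å tolerance are hypothetical.

```python
import itertools
import math

# Each entry is (kind, (x, y, z)); kind is "polar" or "apolar".
Site = tuple[str, tuple[float, float, float]]


def triplet_matches(ligand_atoms: list[Site], hotspots: list[Site], tol: float = 0.5):
    """Return (ligand_triplet, hotspot_triplet) index pairs whose types agree
    and whose three internal distances agree within `tol` Å."""
    matches = []
    for lig in itertools.combinations(range(len(ligand_atoms)), 3):
        for hs in itertools.permutations(range(len(hotspots)), 3):
            # Polar atoms must sit on polar HotSpots, apolar atoms on apolar ones.
            if any(ligand_atoms[i][0] != hotspots[j][0] for i, j in zip(lig, hs)):
                continue
            # combinations() keeps positional order, so the distance pairs correspond.
            pairs = zip(itertools.combinations(lig, 2), itertools.combinations(hs, 2))
            if all(abs(math.dist(ligand_atoms[a][1], ligand_atoms[b][1])
                       - math.dist(hotspots[c][1], hotspots[d][1])) <= tol
                   for (a, b), (c, d) in pairs):
                matches.append((lig, hs))
    return matches
```

Comparing internal distances is invariant to rotation and translation, which is why a matched triplet is enough to place a rigid pose before the pruning and optimization steps.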

MCSS:

MCSS is a method for determining energetically favorable positions and orientations of functional groups on the surface of proteins with known three-dimensional structure. From 1,000 to 5,000 copies of a functional group are randomly placed in the site and subjected to simultaneous energy minimization and/or quenched molecular dynamics. The resulting functionality maps of a protein receptor site, which can take account of its flexibility, can be used for the analysis of protein-ligand interactions and for rational drug design. Application of the method to the sialic acid binding site of the influenza coat protein hemagglutinin yields functional group minima that correspond to those of the ligand in a cocrystal structure.
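The core MCSS idea (many copies placed at random, each interacting with the protein but not with the other copies, all minimized at the same time) can be shown with a toy Python model. Everything here is an assumption for illustration: the "protein" is a sum of Gaussian wells instead of a CHARMM force field, minimization is plain gradient descent, and the energy cutoff and merge radius are arbitrary.

```python
import numpy as np


def toy_field(x: np.ndarray, centers: np.ndarray, depths: np.ndarray, width: float = 1.0):
    """Energy and gradient of every copy in a toy 'protein' field of Gaussian wells.

    x has shape (n_copies, 3). There are no copy-copy terms, so all copies can be
    updated together while staying independent of each other.
    """
    diff = x[:, None, :] - centers[None, :, :]             # (n_copies, n_wells, 3)
    g = depths[None, :] * np.exp(-(diff ** 2).sum(axis=2) / (2 * width ** 2))
    energy = -g.sum(axis=1)
    grad = (g[:, :, None] * diff).sum(axis=1) / width ** 2
    return energy, grad


def mcss_toy(centers, depths, n_copies=1000, box=(-2.0, 6.0), steps=1000,
             step_size=0.5, e_cut=-0.5, merge=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Randomly place many copies of the functional group in the site box.
    x = rng.uniform(box[0], box[1], size=(n_copies, 3))
    # 2. Minimise all copies simultaneously (plain gradient descent here).
    for _ in range(steps):
        _, grad = toy_field(x, centers, depths)
        x = x - step_size * grad
    energy, _ = toy_field(x, centers, depths)
    # 3. Keep favourable copies and merge near-duplicates, best energy first.
    minima = []
    for i in np.argsort(energy):
        if energy[i] >= e_cut:
            break
        if all(np.linalg.norm(x[i] - p) > merge for p, _ in minima):
            minima.append((x[i], float(energy[i])))
    return minima


if __name__ == "__main__":
    wells = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
    for pos, e in mcss_toy(wells, np.array([1.0, 0.6])):
        print(np.round(pos, 2), round(e, 3))
```

The list of merged minima plays the role of the functionality map: each entry is a favourable position for the functional group, ranked by interaction energy.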

The multiple copy simultaneous search (MCSS) method is utilized to search for

optimal positions and orientations of a set of functional groups. For peptide ligands,

functional groups corresponding to the protein main chain (N-methylacetamide) and to

protein side chains (e.g., methanol, ethyl guanidinium) are used. The resulting N-

methylacetamide minima are connected to form hexapeptide main chains with a simple

pseudoenergy function that permits a complete search of all possible ways of connecting

the minima. Side chains are added to the main-chain candidates by application of the

same pseudoenergy function to the appropriate functional group minima.

4.4. ANALOGUE BASED DRUG DESIGN

Analogue-based drug design uses knowledge of ligand structures and their activities to design a drug when little or no information is available about the 3D structure of the target. In this approach, the binding site has to be inferred from the known structures of the ligands.

4.4. i. Pharmacophore generation

“A pharmacophore is an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response.” Perceiving a pharmacophore is the most important first step towards understanding the interaction between a receptor and a ligand. A pharmacophore was first defined by Paul Ehrlich in 1909 as "a molecular framework that carries (phoros) the essential features responsible for a drug’s (=pharmacon's) biological activity".

Catalyst provides the tools for selecting potential ligand compounds prior to

synthesis. The aim of this software is to reduce the time and cost of screening, synthesis

and biological testing. It accelerates the drug discovery process by identifying lead

candidates faster.

A pharmacophore, or hypothesis, describes the generalized molecular features involved in the binding of a ligand to the active site of a protein. These molecular features include 1D features, which represent physical and biological properties; 2D features, which represent substructures; and 3D features, which represent chemical features such as hydrogen-bond acceptors and donors, positive and negative ionizable groups, hydrophobic (aromatic and aliphatic) groups and aromatic rings. In Catalyst, each hypothesis is defined by four parts: the chemical features, their location and orientation in 3D space, their tolerances, and their weights. The weight represents the relative importance of each chemical function in conferring activity.

A pharmacophore model or hypothesis consists of a three-dimensional

configuration of chemical functions surrounded by tolerance spheres. A tolerance sphere

defines that area in space that should be occupied by a specific type of chemical

functionality. Pharmacophore models are routinely used in lead identification and

optimization in the areas of library focusing, evaluation and prioritization of virtual high

throughput screening (VHTS) results, de novo design, and scaffold hopping.
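A hypothesis of the kind described above (feature type, location, tolerance sphere, weight) maps naturally onto a small data structure. The Python sketch below is illustrative only: the `Feature` class, the feature codes and `maps_hypothesis` are hypothetical names, feature orientations (vectors) are ignored, and the mapping test is a brute-force search rather than Catalyst's fitting procedure.

```python
import itertools
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Feature:
    kind: str            # chemical function, e.g. "HBA", "HBD", "HYD", "RA"
    location: Point      # centre of the tolerance sphere
    tolerance: float     # tolerance-sphere radius in Å
    weight: float = 1.0  # relative importance in conferring activity


def maps_hypothesis(hypothesis: list[Feature], ligand: list[tuple[str, Point]]) -> bool:
    """True if each hypothesis feature can be given a distinct ligand feature of the
    same kind that lies inside its tolerance sphere (exhaustive search)."""
    for chosen in itertools.permutations(range(len(ligand)), len(hypothesis)):
        if all(ligand[j][0] == f.kind and math.dist(ligand[j][1], f.location) <= f.tolerance
               for f, j in zip(hypothesis, chosen)):
            return True
    return False


if __name__ == "__main__":
    hypo = [Feature("HBA", (0.0, 0.0, 0.0), 1.6, 2.0), Feature("HYD", (5.0, 0.0, 0.0), 1.6, 2.0)]
    print(maps_hypothesis(hypo, [("HYD", (5.0, 1.0, 0.0)), ("HBA", (0.5, 0.0, 0.0))]))
```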

Pharmacophore models can be constructed using analog-based (using known active

ligands) or receptor-based techniques (using receptor active site information). In the


absence of crystallographic structure data for a protein in which the receptor binding site is clearly identified, a chemist must rely on the structure-activity data for a given set of ligands. If these ligands are known to bind to the same receptor, one can attempt to define what they have in common. The Accelrys Catalyst program can generate two types of automated pharmacophore models, HypoGen and HipHop, depending on whether or not activity data are used. When protein crystal structure data are available, active site pharmacophore models can be used as a pre-filter for docking large libraries. Generating a pharmacophore model from the active site residue information is the key to the success of any pharmacophore-based docking algorithm. In the absence of X-ray bound ligand information, it is a challenge to select a single pharmacophore model that represents the binding characteristics. A methodology has been proposed that can be used to analyze and visualize multiple pharmacophore models. This methodology can be applied to different types of Catalyst pharmacophore models (qualitative, quantitative, receptor-based, etc.) as it only considers feature types and coordinates.

This methodology can be applied successfully to the following applications:

VHTS

Multiple binding mode identification

Classification of proteins based on binding characteristics

Visualization of pharmacophore model space

To build a better pharmacophore, the following steps were employed:

1. Building a set of molecules

2. Conformer generation

3. Hypothesis Generation

4. Database Search

5. Compare/Fit to estimate Activity

The Feature Dictionary list contains the generalized chemical functions in Catalyst.


Definitions of these functions are:

1. HB ACCEPTOR (vector): Matches the following types of atoms or groups of atoms with surface accessibility:

sp or sp2 nitrogens that have a lone pair and charge less than or equal to zero

sp3 oxygens or sulfurs that have a lone pair and charge less than or equal to zero

non-basic amines that have a lone pair

Does not match: basic primary, secondary, and tertiary amines that are protonated at physiological pH. There is no exclusion of electron-deficient pyridines and imidazoles.

2. HB ACCEPTOR LIPID (vector): Matches these types of atoms or groups of atoms: nitrogens, oxygens, or sulfurs (except hypervalent) that have a lone pair and charge less than or equal to zero. This function is the same as HB ACCEPTOR except that it includes basic nitrogens. There is no exclusion of electron-deficient pyridines and imidazoles.

3. HB DONOR (vector): Matches these types of atoms or groups of atoms:

Non-acidic hydroxyls

Thiols

Acetylenic hydrogens

NHs (except tetrazoles and trifluoromethyl sulfonamide hydrogens)

Does not match: electron-rich pyridines and imidazoles that would be protonated, or nitrogens that would be protonated due to their high basicity.

4. HYDROPHOBIC (point): Matches these types of groups of atoms:

A contiguous set of atoms that is not adjacent to any concentrations of charge (charged

atoms or electronegative atoms) in a conformer such that the atoms have surface

accessibility such as phenyl, cycloalkyl, isopropyl, and methyl.

5. HYDROPHOBIC ALIPHATIC (point): Matches these types of groups of atoms:

A contiguous set of atoms that is not adjacent to any concentrations of charge (charged atoms or electronegative atoms) in a conformer such that the atoms have surface accessibility, such as cycloalkyl, isopropyl, and methyl.

6. HYDROPHOBIC AROMATIC (point): Matches these types of groups of atoms:

A contiguous set of atoms that is not adjacent to any concentrations of charge (charged atoms or electronegative atoms) in a conformer such that the atoms have surface accessibility, such as phenyl and indole.

7. NEG CHARGE (atom): Matches negative charges not adjacent to a positive charge.

8. NEG IONIZABLE (point): Matches atoms or groups of atoms that are likely to be

deprotonated at physiological pH, such as:

Trifluoromethyl sulfonamide hydrogens

Sulfonic acids (centroid of the three oxygens)

Phosphoric acids (centroid of the three oxygens)

Sulfinic, carboxylic, or phosphinic acids (centroid of the two oxygens)

Tetrazoles

Negative charges not adjacent to a positive charge

9. POS CHARGE (atom): Matches positive charges not adjacent to a negative charge.

10. POS IONIZABLE (point): Matches atoms or groups of atoms that are likely to be

protonated at physiological pH, such as:

Basic amines

Basic secondary amidines (iminyl nitrogen)

Basic primary amidines, except guanidines (centroid of the two nitrogens)

Basic guanidines (centroid of the three nitrogens)

Positive charges not adjacent to a negative charge

Does not match: weakly basic aromatic nitrogens such as pyridine and imidazole.

11. RING AROMATIC (vector and plane): Matches 5- and 6-membered aromatic

rings. The feature defines 2 points, the ring centroid and a projected point normal to the

ring plane. The projected point can map both above and below the ring.
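How such dictionary rules turn atom properties into features can be sketched in Python for the hydrogen-bond functions. This is a deliberately simplified illustration, not Catalyst's feature perception: atoms are described by hand-set flags (the `Atom` fields are hypothetical), and the hybridization conditions of the HB ACCEPTOR definition are collapsed into a single N/O/S element test.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    hybridization: str          # "sp", "sp2" or "sp3"
    charge: int = 0
    lone_pair: bool = True
    n_hydrogens: int = 0
    basic: bool = False         # would be protonated at physiological pH
    acidic: bool = False        # e.g. carboxylic OH, tetrazole NH
    accessible: bool = True     # surface accessible in the conformer


def hb_acceptor(a: Atom, lipid: bool = False) -> bool:
    """HB ACCEPTOR; with lipid=True, HB ACCEPTOR LIPID (basic nitrogens allowed)."""
    if not (a.accessible and a.lone_pair and a.charge <= 0):
        return False
    if a.basic and not lipid:
        return False
    return a.element in ("N", "O", "S")


def hb_donor(a: Atom) -> bool:
    """HB DONOR: non-acidic OH, SH, acetylenic CH and NH, excluding basic nitrogens."""
    if not a.accessible or a.n_hydrogens == 0 or a.acidic or a.basic:
        return False
    if a.element == "C":
        return a.hybridization == "sp"   # acetylenic hydrogen
    return a.element in ("O", "S", "N")


if __name__ == "__main__":
    amine = Atom("N", "sp3", n_hydrogens=2, basic=True)
    print(hb_acceptor(amine), hb_acceptor(amine, lipid=True), hb_donor(amine))
```

The basic amine in the usage line illustrates the one difference between HB ACCEPTOR and HB ACCEPTOR LIPID stated in the dictionary.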

Steps to be followed in DS:

1. Construct or import the molecules.

2. Perform a conformational search.

3. Examine each conformer for the presence of chemical features.

4. Determine the set of features that correlate with activity.

Pharmacophore hypothesis

Catalyst provides Confirm for conformation generation, together with Common Feature Pharmacophore Generation (HipHop) and 3D QSAR Pharmacophore Generation (HypoGen), as tools to generate pharmacophore hypotheses. The hypotheses are created by generating conformations for a set of study molecules, then using these conformations to find and align chemically important functional groups common to the molecules in the study set. Each hypothesis can also incorporate data on the biological activities of the study molecules.

Steps involved in generating a pharmacophore hypothesis:

1. Generate conformations

The Confirm interface is used to generate conformations for a single molecule or a set of molecules. The number of conformations needed to produce a good representation of a compound's conformational space depends on the molecule. Both conformation-generating algorithms available in Confirm (Best and Fast) are adjusted to produce a diverse set of conformations, avoiding repetitive groups of conformations all representing local minima. The conformations generated by Confirm can be used as input to HipHop and HypoGen to align common molecular features and generate a hypothesis.

2. Align common features to generate a hypothesis

This procedure involves:

1. Aligning common molecular features.

2. Setting preferences using the control panel.

3. Incorporating activity data into a hypothesis.

4. Using aligned structures to generate receptor models.

HipHop and HypoGen use conformations generated in Confirm to align chemically important functional groups common to the molecules in the study set. A pharmacophore hypothesis can then be generated from these aligned structures.

Incorporating biological activity data into a hypothesis

HypoGen can also incorporate biological activity data into the hypothesis generation process. Each hypothesis is tested by regression techniques to compare estimated activity with actual activity data. The software uses the data from these tests to select the hypotheses that best predict the activity of the study molecules. This capability is provided by Catalyst/HypoGen.

4.4. i. a. Common feature pharmacophore generation (HipHop)

HipHop generates pharmacophore models based on the alignment of multiple common features. Its objective is to identify and enumerate all possible pharmacophore configurations that are common to the training set. The Model Receptor menu card is included in the Hypothesis Models card deck, so structures that have been aligned in HipHop can be used to generate a receptor surface model. Since structures used in HipHop are aligned by common chemical features, the receptor surface model generated for them can be significantly different from a receptor surface model generated from template-aligned structures.

An ideal HipHop training set has the following characteristics:

2-30 compounds, ideally 6 molecules

A structurally diverse set of input molecules

Feature-rich compounds

Inclusion of the most active compounds

Spreadsheet setup for HipHop

The molecules in the Generate Hypothesis workbench are imported into a spreadsheet.

Principal: specifies the reference molecules; the configurations of reference molecules are potential centres for hypotheses.

If (0): do not consider this molecule.

If (1): consider the configurations of this molecule.

If (2): use this compound as a reference molecule (used only for HipHop hypothesis generation).

MaxOmitFeat (maximum omitted features): shows how many features of each compound may be omitted.

If (0): all features must map to generate a hypothesis.

If (1): all but one feature must map to generate a hypothesis.

If (2): no features need to map to generate a hypothesis (used only for HipHop hypothesis generation).

When compound data appear in the spreadsheet, you are ready to add values in the

Principal and MaxOmitFeat columns. Common feature hypothesis generation uses

values in these columns to determine which molecules should be considered when

building hypothesis space and which molecules should map to all or some of the

features in the final hypotheses.

In the Principal column, a value of 2 means that all the chemical features in the

compound will be considered in building hypothesis space. A value of 1 means that

features will be considered when generating hypotheses and that at least one mapping for

each generated hypothesis will be found unless the Misses or Complete Misses options

are used. A value of 0 means the compound will be ignored.

The MaxOmitFeat column specifies how many hypothesis features may fail to map to the chemical features in each compound: a 0 in this column forces mapping of all features, a 1 means that all but one feature must map, and a 2 allows hypotheses to which no compound features map.
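The Principal and MaxOmitFeat rules above can be written out as a small Python check. This sketch only restates the spreadsheet semantics described in the text; the table layout, the compound names and the helper functions are hypothetical, and the number of mapped features per compound is assumed to be already known.

```python
def considered(principal: int) -> bool:
    """Principal 0: ignore; 1: consider its configurations; 2: reference compound."""
    return principal in (1, 2)


def reference_compounds(table) -> list[str]:
    """Compounds whose configurations seed the hypothesis space (Principal = 2)."""
    return [name for name, principal, _, _ in table if principal == 2]


def mapping_ok(n_features: int, n_mapped: int, max_omit_feat: int) -> bool:
    """MaxOmitFeat 0: all features map; 1: all but one; 2: no mapping required."""
    if max_omit_feat >= 2:
        return True
    return n_features - n_mapped <= max_omit_feat


def hypothesis_acceptable(n_features: int, table) -> bool:
    """table rows: (name, Principal, MaxOmitFeat, number of hypothesis features mapped)."""
    return all(mapping_ok(n_features, mapped, omit)
               for _, principal, omit, mapped in table if considered(principal))


if __name__ == "__main__":
    table = [("cmpd_A", 2, 0, 4), ("cmpd_B", 1, 1, 3), ("cmpd_C", 0, 0, 0)]
    print(reference_compounds(table), hypothesis_acceptable(4, table))
```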

4.4. ii. 3D QSAR Pharmacophore generation (HypoGen)

HypoGen attempts to derive SAR models for a set of molecules for which activity values (IC50 or Ki) on a given biological target are available. HypoGen optimizes hypotheses that are present in the highly active compounds of the training set but missing among the least active (or inactive) ones. It attempts to construct the simplest hypotheses that best correlate the estimated and measured activities. The models are created in three stages:

Constructive

Subtractive

Optimization

Fig 14: HypoGen process flow (constructive phase, yielding the pharmacophore domain; subtractive phase, yielding the feasible models; optimization phase, yielding the top-scoring models)

1. Constructive Phase:

The constructive phase identifies hypotheses that are common to the most active set of compounds. The process flow of this phase is depicted below:

Fig 15: Constructive phase process flow (training set; identify the most active compounds; enumerate all possible pharmacophore configurations of the most active and second most active compounds; check for duplicates; ensure that the rest of the most active compounds fit MinSubsetPoint features; pharmacophore domain)

2. Subtractive Phase: The objective of this phase is to identify the pharmacophore configurations developed in the constructive phase that are also present in the least active set of molecules, and to remove them. The process is depicted as follows:

Fig 16: Subtractive phase process flow (training set; identify the least active compounds; enumerate all possible pharmacophore configurations; check for configurations shared with the most active compounds; eliminate a configuration if it is shared by more than half of the least active compounds; feasible pharmacophores)

3. Optimization Phase:

This phase improves the hypothesis scores. HypoGen reports the 10 top-scoring unique pharmacophores. The process flow is depicted as follows:

Fig 17: Optimization phase process flow

The steps shown in the figure are: starting from the feasible pharmacophores, features and/or locations are varied to optimize activity prediction via a simulated annealing approach; geometric fits are calculated; a linear regression of -log(Activity) vs. geometric fit is performed; and the total cost is calculated for each new hypothesis as

$$\text{Total cost} = [\mathrm{Cost(Err)} \times \mathrm{CC(Err)}] + [\mathrm{Cost(Wt)} \times \mathrm{CC(Wt)}] + [\mathrm{Cost(Cnfg)} \times \mathrm{CC(Cnfg)}]$$

where the CCs are the cost coefficients contained in CATALYST_CONF/hypo.data. The optimization stops when it no longer improves the score. Following “Occam’s razor”, the simplest hypothesis that accurately estimates the activity is considered the best.

The constructive phase identifies hypotheses that are common to the most active set of compounds. The most active set is determined by the following equation:

$$MA \times Unc_{MA} - \frac{A}{Unc_{A}} > 0.0$$

where MA is the activity of the most active compound, A is the activity of the compound being considered, and Unc_MA and Unc_A are the uncertainties in the measured activities of these two compounds.

The most active set of compounds is limited to a maximum of 8. Once the set is determined, HypoGen enumerates all possible pharmacophore configurations for each of the conformations of the two most active compounds. Furthermore, a hypothesis must fit a minimum subset of features of the remaining most active compounds in order to be considered. At the end of the constructive phase, a database of these pharmacophore configurations is generated.

The objective of the subtractive phase is to identify the pharmacophore configurations developed in the constructive phase that are also present in the least active set of molecules, and to remove them. The first step is the identification of the least active compounds. This is accomplished by the equation

$$\log(A) - \log(MA) > 3.5$$

where A is the activity of the current compound and MA is the activity of the most active compound.

In simple terms, all compounds whose activity is more than 3.5 orders of magnitude lower than that of the most active compound are considered to be in the set of least active molecules. The value 3.5 is a user-adjustable parameter that can be changed if needed (e.g., if the activity range of the dataset does not span more than 3.5 orders of magnitude). HypoGen then enumerates all possible pharmacophore configurations of the least active compounds, checks which of them are shared with the most active compounds, and eliminates any configuration shared by more than half of the least active compounds, leaving the feasible pharmacophores.
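The two selection equations can be applied directly in Python. This is a minimal sketch under stated assumptions: activities are IC50-like values (lower means more active), uncertainties are ratios greater than 1 (3.0 is used as an example value), and the function names are hypothetical. It reproduces only the set selection, not the configuration enumeration.

```python
import math


def most_active_set(activity, uncertainty, max_size=8):
    """Indices with MA * Unc_MA - A / Unc_A > 0, most active first, capped at max_size."""
    ma_idx = min(range(len(activity)), key=lambda i: activity[i])
    threshold = activity[ma_idx] * uncertainty[ma_idx]
    members = [i for i, (a, unc) in enumerate(zip(activity, uncertainty))
               if threshold - a / unc > 0.0]
    members.sort(key=lambda i: activity[i])
    return members[:max_size]


def least_active_set(activity, orders=3.5):
    """Indices with log10(A) - log10(MA) > orders."""
    log_ma = math.log10(min(activity))
    return [i for i, a in enumerate(activity) if math.log10(a) - log_ma > orders]


if __name__ == "__main__":
    ic50 = [1, 2, 5, 10, 100, 5000, 20000]      # hypothetical values in nM
    unc = [3.0] * len(ic50)
    print(most_active_set(ic50, unc), least_active_set(ic50))
```

With an uncertainty of 3, a compound joins the most active set when its activity lies within the combined uncertainty band of the most active compound (here below 9 nM), while only compounds more than 3.5 log units weaker are treated as least active.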

The optimization phase improves the hypothesis score. Small perturbations are applied to the pharmacophore configurations that survived the subtractive phase, and these are scored based on the errors in the activity estimates from regression and on the complexity of the hypothesis. The cost of a hypothesis is a quantitative extension of Occam's razor (everything else being equal, the simplest model is preferred). The cost of each pharmacophore is computed as the sum of three costs: weight, error and configuration. The weight component increases with the deviation of the feature weights from the ideal value of 2.0, and the error component increases with the RMS difference between the measured and estimated activities. The configuration cost is fixed and depends on the complexity of the pharmacophore. The top-scoring hypotheses are reported upon completion of this phase.

HipHop and HypoGen use conformations generated in Confirm to align

chemically important functional groups common to the molecules in a study set.

Biological activity data can be incorporated into the hypotheses so that the best hypotheses for predicting activity are generated and selected. Additionally, structures that have been aligned in these programs can be used to generate a receptor surface model.

HypoGen Training and Test set selection

Selection of the training set molecules is one of the most important exercises the user must perform, for the following reasons:

Catalyst derives the information used in subsequent analysis from those structures; thus, the “garbage in, garbage out” paradigm certainly applies.

The statistical procedures applied during analysis have limits in terms of overfitting and underfitting the data.

Data sets that are ideal for those analysis procedures and data sets from typical

medicinal chemistry structure activity series are often not the same thing.

The ideal training set should satisfy the following conditions:

1. At least 16 compounds are necessary to assure statistical power.

2. Activities should span 4 orders of magnitude.

3. Each order of magnitude should be represented by at least 3 compounds.

4. No redundant information.

5. No excluded volume problems.
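The numeric conditions in this list (at least 16 compounds, a 4-order activity span, at least 3 compounds per order of magnitude) can be checked automatically; conditions 4 and 5 still need a chemist's judgement. The Python sketch below is a hypothetical helper, and it assumes that "each order of magnitude" means each decade of the activity values.

```python
import math
from collections import Counter


def check_training_set(activities, min_compounds=16, min_span=4.0, min_per_order=3):
    """Check the numeric HypoGen training-set guidelines; redundancy and
    excluded-volume problems are not checked here."""
    logs = [math.log10(a) for a in activities]
    decades = Counter(math.floor(x) for x in logs)
    all_decades = range(math.floor(min(logs)), math.floor(max(logs)) + 1)
    return {
        "enough_compounds": len(activities) >= min_compounds,
        "span_ok": max(logs) - min(logs) >= min_span,
        "orders_ok": all(decades[d] >= min_per_order for d in all_decades),
    }


if __name__ == "__main__":
    acts = [1.5, 2, 5, 15, 20, 50, 150, 200, 500, 1500, 2000, 5000,
            12000, 15000, 20000, 30000]  # hypothetical IC50 values
    print(check_training_set(acts))
```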

Methodology

To build a better pharmacophore, the following steps were employed:

1. Building a set of molecules

2. Conformer generation

3. Hypothesis generation

4. Database generation

5. Database search

6. Compare/fit to estimate activity

Criteria to generate successful hypothesis are:

1. Cost factor: the difference between the fixed cost and the null cost should be large; a larger difference gives better prediction.

2. The fixed cost represents the simplest model that fits all data perfectly, and the null cost represents the highest cost, that of a pharmacophore with no features, which estimates the activity of every molecule as the average of the activity data of the training set.

3. The configuration value, which is a measure of the size of the hypothesis space for a given training set, should be less than 18. If it is higher, the hypothesis space has more degrees of freedom and the result may not be useful.

4. The correlation between the estimated and the actual activity data should be around 1.0.

5. The RMS deviation, which represents the quality of the correlation between the estimated and the actual activity data, should be as low as possible, close to 0.
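Criteria 4 and 5 compare estimated and measured activities. The Python sketch below computes a Pearson correlation and an RMS deviation on log10 activities. It is an assumption-laden illustration: Catalyst reports its own RMS value, which this function is not claimed to reproduce, and the function name is hypothetical.

```python
import math


def correlation_and_rms(estimated, actual):
    """Pearson r and RMS deviation between estimated and measured log10 activities."""
    x = [math.log10(v) for v in estimated]
    y = [math.log10(v) for v in actual]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    return r, rms


if __name__ == "__main__":
    measured = [1.0, 10.0, 100.0, 1000.0]
    print(correlation_and_rms([10 * v for v in measured], measured))
```

The usage line shows why both criteria are needed: estimates that are uniformly ten times too high still correlate perfectly (r close to 1) but give an RMS deviation of one log unit.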

Method

1. Building a set of molecules

All molecules were built using the view compound workbench of Catalyst. They were cleaned using the 2D beautify option and minimized using a CHARMm-like force field.

2. Conformer generation

A conformer model is a representation of the possible conformational space of a ligand. It is assumed that the biologically active conformation of a ligand (or a close approximation thereof) is contained within this model. Conformers were generated for all molecules with an energy cutoff of 20 kcal/mol and up to a maximum of 255 conformers.
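The energy window and the conformer cap can be expressed as a simple filter. This Python sketch is only an illustration of the selection rule (keep conformers within 20 kcal/mol of the lowest-energy one, at most 255); it does not generate conformers, and the function name is hypothetical.

```python
def select_conformers(energies, window=20.0, max_conformers=255):
    """Indices of conformers within `window` kcal/mol of the lowest energy,
    lowest energy first, capped at `max_conformers`."""
    e_min = min(energies)
    kept = sorted((i for i, e in enumerate(energies) if e - e_min <= window),
                  key=lambda i: energies[i])
    return kept[:max_conformers]


if __name__ == "__main__":
    print(select_conformers([5.0, -3.0, 30.0, 16.9, 17.1]))
```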

Hypothesis cost:

The lowest-cost hypothesis is considered to be the best. However, hypotheses with costs within 10-15 bits of the lowest-cost hypothesis are also considered good candidates. The units of cost are binary bits: hypothesis costs are calculated according to the number of bits required to completely describe a hypothesis. Simpler hypotheses require fewer bits for a complete description, and the assumption is made that simpler hypotheses are better.

Hypothesis generation / pharmacophore search

A pharmacophore model consists of a collection of features necessary for the

biological activity of the ligand arranged in 3D space, the common ones being hydrogen

bond acceptor, hydrogen bond donor and hydrophobic features. Hydrogen bond donors

are defined as vectors from the donor atom of the ligand to the corresponding acceptor

atom in the receptor. Hydrogen bond acceptors are analogously defined. Hydrophobic

features are located at the centroids of hydrophobic atoms.

Conformations for all molecules were generated in the view compound workbench using the Poling algorithm and the Best quality conformer generation method, which considers the arrangement of atoms. Best conformer generation accepts a maximum of 255 conformers per molecule; for this set of molecules, Catalyst generated conformers that provided the most comprehensive treatment of flexible ring systems. All the conformers are automatically saved, together with the number of conformers generated for each molecule and the lowest conformer energy in kcal/mol. Conformers were selected that fell within a 20 kcal/mol range above the lowest-energy conformation found.

Hypothesis generation

The pharmacophore hypotheses were generated in the generate hypothesis workbench. The molecules were selected for the training set based on the order of magnitude of their activities. Hypothesis generation was carried out using the following assumptions:

1. The most active and the least active molecules should be represented in the training set.

2. At least 3 molecules from each order of magnitude should be selected for pharmacophore generation.

3. A training set should consist of at least 15 molecules.

4. The selected molecules should represent diversity in chemical features.

Hypothesis considerations

In order to achieve a better pharmacophore, the following limits or considerations

should be met by generated hypothesis:

Configuration value should be around 17.

RMS should be as low as possible, preferably close to zero.

Correlation should be around 1.0.

The difference between the fixed cost and the null cost should be 40-80 bits.

Factors that determine the quality of pharmacophore

The overall cost of a hypothesis is calculated by summing three cost factors, a

weight cost, an error cost and a configuration cost. These are qualitatively defined.

1. Weight cost

A value that increases in a Gaussian form as the feature weight in the model deviates from an idealized value of 2.0. This cost factor is designed to favour hypotheses where the feature weights are close to 2.0.

2. Error cost

A value that increases as the RMS difference between estimated and measured activities for the training set molecules increases. This cost factor is designed to favour models where the correlation between estimated and measured activities is better.

3. Configuration cost

This is a fixed cost which depends on the complexity of the hypothesis space

being optimized. It is equal to the entropy of the hypothesis space.

57

Of the three, the error cost factor has the major effect in establishing hypothesis

cost. During the beginning phase of an automated hypothesis generation, Catalyst

calculates the cost of two theoretical hypothesis one in which the error cost is minimal

(all compounds fall along a line of slope=10, and one where the error cost is high (all

compounds fall along a line of slope +O). These models can be considered upper and

lower bounds for the training set. The cost values for them are useful guides for

estimating the chances for a successful experiment and are available within 15 minutes

from the start of the run because these experiments can easily require days of run time.

The ideal hypothesis cost (fixed cost) is reported in the full file found in the hypothesis generation directory; this value tends to be 70-100 bits. The null hypothesis cost is reported in the log file found in the same directory and is usually higher than the fixed cost. What is important is the difference between these two costs: the greater the difference, the higher the probability of finding a useful model. In terms of hypothesis significance, what really matters is the magnitude of the difference between the cost of any returned hypothesis and the cost of the null hypothesis. In general, if this difference is greater than 60 bits, there is an excellent chance that the model represents a true correlation. Since most returned hypotheses will be higher in cost than the fixed-cost model, a difference of 70 bits or more between the fixed and null costs is necessary in order to achieve the 60-bit difference. If a returned hypothesis has a cost that differs from that of the null hypothesis by 40-60 bits, there is a 75-90% chance that it represents a true correlation in the data. As the difference falls below 40 bits, the likelihood of the hypothesis representing a true correlation in the data rapidly drops below 50%. Under these conditions, it may be difficult to find a model that can be shown to be predictive. In the extreme situation where the fixed and null cost differential is small (<20 bits), there is little chance of success, and it is advisable to reconsider the training set before proceeding. Another useful number is the entropy of the hypothesis space. This value is calculated early in the run and is reported near the value for the fixed cost.
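The rules of thumb above can be encoded as a small helper for triaging hypothesis-generation runs. This is a sketch only: the function names are hypothetical, the cost values in the usage lines are invented, and the "marginal" label for fixed/null differentials between 20 and 70 bits is our own, since the text gives no guidance for that range.

```python
def interpret_cost_difference(null_cost: float, hypothesis_cost: float) -> str:
    """Classify (null cost - hypothesis cost), in bits, using the rules of thumb above."""
    delta = null_cost - hypothesis_cost
    if delta > 60:
        return "excellent chance of a true correlation"
    if delta >= 40:
        return "roughly 75-90% chance of a true correlation"
    return "likelihood of a true correlation below 50%"


def assess_training_set(null_cost: float, fixed_cost: float) -> str:
    """Check whether the fixed/null cost differential leaves room for a 60-bit model."""
    differential = null_cost - fixed_cost
    if differential >= 70:
        return "promising"
    if differential < 20:
        return "reconsider training set"
    return "marginal"  # hypothetical label: the text does not cover 20-70 bits


# Hypothetical cost values (bits), for illustration only.
print(interpret_cost_difference(null_cost=160.0, hypothesis_cost=95.0))
print(assess_training_set(null_cost=160.0, fixed_cost=85.0))
```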

Training set

1. The training set should contain the most active compounds.
2. Each compound must provide a unique feature to Catalyst.
3. If two compounds have similar structures (collections of features), they must differ in activity by an order of magnitude to be included; otherwise, pick only the more active of the two.
4. If two compounds have similar activities (within one order of magnitude), they must be structurally distinct (from a chemical-feature point of view) for both to be included; otherwise, pick only the more active of the two.
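Rules 3 and 4 reduce to one test: drop the less active member of any pair that is both structurally similar and within one order of magnitude in activity. A minimal sketch, assuming each compound is described by a set of pharmacophore feature labels compared with a Tanimoto coefficient against a hypothetical cutoff of 0.8; the helper is illustrative, not part of Catalyst, and the compounds in the test are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    ic50_nm: float          # lower IC50 = more active
    features: frozenset     # hypothetical encoding: set of pharmacophore feature labels


def tanimoto(a, b):
    """Tanimoto similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_training_set(compounds, similarity_cutoff=0.8):
    """Visit compounds from most to least active; skip a compound if an already kept one is
    both structurally similar and within one order of magnitude in activity (rules 3 and 4)."""
    kept = []
    for cand in sorted(compounds, key=lambda c: c.ic50_nm):
        redundant = any(
            tanimoto(cand.features, k.features) >= similarity_cutoff
            and abs(math.log10(cand.ic50_nm / k.ic50_nm)) < 1.0
            for k in kept
        )
        if not redundant:
            kept.append(cand)
    return kept
```

Sorting by IC50 first guarantees that the most active compound is always retained (rule 1) and that, within each redundant pair, the more active member is the one kept.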

The pharmacophore features are perceived from the HipHop data. The features present in the training-set molecules are hydrogen-bond acceptor, hydrogen-bond donor, hydrophobic and ring aromatic. Nineteen molecules are selected for the training set. The training-set molecules and their activity values are loaded into a spreadsheet, and all the preferences and uncertainty values are loaded. The HypoGen algorithm is then used to generate the hypotheses.

4.4. iii Quantitative Structure Activity Relationship (QSAR)

The idea of quantitative structure-activity (or structure-property) relationships (QSAR/QSPR) was introduced by Hansch et al. in 1963 and was first applied to analyze the importance of lipophilicity for biological potency. This concept is based on the assumption that differences in the structural properties of molecules, whether experimentally measured or computed, account for the differences in their observed biological or chemical properties. In general, QSAR methods deal with identifying and describing the structural features of molecules that are relevant to explaining variation in biological or chemical properties. QSAR started as a simple comparison of properties of two or more molecules using a single number and has developed into a complex multivariate treatment of properties versus structure, based on statistical analysis and relying on the power of modern computers.

QSAR is a technique that quantifies the relationship between structure and biological data, and it is useful for optimizing the groups that modulate the potency of a molecule. QSAR has been useful both for rationalizing compound activity and for the rational design of new compounds.

Most QSAR methods developed over the years have dealt with descriptors derived from 2D representations of molecular structures, i.e., based on molecular connectivity. Numerous 2D structural descriptors have been reported, including hydrophobicity constants, molar refractivities, Hammett electronic constants, Verloop STERIMOL parameters and the topological indices developed by Kier and Hall. Traditional QSAR methods have used several of these parameters together with multiple regression to develop equations relating structure to biological activity.

Fundamental QSAR studies show that structures can easily be compared, overlaid and displayed. A QSAR is obtained by supplying parameters with which to optimize a series of bioactive molecules. A QSAR based on physicochemical properties describes the structural, electronic and physicochemical characteristics of a drug. Data sets are produced using all available descriptors.

Knowledge of the three-dimensional (3D) structure of the target (receptor/enzyme/DNA) is applied to the rational design of drug molecules that bind to the target for the following reasons:
1. To understand the atomic details of the binding strength and specificity of a drug (drug-receptor interactions).
2. To develop novel drugs (unique chemical structures) for a selected target via de novo drug design or database searching techniques.
3. To optimize the therapeutic index of an existing drug or lead compound by establishing the structural requirements for activity while testing a minimum number of compounds.

A QSAR equation relates biological activity numerically to chemical and physicochemical properties. Biological activity is defined as a pharmacological response, usually expressed as a dose or concentration, such as the effective dose in 50% of the subjects (ED50), the lethal dose in 50% of the subjects (LD50) or the concentration that produces 50% inhibition (IC50). It is common to express biological activity as a reciprocal. A QSAR equation is similar to the equation for a straight line:

$$y = mx + c$$

or

$$\log(\text{biological activity}) = a \times (\text{physicochemical property}) + c$$

where a is the regression coefficient (the slope of the straight line) and c is the intercept on the y-axis (the value when the physicochemical property equals zero).

Fig 18: Concept of QSAR

Biological activity is expressed as a reciprocal to produce a positive slope, because the measured dose or concentration is inversely related to biological potency: the more potent a compound, the lower the concentration needed to produce the same effect. Expressed as a reciprocal, 1/BA increases with potency, so a physicochemical property that enhances potency shows a positive relationship with 1/BA. As these studies are based on the relationship between descriptors and biological activity, the error in the biological activity data must be minimal, and the choice of descriptors must be accurate and appropriate.
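To illustrate the reciprocal convention and the straight-line fit, the sketch below converts IC50 values (assumed to be in nM) to pIC50 = log(1/C) with C in mol/L, then fits log(1/C) = a × (property) + c by ordinary least squares. The descriptor values and IC50s are hypothetical and were chosen to lie exactly on a line.

```python
import math


def pic50_from_nm(ic50_nm):
    """log(1/C): convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


def fit_line(x, y):
    """Ordinary least squares for y = a*x + c; returns (a, c, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope (regression coefficient)
    c = my - a * mx                    # intercept on the y-axis
    ss_res = sum((yi - (a * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, c, 1 - ss_res / ss_tot


# Hypothetical series: a lipophilicity descriptor and IC50 values in nM.
logp = [1.0, 2.0, 3.0, 4.0]
ic50 = [1000.0, 100.0, 10.0, 1.0]
activity = [pic50_from_nm(v) for v in ic50]    # larger value = more potent
print(fit_line(logp, activity))
```

Because the IC50 falls tenfold per descriptor unit, the reciprocal form gives a positive slope of 1, whereas plotting raw IC50 would give a falling, non-linear curve.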

Objectives of QSAR:
1. Understanding drug transport and mechanism of action.
2. Prediction of activity.
3. Classification of molecules as highly active, moderately active and inactive.
4. Optimization of activity through steric, electrostatic and hydrophobic properties.
5. Refinement of synthetic targets.
6. Reduction and replacement of animal use in studying the action of drugs.

Basic requirements in QSAR studies:
1. All analogues should belong to a congeneric series.
2. All analogues should exert the same mechanism of action.
3. All analogues should bind in a comparable manner.
4. The effect of isosteric replacement can be predicted.
5. Binding affinity can be correlated with interaction energies.
6. Biological activities can be correlated with binding affinity.

QSAR studies involve the following steps:
1. CSD database.
2. Choice of descriptors.
3. Statistical methods to evolve the QSAR equation.
4. Validation.

CSD database

Experimental information about the structures of molecules can often be extremely useful for developing theories of conformational analysis and for helping to predict the structures of molecules for which no experimental information is available. The most important technique currently available for determining the three-dimensional structures of molecules is X-ray crystallography. The crystallographic community has distributed, in electronic form, two databases of particular importance to the molecular modeller: the Cambridge Structural Database (CSD), which contains crystal structures of organic and organometallic molecules, and the Protein Data Bank (PDB), which contains structures of proteins and some DNA fragments.

A database is of little use without software tools to search, extract and manipulate the data. A simple use of a database is to extract information about a particular molecule or group of molecules. The data may also be identified by creating a two-dimensional representation of a molecule and using a substructure search program to search the database. Crystallographic databases have also been used to develop an understanding of the factors that influence the conformations of molecules and of the ways in which molecules interact with each other. For example, the CSD has been comprehensively analyzed to characterize how the lengths of chemical bonds depend upon the atomic numbers, hybridization and environment of the atoms involved, and analyses of intermolecular hydrogen bonding have revealed distinct distance and angular preferences. A major use of the CSD is substructure searching for molecules that contain a particular fragment, in order to investigate the conformations that the fragment adopts.

It should be remembered that a crystallographic database can only provide information about the crystalline state of matter, and that the possible influence of crystal packing forces should always be taken into account. This is less of a concern for proteins than for small molecules, as protein crystals contain a large amount of water, and NMR studies have established that proteins have approximately the same structure in solution as in the crystal.

A second, more subtle, bias is that crystallographic databases only contain molecules that can be crystallized, and indeed only those molecules whose X-ray structures were considered interesting enough to be published. The structures in a crystallographic database may therefore not be a wholly representative set.

Molecular descriptors

The study of the steric requirements for interaction between ligands and their corresponding biological acceptor sites is often of decisive importance in understanding the role played by structural features in promoting activity. In its most general form, drug-receptor theory requires that a ligand exerts its biological action as a consequence of binding to, or otherwise interacting with, a specific biological acceptor site, such as a membrane protein or an enzyme, which may be generally termed the receptor. The concept at the basis of modern drug-receptor theory is the old principle that a ligand fits its receptor much as a key fits a lock. This concept, although somewhat arbitrary since a high degree of flexibility is present in biomacromolecular structure, governs the principles of molecular recognition and molecular discrimination. Although stereochemistry often plays a major role in drug bioactivity, care must be taken when considering structure-activity relationships to explore whether other differences in physicochemical properties exist before making significant correlations with the steric properties of the structures under study.

In early studies, organic chemists defined a number of steric parameters to explain the steric effects of substituents on the reaction centers of organic molecules. The same types of steric effect observed in studies of how physical properties and chemical reactivity vary with structure may be assumed to be involved in biological activity, which, at least as a first approximation, may be treated in a similar fashion. Over the past 35 years, owing to the development of drug design and the Hansch approach, many other parameters and methods have been developed that aim to avoid a simple empirical correlation with given ligand properties and to propose possible geometric features of the receptor.

Steric descriptors are classified into the following groups:
1. Topological indices, based on the characterization of chemical structures by graph theory.
2. Geometric descriptors, resulting from the view of organic molecules as three-dimensional objects from which standard dimensions can be calculated.
3. Chemical descriptors, derived from the steric influence upon a standard reaction.
4. Physical descriptors, derived when an organic molecule is considered as a three-dimensional object with size-determined physical properties, together with descriptors that result when an organic molecule is considered as a three-dimensional object compared with a reference structure.

The molecular descriptors available are described below.

Molecular Descriptors

1. Fragment constant descriptors


These are constants that relate the effect of substituents on a “reaction center”

from one type of process to another. The basic idea is that similar changes in

structure are likely to produce similar changes in reactivity, ionization or

binding. There are different constants corresponding to different effects. These

are typically used to parameterize the Hammett equation for some series of

analogs.

$$\log k_X = \rho\sigma + \log k_H$$

where $k_X$ and $k_H$ are the reaction rate constants for the X-substituted and unsubstituted (H) compounds, respectively; $\sigma$ is an electronic substituent constant derived from an ionization constant; and $\rho$ is a reaction constant fitted to the data set. Constants for different properties (electronic, steric, etc.) at different R-group positions are used. In this way, measurements of ionization constants can be used to predict rate constants once the scaling factor $\rho$ has been determined for the rate constant of interest. The default database currently contains the following types of constants. These come from Table VI-I of Hansch, except for the Sterimol constants, which are calculated.
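The Hammett relation can be sketched in two steps: estimate ρ from a series with known σ values, then use it to predict a rate constant for a new substituent. This is a minimal illustration assuming a least-squares line through the origin of log(kX/kH) against σ; the function names and the two-compound series are hypothetical.

```python
import math


def fit_rho(sigmas, k_substituted, k_parent):
    """Least-squares estimate of rho from log(kX/kH) = rho * sigma (line through the origin)."""
    y = [math.log10(k / k_parent) for k in k_substituted]
    return sum(s * yi for s, yi in zip(sigmas, y)) / sum(s * s for s in sigmas)


def predict_k(rho, sigma, k_parent):
    """Hammett prediction: log kX = rho * sigma + log kH."""
    return 10 ** (rho * sigma + math.log10(k_parent))


# Hypothetical series: parent rate constant 1e-3 and two substituents of known sigma.
rho = fit_rho([0.5, 1.0], [1e-2, 1e-1], 1e-3)
print(rho, predict_k(rho, 0.25, 1e-3))
```

The same form applies to ionization equilibria. For benzoic acids ρ = 1 by definition, so a para-nitro group (σp ≈ 0.78) lowers the predicted pKa from about 4.20 to about 3.42.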

Sm, Sp - Electronic effect sigma meta and sigma para

F, R - Inductive polar part (F) and resonance part (R)

pi – Hydrophobic character

HA, HB – Hydrogen bond acceptor (HA) and donor (HB)

MR - Molar refractivity, $MR = \dfrac{n^2 - 1}{n^2 + 2} \cdot \dfrac{MW}{d}$
[n - refractive index, MW - molecular weight and d - compound density]

Sterimol-L – Steric length parameter

Sterimol-B1 through B4 – Steric distances perpendicular to bond axis

Sterimol-BS – Overall maximum steric distance perpendicular to bond

axis
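The molar refractivity formula above (the Lorentz-Lorenz form) is easy to evaluate directly. The sketch below uses approximate literature values for benzene at 20 °C (n ≈ 1.501, MW ≈ 78.11 g/mol, d ≈ 0.8765 g/cm³) purely for illustration; the function name is hypothetical.

```python
def molar_refractivity(n: float, mw: float, density: float) -> float:
    """MR = (n^2 - 1)/(n^2 + 2) * MW/d, in cm^3/mol when MW is in g/mol and d in g/cm^3."""
    return (n ** 2 - 1) / (n ** 2 + 2) * mw / density


# Approximate values for benzene (illustrative only).
mr_benzene = molar_refractivity(n=1.501, mw=78.11, density=0.8765)
print(round(mr_benzene, 1))  # about 26.3
```

MR combines a volume term (MW/d) with a polarizability term, which is why it is used both as a steric and as a dispersion-interaction descriptor.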

2. Conformational descriptors


Energy – Descriptor energy is the energy of the selected conformation

Low Energy – Energy of the most stable conformation in the set of

conformations belonging to each molecular model

E penalty – Difference between Energy and Low Energy

3. Electronic descriptors

Charge – Sum of partial charges

F charge – Sum of formal charges

A pol – Sum of atomic polarizabilities

Dipole – Dipole moment

HOMO – Highest occupied molecular orbital energy

LUMO – Lowest unoccupied molecular orbital energy

Sr – Super delocalizability

4. Graph theoretic descriptors

All these descriptors ultimately base their calculation on the representation of molecular structures as graphs, where atoms are represented by vertices and covalent chemical bonds by edges. These descriptors fall into two categories:

a.) Topological descriptors: These view molecular graphs as connectivity structures to which numerical invariants can be assigned. There are 20 descriptors based on graph-theory concepts. They help to differentiate molecules mostly according to their size, degree of branching, flexibility and overall shape. Examples are the Wiener index, Zagreb index, Hosoya index, Kier and Hall molecular connectivity indices and Balaban indices.

b.) Information content descriptors: These view molecular graphs as the source of a probability distribution to which the tools of Shannon's information theory can be applied. In this approach, molecules are viewed as structures that can be partitioned into subsets of elements that are in some sense equivalent; the notion of equivalence depends on the particular descriptor.

All of these descriptors perform their evaluations on hydrogen-suppressed graphs, i.e., there are no vertices corresponding to hydrogens and no edges corresponding to bonds connecting a hydrogen to another atom.
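A minimal sketch of three graph-theoretic descriptors computed on hydrogen-suppressed graphs given as adjacency lists of heavy atoms: the Wiener index (sum of shortest-path distances over all atom pairs), the first Zagreb index (sum of squared vertex degrees) and a Shannon information content. For the information content, partitioning atoms by vertex degree is an assumption made for illustration; published information indices use other equivalence criteria. The function names are hypothetical.

```python
import math
from collections import Counter, deque


def path_lengths_from(graph, start):
    """Breadth-first search: topological distance (number of bonds) from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in graph[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_index(graph):
    """Sum of shortest-path distances over all unordered pairs of heavy atoms."""
    return sum(sum(path_lengths_from(graph, a).values()) for a in graph) // 2


def zagreb_m1(graph):
    """First Zagreb index: sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in graph.values())


def degree_information_content(graph):
    """Shannon information (bits) of the partition of atoms into classes of equal degree."""
    n = len(graph)
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hydrogen-suppressed graphs: n-butane (chain) and isobutane (branched).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
for name, g in [("butane", butane), ("isobutane", isobutane)]:
    print(name, wiener_index(g), zagreb_m1(g), round(degree_information_content(g), 3))
```

The lower Wiener index of isobutane (9 against 10 for n-butane) reflects its more compact, branched shape, which is the kind of size and branching discrimination described above.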

5. Molecular Shape Analysis (MSA) descriptor

DIFFV – Difference volume

Fo – Common overlap volume (ratio)

NCOSV - Non common overlap steric volume

Shape RMS – RMS to shape reference

COSV – Common overlap steric volume

SRVol – Volume of shape reference compound

6. Spatial descriptors

RadofGyration – Radius of gyration

Jurs descriptors – Jurs charged partial surface area descriptors

Shadow indices – Surface area projections

Area – Molecular surface area

Density – Density

PMI – Principal Moment of Inertia

Vm – Molecular volume
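As an example of a spatial descriptor, the radius of gyration can be computed from 3D coordinates. This sketch assumes the mass-weighted definition about the centre of mass (some programs use unweighted atom positions instead); the function name and the two-atom example are hypothetical.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass (same units as coords)."""
    total = sum(masses)
    centre = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    spread = sum(
        m * sum((c[i] - centre[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(spread / total)


# Hypothetical two-atom example: equal masses 1 Angstrom either side of the origin.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [12.011, 12.011]))
```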

7. Structural descriptors

MW – Molecular weight

Rotlbonds – Number of rotatable bonds

Hbond acceptors – Number of Hydrogen bond acceptors

Hbond donor - Number of Hydrogen bond donors

8. Thermodynamic descriptors

AlogP – Log of partition coefficient

Fh2o – Desolvation free energy for water

Foct - Desolvation free energy for octanol

Hf – Heat of formation

MolRef – Molar refractivity

9. Molecular Field Analysis (MFA) descriptors:

Molecular field analysis (MFA) evaluates the interaction energy between a probe and a molecular model at a series of points defined by a rectangular or spherical grid. In QSAR, the method quantifies the interaction energy between a probe molecule and a set of aligned target molecules. These energies may be added to the study table as new columns headed according to the probe type, and the new columns may be used as independent X variables in the generation of a QSAR.

Six descriptors are available in this family.

1. H+ probe: This probe is a proton, with a charge of +1 and a van der Waals radius of zero. Only electrostatic interactions are evaluated; non-bonded (van der Waals) interactions are not considered.
2. CH3 probe: This probe has the van der Waals radius of a united-atom CH3 group but zero charge. The energy of interaction of this probe with a study molecule includes only non-bonded interactions.
3. Donor/acceptor probe: This is a two-atom probe consisting of an oxygen bonded to a hydrogen. The van der Waals radii of the atoms are exactly as defined in the force field loaded. The probe is neutral; depending on its orientation, it can act as a hydrogen-bond donor or acceptor.
4. CH3- probe: This is a single-atom probe with the van der Waals radius of a united-atom CH3 group and a charge of -1. Its interaction energy includes both non-bonded and electrostatic interactions.
5. Generic probe: A generic single-atom probe with a user-specified van der Waals radius and charge.
6. Other probes: Any multi-atom model, specified in the MSI file format, may be employed as a probe.
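The H+ probe case can be sketched as a plain Coulomb sum evaluated at each grid point, giving one MFA column per point. This is a simplified illustration rather than the program's implementation. It assumes a cubic grid centred on the origin, a vacuum dielectric, a conversion constant of 332.06 kcal·Å/(mol·e²), and exclusion of points closer than 1 Å to any atom. The function name and parameters are hypothetical.

```python
import itertools
import math

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2); vacuum dielectric (epsilon = 1) assumed


def h_plus_probe_energies(atoms, spacing=2.0, extent=4.0, min_dist=1.0):
    """Electrostatic energy (kcal/mol) of a +1 point-charge probe at each grid point.

    atoms: list of (x, y, z, partial_charge) for one aligned molecule.
    The cubic grid runs from -extent to +extent on each axis; points closer
    than min_dist to any atom are skipped to avoid the r -> 0 singularity.
    """
    steps = int(round(2 * extent / spacing)) + 1
    axis = [-extent + i * spacing for i in range(steps)]
    energies = {}
    for point in itertools.product(axis, repeat=3):
        total = 0.0
        for x, y, z, q in atoms:
            r = math.dist(point, (x, y, z))
            if r < min_dist:
                break
            total += COULOMB_KCAL * q / r  # probe charge is +1
        else:
            energies[point] = total  # only reached if no atom was too close
    return energies
```

Each dictionary key is one grid point, so evaluating the same grid for every aligned molecule produces the table columns described above. A CH3 probe would replace the Coulomb term with a van der Waals term.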

Statistical methods to evaluate QSAR equation

QSAR analysis uses statistical methods to study the correlation of biological activity with the structural and physicochemical properties of candidate molecules. The multivariate statistical techniques used to fit such data include the following:

1. PCA (Principal Component Analysis):

It aims to represent a large amount of multidimensional data by transforming it into a more intuitive low-dimensional representation. This method does not create a model, but searches for relationships among the independent variables. It then creates new variables (the principal components) that represent most of the information contained in the independent variables.
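The core of PCA is the leading eigenvector of the descriptor covariance matrix. The sketch below finds it by power iteration, a swapped-in numerical technique used here only because it fits in a few lines of standard-library Python. It assumes the largest eigenvalue is distinct, and it returns just the first component together with the fraction of the total variance it explains. The data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of descriptor rows (one row per molecule)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)] for i in range(p)]


def first_principal_component(rows, iterations=200):
    """Power iteration for the leading eigenvector of the covariance matrix.

    Returns (unit loading vector, fraction of total variance it explains).
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue / sum(cov[i][i] for i in range(p))


# Hypothetical, perfectly correlated descriptor pair: one component carries all variance.
loadings, explained = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(loadings, explained)
```

Further components would be obtained by deflating the covariance matrix and repeating; a production analysis would use a full eigendecomposition instead.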

2. Cluster Analysis:

The goal of cluster analysis is to partition a set of elements (typically a set of models represented in a molecular descriptor property space) into classes or categories consisting of elements of comparable similarity. The algorithm assumes that models are represented by points in a multidimensional property space, with the Euclidean distance between points representing model dissimilarity. The types in this category are:
1. Jarvis-Patrick clustering
2. Variable-length Jarvis-Patrick clustering
3. Relocation clustering
4. Hierarchical Clustering Analysis (HCA)
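Jarvis-Patrick clustering can be sketched with nearest-neighbour lists and a union-find structure. This follows one common formulation: two points are joined when each is in the other's k-nearest-neighbour list and the two lists share at least k_min members. The function names, the default k and k_min values, and the descriptor vectors below are assumptions for illustration.

```python
import math


def nearest_neighbours(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours (Euclidean)."""
    lists = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        lists.append({j for _, j in ranked[:k]})
    return lists


def jarvis_patrick(points, k=3, k_min=2):
    """Cluster labels: i and j are joined if each is in the other's k-NN list
    and the two lists share at least k_min neighbours."""
    nbrs = nearest_neighbours(points, k)
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in nbrs[i]:
            if i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= k_min:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Hypothetical 2D descriptor vectors forming two obvious groups.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
print(jarvis_patrick(pts, k=2, k_min=1))
```

The variable-length variant listed above relaxes the fixed k; that refinement is not shown here.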

3. Simple Linear Regression:

It performs a standard linear regression calculation to generate a set of

QSAR equations that includes one equation for each independent variable. It is

good for exploring simple relations between structure and activity.

4. Multiple Linear Regression (MLR):
This method calculates a QSAR equation by performing standard multivariable regression using several variables in a single equation. The independent variables should not be correlated with one another.

5. Stepwise Multiple Linear Regression:

It calculates QSAR equations by adding one variable at a time and testing each addition for significance; only variables that pass this test are used in the QSAR equation. It is useful when the number of variables is large and the key descriptors are not known. If the number of variables exceeds the number of structures, this method should not be used.
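Multiple linear regression and its stepwise variant can be sketched together. The sketch solves the normal equations by Gaussian elimination and adds one descriptor at a time while its partial F statistic exceeds an F-to-enter threshold. The threshold of 4.0, the absolute 1e-12 tolerances and all function names are assumptions, not values taken from the text. The function also refuses to add a term once no residual degrees of freedom would remain.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Multiple linear regression y = b0 + sum(bj * xj) via the normal equations.

    Returns (coefficients, residual sum of squares); coefficients[0] is the intercept.
    """
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, yi in zip(design, y))
    return coefs, rss


def forward_stepwise(descriptors, y, f_enter=4.0):
    """Add one descriptor at a time, keeping it only if its partial F exceeds f_enter."""
    selected = []
    _, rss_old = ols([], y)           # intercept-only model
    n = len(y)
    while True:
        best = None
        for name in descriptors:
            if name in selected:
                continue
            trial = selected + [name]
            df = n - len(trial) - 1   # residual degrees of freedom
            if df <= 0:               # never use more terms than the data support
                continue
            _, rss_new = ols([descriptors[t] for t in trial], y)
            gain = rss_old - rss_new
            if gain <= 1e-12:
                f = 0.0
            elif rss_new <= 1e-12:
                f = math.inf
            else:
                f = gain / (rss_new / df)
            if best is None or f > best[0]:
                best = (f, name, rss_new)
        if best is None or best[0] <= f_enter:
            return selected
        selected.append(best[1])
        rss_old = best[2]


# Hypothetical data: activity depends on x1 only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0 + 2.0 * v for v in x1]
print(forward_stepwise({"x1": x1, "x2": x2}, y), ols([x1], y)[0])
```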

6. PLS (Partial Least Squares):

This method carries out regression using latent variables derived from the independent and dependent data; these lie along the axes of greatest variation and are most highly correlated. It can be used with more than one dependent variable, and it is typically applied when the independent variables are correlated or the number of independent variables exceeds the number of observations (rows).

7. GFA (Genetic Function Approximation):

GFA is designed to be applied to problems of function approximation. When it is given a large number of potential factors influencing a response, including powers and other functions of the raw inputs, it should find the subsets of terms that correlate best with the response.

The central concepts of GFA are simple. The region to be searched is coded into one or more strings; in GFA these strings are sets of terms, such as powers and splines of the raw inputs. Each string represents a location in the search space. The algorithm works with a set of these strings called a population, which is evolved in a manner that leads it towards the objective of the search. This requires that a measure of the fitness of each string (corresponding to a model in GFA) is available.

Three operations are then performed iteratively in succession: selection, crossover and mutation. Newly added members are screened according to fitness criteria; in GFA, the scoring criteria for models are related to the quality of the regression fit to the data. The selection probabilities must be re-evaluated each time a new member is added to the population.

1. Selection: Two parents are selected from the present population with

probabilities proportional to their fitness.

2. Crossover: A crossover splices and rejoins the characters of the two parent strings to create a new child string. In a conventional genetic algorithm, this is accomplished by selecting a crossover point along each parent and combining the first substring of the first parent with the second substring of the second parent.

Parent 1: X1², X2 | X3X4, X3³
Parent 2: X1, X3 | X4, X5²
Child: X1², X2, X4, X5²

3. Mutation: In a mutation, a single term in a string (a model) is altered. This is the mechanism for continuously introducing diversity into the population, which prevents the algorithm from getting stuck in a suboptimal region of solutions.

In the GFA algorithm, mutations are performed with a user-defined probability after each crossover. The GFA procedure continues for a specified number of generations unless convergence occurs in the interim. A generation is a number of attempted crossovers equal to the size of the population. Convergence is triggered by a lack of progress in the highest and average scores of the population.
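The three GFA operators can be sketched on models represented as lists of term names. This is a simplified illustration, not the GFA program itself. The crossover reproduces the parent/child example above (the terms of the first parent are a best-effort reading of the original figure), and the mutation only swaps one term for an unused term from a pool. The function names, the term pool and the mutation probability are hypothetical.

```python
import random


def crossover(parent1, parent2, cut1, cut2):
    """Child = first substring of parent1 + second substring of parent2 (duplicates dropped)."""
    child = []
    for term in parent1[:cut1] + parent2[cut2:]:
        if term not in child:
            child.append(term)
    return child


def mutate(model, term_pool, rng, p_mutation=0.5):
    """With probability p_mutation, replace one randomly chosen term by an unused pool term."""
    if model and rng.random() < p_mutation:
        candidates = [t for t in term_pool if t not in model]
        if candidates:
            model = list(model)
            model[rng.randrange(len(model))] = rng.choice(candidates)
    return model


def select_parents(population, fitness, rng):
    """Roulette-wheel selection: two parents drawn with probability proportional to fitness."""
    return rng.choices(population, weights=fitness, k=2)


rng = random.Random(42)
p1 = ["X1^2", "X2", "X3*X4", "X3^3"]
p2 = ["X1", "X3", "X4", "X5^2"]
child = crossover(p1, p2, cut1=2, cut2=2)
print(child)                                   # ['X1^2', 'X2', 'X4', 'X5^2']
print(mutate(child, ["X6", "X7^2"], rng, p_mutation=0.3))
```

In a full run, each child would be scored by the quality of its regression fit, and the selection weights would then be recomputed, as described above.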

8. GPLS (Genetic Partial Least Squares):
It is a method derived from GFA and PLS, both of which are valuable analytical tools for datasets that have more descriptors than samples. The following statistical methods are useful in combinatorial chemistry and the analog builder.

9. FA (Factor Analysis):

It addresses one of the main problems of PCA, namely that it is not simple to relate the principal components to molecular properties. In FA, the common factors have a close relationship to real molecular properties.

10. RP (Recursive Partition):

It identifies the internal representation of classes used by classification structure-activity relationship (CSAR) methods for deriving recursive partitioning models.

Validation Methods

Once a regression equation is obtained, it is important to determine its reliability and significance. Internal validation uses the data set from which the model is derived and checks it for internal consistency: one or more compounds are removed, a new model is derived from the remaining data, and that model is used to predict the activities of the compounds that were left out. This is repeated until every compound has been deleted and predicted once. Internal validation is less rigorous than external validation, which evaluates how well the equation generalizes. In external validation, the original data are divided into two groups, the training set and the test set; the training set is used to derive a model, and the model is used to predict the activities of the test-set members. The following procedures are used to check that the size of the model is appropriate for the quantity of data available and to provide some estimate of how well the model can predict activity for new molecules:

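1. Cross-validation: This process repeats the regression many times on subsets of the data. Usually each molecule is left out in turn, and r² is computed using the predicted values of the left-out molecules (the cross-validated r²).
2. Randomization test: Even with a large number of observations and a small number of terms, an equation can still have very poor predictive power. This can come about if the observations are not sufficiently independent of each other. In the randomization test, the activity values are randomly reassigned among the compounds and the regression is repeated many times; if the randomized models fit nearly as well as the original one, the original correlation is likely to be due to chance.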
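Both validation procedures can be sketched for a one-descriptor linear model: leave-one-out cross-validation, which returns q² and PRESS, and a randomization (Y-scrambling) test. A single descriptor is used only to keep the example short; the function names, the seed, the number of trials and the data are all hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for y = a*x + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx


def r_squared(x, y):
    a, c = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def loo_q2(x, y):
    """Leave-one-out cross-validation: returns (q2, PRESS)."""
    press = 0.0
    for i in range(len(x)):
        a, c = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])   # model without compound i
        press += (y[i] - (a * x[i] + c)) ** 2                    # error on the left-out compound
    my = sum(y) / len(y)
    return 1 - press / sum((yi - my) ** 2 for yi in y), press


def y_randomization(x, y, trials=100, seed=0):
    """r2 values of models refitted after randomly shuffling the activities."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores


# Hypothetical descriptor values and activities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
q2, press = loo_q2(x, y)
print(round(r_squared(x, y), 3), round(q2, 3), round(press, 3))
print(max(y_randomization(x, y)))
```

A genuine model should have a q² close to its r², and scrambled-activity r² values well below the original.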

Interpreting QSAR equation

QSAR is used for predicting the activities of as yet untested (and possibly not yet synthesized) molecules. QSAR predictions are generally more accurate for interpolation (compounds whose parameters lie within the range of those in the data set) than for extrapolation (compounds whose parameters lie outside that range). A QSAR equation also provides insights into the mechanism of the process being studied.

1. Square of the correlation coefficient (r²): If the x (independent) and y (dependent) variables are highly correlated, there is considerable information in x and y that is redundant. The degree of correlation is measured by the square of the correlation coefficient (r²), the fraction of the variance in y explained by the model.
2. Cross-validated r² (termed q² or XVr²): r² can be computed using cross-validation methods (XVr²) or bootstrap methods (BSr²). Like r², it is a fraction of the variance explained by the model, but it is computed from predictions for compounds left out of the fit. The cross-validated r² is always somewhat lower, and often much lower, than r².
3. PRESS (Predictive Error Sum of Squares): The sum, over all compounds, of the squared differences between the actual and the predicted values of the dependent variable. The intensity of the cross-validation process is controlled by selecting the number of groups, or the number of times the cross-validation step is carried out while predicting all rows (at each stage of model development).
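For reference, the three statistics can be written as follows, using one common convention (an assumption, since the text gives no formulas). Here $\hat{y}_i$ is the fitted value for compound $i$, $\hat{y}_{(i)}$ is the prediction for compound $i$ from a model built without it, and $\bar{y}$ is the mean observed activity.

```latex
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

\mathrm{PRESS} = \sum_i \left(y_i - \hat{y}_{(i)}\right)^2

q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_i (y_i - \bar{y})^2}
```

For leave-one-out ordinary least squares, PRESS is never smaller than the residual sum of squares, which is why q² cannot exceed r².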

Procedure


Fig 19: Flowchart of QSAR procedure

Calculate molecular properties

The Calculate Molecular Properties protocol will calculate many properties or

perform basic statistical and correlation analysis of the numeric properties as requested.

To set up a Calculate Molecular Properties protocol:

1. Apply the force field to the molecules, then load the QSAR/Calculate Molecular Properties protocol from the Protocols Explorer. The parameters are displayed in the Parameters Explorer.
2. In the Parameters Explorer, click in the cell for the Input Ligands parameter and click the button to specify the ligand source in the Specify Ligands dialog. In the dialog, select all ligands from a Table Browser, a 3D Window or a file.
3. Select the properties to calculate by clicking the button in the cell for the Molecular Properties, Semi-empirical QM descriptors or Density Functional QM descriptors parameter, and follow the instructions in the popup dialog.

The Create Genetic Function Approximation Model protocol can build a Genetic Function Approximation model for a dependent property using the selected molecular descriptors.
To set up a Create Genetic Function Approximation Model protocol:
1. Load the QSAR/Create Genetic Function Approximation Model protocol from the Protocols Explorer. The parameters are displayed in the Parameters Explorer.

2. On the Parameters Explorer, click in the cell for the Input Ligands parameter

and click the button to specify the ligand source on the Specify Ligands dialog.

On the dialog, select all ligands from a Table Browser, a 3D Window, or a file.

3. Set the desired model name using the Model Name parameter. Once created,

this model will appear under the other category of the Molecular Properties

parameter in the Calculate Molecular Properties protocol and can be used to

compute the property for future ligands.

4. Set the initial equation length and remaining parameters as desired. Parameters

presented in red are required.


5.1. LIGAND FIT

The docking score is the negative value of the non-bonded intermolecular energy. If a ligand atom carries a partial charge, the electrostatic grid is used to estimate its electrostatic energy; if it is a hydrogen atom, the hydrogen grid is used for its van der Waals energy.
Fig1: Binding site of the protein, as defined for LigandFit.

Fig2: Scafold4 molecule1 (high active), docked with LigandFit, showing its interactions with amino acids of 2ZDZ.
Fig3: Molecule 2 (low active), docked with LigandFit, showing its interactions with amino acids of 2ZDZ.

Table showing the top 10 dock scores of the high active molecule

Index   Name                 DOCK_SCORE (HA)
1       Scafold4 molecule1   82.634
2       Scafold4 molecule1   80.059
3       Scafold4 molecule1   78.732
4       Scafold4 molecule1   75.629
5       Scafold4 molecule1   75.259
6       Scafold4 molecule1   74.99
7       Scafold4 molecule1   72.326
8       Scafold4 molecule1   72.167
9       Scafold4 molecule1   72.012
10      Scafold4 molecule1   71.776

Table showing the top 10 dock scores of the low active molecule

Index   Name         DOCK_SCORE (LA)
1       Molecule 2   67.26
2       Molecule 2   67.133
3       Molecule 2   66.683
4       Molecule 2   66.01
5       Molecule 2   65.634
6       Molecule 2   65.095
7       Molecule 2   64.665
8       Molecule 2   64.45
9       Molecule 2   64.272
10      Molecule 2   64.267

CONCLUSION:

The docking scores of the above molecules are all positive. Thus, the molecules can be used as potential ligands for the inhibition of β-secretase.

CDOCKER:

CDOCKER uses CHARMm-based molecular dynamics to dock ligands into a receptor binding site. Random ligand conformations are generated using high-temperature molecular dynamics and are then translated into the binding site. Candidate poses are created using random rigid-body rotations followed by simulated annealing, and a final minimization is used to refine the ligand poses.
Fig5: Scafold4 molecule1 (high active), docked with CDOCKER, showing its interactions with amino acids of 2ZDZ.

Fig6: Molecule 5 (low active), docked with CDOCKER, showing its interactions with amino acids of 2ZDZ.

Table showing the top 10 CDOCKER energies of the high active molecule

Index   Name                 CDOCKER_ENERGY (HA)
1       Scafold4 molecule1   32.361
2       Scafold4 molecule1   29.299
3       Scafold4 molecule1   27.328
4       Scafold4 molecule1   27.122
5       Scafold4 molecule1   26.972
6       Scafold4 molecule1   26.494
7       Scafold4 molecule1   25.681
8       Scafold4 molecule1   25.546
9       Scafold4 molecule1   25.284
10      Scafold4 molecule1   25.257

Table showing the top 10 CDOCKER energies of the low active molecule

Index   Name         CDOCKER_ENERGY (LA)
1       Molecule 5   -14.585
2       Molecule 5   -15.382
3       Molecule 5   -17.574
4       Molecule 5   -17.719
5       Molecule 5   -18.203
6       Molecule 5   -19.407
7       Molecule 5   -19.703
8       Molecule 5   -19.800
9       Molecule 5   -19.882
10      Molecule 5   -20.370

CONCLUSION:

The docking energies of the ligands were estimated using the CDOCKER protocol.

LUDI:

Fig7: Interaction map generated using the Ludi program.
Fig8: De novo ligand generated by the Ludi program.
Fig9: De novo ligand placed in the interaction map generated by the Ludi program.
Fig10: The de novo ligand, showing interactions with the Gly96 and Ser291 amino acids of protein 2ZDZ.

CONCLUSION:

The newly designed Ludi molecules were found to satisfy the interaction sites in the active site of protein 2ZDZ.

LIB DOCK:

LibDock docks a set of generated ligand conformations into a receptor binding site. Rather than relying on molecular dynamics, LibDock uses the physicochemical properties of the ligands to guide docking to corresponding features in the protein binding site, by matching a triplet of ligand atoms to a triplet of protein hot spots.

Fig 11: Scafold4 molecule1 (high active), docked with LibDock, showing its interactions with amino acids of 2ZDZ.
Fig 12: Molecule 5 (low active), docked with LibDock, showing its interactions with amino acids of 2ZDZ.

CONCLUSION:

The LibDock studies show that Scafold4 molecule1 (high active) has a LibDock score of 86.666, whereas Molecule 5 (low active) has a LibDock score of 110.87.

STRUCTURE BASED PHARMACOPHORE:

The structure-based pharmacophore approach was used to find the essential features of the active site that can contribute to ligand binding.

Interaction generation:

The Interaction Generation protocol enumerates pharmacophore features from a protein active site. It uses the site-finding algorithm from Ludi to identify points in the active site that could interact with the receptor, and creates a pharmacophore query containing hydrogen bond acceptor, hydrogen bond donor and hydrophobic features from these points.


The Interaction Generation run on the minimized 2ZDZ structure found 484 features:

112 lipophilic features
162 hydrogen bond acceptor features
210 hydrogen bond donor features
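The 484 features are the sum of the three feature types (112 + 162 + 210 = 484). Figures 13 and 14 show these features clustered and reduced to cluster centres. A minimal way to illustrate that step is a greedy "leader" clustering within each feature type, followed by taking each cluster's centroid. This is an assumed stand-in, not necessarily the clustering method Discovery Studio uses, and the 1.5 Å radius and example coordinates are hypothetical.

```python
import math
from collections import defaultdict


def leader_cluster(points, radius=1.5):
    """Greedy leader clustering: each point joins the first cluster whose leader is within `radius`."""
    clusters = []  # list of (leader, members)
    for p in points:
        for leader, members in clusters:
            if math.dist(leader, p) <= radius:
                members.append(p)
                break
        else:  # no leader close enough: start a new cluster led by this point
            clusters.append((p, [p]))
    return [members for _, members in clusters]


def centroid(members):
    n = len(members)
    return tuple(sum(c[i] for c in members) / n for i in range(3))


def cluster_centres(features, radius=1.5):
    """features: iterable of (feature_type, (x, y, z)); returns {type: [centre, ...]}."""
    by_type = defaultdict(list)
    for ftype, xyz in features:
        by_type[ftype].append(xyz)
    return {ftype: [centroid(m) for m in leader_cluster(pts, radius)]
            for ftype, pts in by_type.items()}


# Feature counts reported for the Interaction Generation run on minimized 2ZDZ.
COUNTS = {"lipophilic": 112, "H-acceptor": 162, "H-donor": 210}
assert sum(COUNTS.values()) == 484

# Hypothetical coordinates: two nearby donor points merge, the distant one stays separate.
example = [("H-donor", (0, 0, 0)), ("H-donor", (1, 0, 0)),
           ("H-donor", (5, 0, 0)), ("lipophilic", (0, 0, 0))]
print(cluster_centres(example))
```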

Figure 13: Clustered features from Interaction Generation.

Figure 14: Center points of the clustered features.

Figure 15: Mapping of the active-site amino acids onto the structure-based pharmacophore features.

These structure-based pharmacophore features are useful for virtual screening of large databases.

6. ANALOG BASED DRUG DESIGNING

The work in Discovery Studio shows how the chemical features (hydrogen bond acceptor, hydrogen bond donor and hydrophobic aliphatic) of a set of compounds, together with activities ranging over several orders of magnitude, can be used to generate a pharmacophore hypothesis that successfully predicts activity. The models were predictive not only within the same series of compounds: structurally diverse compounds from different classes were also mapped onto most of the features important for activity. The pharmacophore can therefore be used to discover diverse structures that may be potential BACE1 inhibitors, and to evaluate how well any novel compound maps onto the pharmacophore developed in this study from BACE1 inhibitors possessing the distinct features that may be responsible for their activity.

Analogue Based Pharmacophore Generation:

i. Common Feature Pharmacophore Generation (HipHop):

The 10 most active molecules were used to derive common-feature-based alignments, and all 10 were treated as reference molecules to obtain the best features. The best features obtained from the HipHop run are:

1. Hydrogen bond acceptor
2. Hydrogen bond acceptor lipid
3. Hydrogen bond donor
4. Hydrophobic
5. Ring aromatic

Table showing Summary of feature definition hits by molecule

Molecule A D H Z Y N X P W R

Molecule_1 18.07 7.15 3.79 0.79 3.00 0.00 0.00 0.00 2.00 8.00

A: hydrogen bond acceptor; H: hydrogen bond acceptor lipid; D: hydrogen bond donor; Z: hydrophobic; Y: hydrophobic aliphatic; X: hydrophobic aromatic; N: negative ionizable; P: positive with exclusions; W: positive ionizable; R: ring aromatic.

Table showing Common Feature Pharmacophore Generation Rank File

Hypo. No.  Pharmacophore features  Rank score  Direct hit  Partial hit  Max fit

1 YDAA 14.643 1 0 4

2 YDAA 14.643 1 0 4

3 YDAA 14.627 1 0 4

4 YDAA 14.563 1 0 4

5 YDAA 14.563 1 0 4

6 YDAA 14.563 1 0 4

7 YDAA 14.563 1 0 4

8 YDAA 14.561 1 0 4

9 YDAA 14.561 1 0 4

10 YDAA 14.561 1 0 4


ii. HYPOGEN (Training set):

A set of 10 hypotheses was generated using the data from 25 training set compounds. The cost values, correlation coefficients, RMS deviations and pharmacophore features are listed in the table.

The best pharmacophore was taken to be Hypothesis 1, which has the highest cost difference (53.410), the lowest error cost, the lowest RMS deviation and the best correlation coefficient. It has two hydrogen bond acceptor features, one hydrophobic feature and one hydrogen bond donor feature. For the highly active compound, one feature mapped to a carbon of the five-membered pyrrole ring and another mapped to the oxygen of the pyrrole side chain. The HBD feature mapped to one of the nitrogens of the trinitro carbon, and the HBA feature mapped to the oxygen of the centroid.

Table showing the 10 pharmacophore models generated by the HypoGen algorithm

Hypothesis  Total cost  Difference  RMS     Correlation  Features
1           76.334      53.409      1.0798  0.936        YDAA
2           77.126      53.617      1.1201  0.9313       YDAA
3           79.650      50.093      1.2381  0.9154       YDAA
4           81.281      48.462      1.3080  0.9051       YDAA
5           81.875      47.868      1.3444  0.8994       YDAA
6           84.156      45.587      1.4453  0.8826       YDAA
7           85.321      44.222      1.4922  0.8743       YDAA
8           86.179      43.564      1.5260  0.8682       YDAA
9           87.308      42.435      1.5677  0.8603       YDAA
10          88.177      41.566      1.6007  0.8538       YDAA

Note: Difference = Null cost - Total cost
Null cost = 129.743
RMS = 3.07536

Features:
Y = Hydrophobic aliphatic; D = Hydrogen bond donor; A = Hydrogen bond acceptor
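Following the note (Difference = Null cost - Total cost, with a null cost of 129.743), the cost differences can be recomputed directly from the total costs in the table. The sketch only applies that subtraction. It does not reproduce HypoGen's internal cost terms, and the function name is hypothetical.

```python
NULL_COST = 129.743

# Total costs of hypotheses 1-10 as listed in the HypoGen table.
TOTAL_COSTS = [76.334, 77.126, 79.650, 81.281, 81.875,
               84.156, 85.321, 86.179, 87.308, 88.177]


def cost_differences(null_cost, total_costs):
    """Difference = null cost - total cost, rounded to the table's three decimals."""
    return [round(null_cost - total, 3) for total in total_costs]


for hypo, diff in enumerate(cost_differences(NULL_COST, TOTAL_COSTS), start=1):
    print(f"Hypo {hypo}: difference = {diff:.3f} bits")
```

Comparing the printed values with the Difference column is a quick consistency check on the table.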


Figure 16: Distances between the pharmacophore features.

Figure 17: Overlay of the most active inhibitor molecules of the training set on the best pharmacophore.

Figure 18: Overlay of the least active inhibitor molecule of the training set on the best pharmacophore.

Table showing the results of the pharmacophore hypothesis applied to the test set

Name                 Activity (uM)  Estimate (uM)  Fit value
Scafold3 molecule13  0.56           0.2195         7.1448
Scafold4 molecule6   0.12           0.2575         7.0755
Scafold4 molecule13  0.75           0.2833         7.0341
Scafold3 molecule12  0.67           0.3044         7.0028
Scafold4 molecule11  0.27           0.4449         6.8380
Scafold3 molecule13  0.28           0.5833         6.7204
Scafold4 molecule4   0.28           0.6382         6.6814
Scafold4 molecule5   0.17           1.2128         6.4026
Scafold4 molecule16  0.65           2.1886         6.1462
Scafold4 molecule8   0.14           3.7547         5.9118
Scafold3 molecule3   0.68           6.9683         5.6432
Molecule 1           28.6           7.8400         5.5920
Scafold4 molecule14  0.56           12.387         5.3934
Scafold6 molecule5   3.23           14.790         5.3164
Scafold6 molecule9   2.68           28.131         5.0372
Scafold4 molecule12  0.24           61.707         4.6960
Scafold3 molecule16  0.57           64.150         4.6792
Scafold6 molecule7   2.39           77.852         4.5951
Scafold5 molecule3   1.34           87.245         4.5456
Scafold6 molecule6   0.22           96.207         4.5
Scafold3 molecule1   0.26           113.68         4.4307
Scafold3 molecule9   1.71           170.29         4.2552
Scafold5 molecule4   1.54           215.63         4.1526
Scafold5 molecule1   1.5            242.95         4.1008
Scafold5 molecule5   0.46           257.70         4.0752
Scafold6 molecule4   0.77           345.57         3.9478
Scafold6 molecule3   0.57           455.23         3.8281
Scafold5 molecule2   0.62           21220          2.1596
Molecule 11          2.8            24666          2.0942
Scafold3 molecule10  0.28           24782          2.0922
Molecule 5           100            45076          1.8324

Discussion

Pharmacophore models of BACE1 inhibitors were generated with the HypoGen module of the Discovery Studio (DS) software. HypoGen attempts to construct the simplest hypotheses that best correlate the experimental and predicted activities.

The dataset was divided into a training set (16 compounds) and a test set (31 compounds), considering both structural diversity and wide coverage of the activity range. Compounds with an activity below 1 uM were considered highly active (+++), compounds with an activity of 1-100 uM moderately active (++), and compounds with an activity above 100 uM least active (+). At the end of the run, HypoGen generated 10 pharmacophore models. The null cost of the ten hypotheses was 129.743, the total cost of the best hypothesis was 76.333 and the configuration cost was 15.9021. The difference of 53.410 bits between the null cost and the total cost is a sign of the highly predictive nature of the hypotheses. All 10 hypotheses showed high correlation coefficients between experimental and predicted IC50 values, indicating that all of them have a true correlation in the range of 80-95%. The cost values, correlation coefficients (r), RMSD and pharmacophore features are listed in Table 12. The best pharmacophore (Hypothesis 1), consisting of two hydrogen bond acceptor (A) features, one hydrogen bond donor (D) feature and one hydrophobic aliphatic (Y) feature, with a correlation coefficient (r) of 0.9363, a total cost of 76.3334 and the lowest RMSD (1.07988), was chosen to further validate its predictive power by estimating the activity of the test set.
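The activity classes used to divide the dataset can be applied to both the experimental and the HypoGen-estimated activities of the test set. The sketch below encodes the three classes and prints them for the first rows of the test-set table. Because the text does not say which class the boundary values belong to, treating exactly 1 uM and exactly 100 uM as moderately active is an assumption, and the function name is hypothetical.

```python
def activity_class(ic50_um):
    """Activity class used in this study: +++ (< 1 uM), ++ (1-100 uM), + (> 100 uM)."""
    if ic50_um < 1:
        return "+++"
    if ic50_um <= 100:  # assumption: 100 uM itself counts as moderately active
        return "++"
    return "+"


# (name, experimental activity in uM, HypoGen estimate in uM): first rows of the test-set table.
TEST_ROWS = [
    ("Scafold3 molecule13", 0.56, 0.2195),
    ("Scafold4 molecule6", 0.12, 0.2575),
    ("Scafold4 molecule13", 0.75, 0.2833),
]

for name, experimental, estimated in TEST_ROWS:
    print(f"{name}: experimental {activity_class(experimental)}, "
          f"estimated {activity_class(estimated)}")
```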


Graph showing Point plot representation of test set

QSAR:

In the present study, quantitative structure-activity relationship (QSAR) studies were carried out on BACE1 inhibitors in order to design selective and potent inhibitors. QSAR models were developed from 1D and 2D descriptors using the Discovery Studio software. QSAR attempts to model the activity of a series of compounds using measured or computed properties of the compounds. In the equation, N is the number of data points and r² is the square of the correlation coefficient, which describes how well the compounds fit the QSAR model. XV r² is a squared correlation coefficient generated during a validation procedure using the equation

$$\mathrm{XV}\,r^2 = \frac{SD - \mathrm{PRESS}}{SD}$$

Here, SD is the sum of squared deviations of the dependent variable values from their mean, and PRESS (predicted residual sum of squares) is the sum, over all compounds, of the squared differences between the actual and the predicted values of the dependent variable. The PRESS value is computed during a validation procedure for the entire training set. The smaller the PRESS value, the more reliable the equation. XV r² is usually smaller than the overall r² of a QSAR equation, and it is used as a diagnostic tool to evaluate the predictive power of an equation generated using the multiple linear regression method.
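The relationship between PRESS and XV r² can be made concrete with leave-one-out cross-validation. The sketch assumes a one-descriptor linear model fitted by ordinary least squares (the actual QSAR models are multi-term GFA equations) and uses hypothetical illustrative data. It computes r², PRESS and XV r² = (SD - PRESS)/SD following the definitions in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sd_total(ys):
    """SD: sum of squared deviations of the dependent variable from its mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - rss / sd_total(ys)


def press(xs, ys):
    """PRESS: refit without each compound in turn and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def xv_r_squared(xs, ys):
    sd = sd_total(ys)
    return (sd - press(xs, ys)) / sd


# Hypothetical descriptor values and pIC50s, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pic50 = [5.1, 5.9, 7.2, 7.8, 9.3, 9.9]
print(f"r2 = {r_squared(descriptor, pic50):.3f}, XV r2 = {xv_r_squared(descriptor, pic50):.3f}")
```

For an ordinary least-squares fit, each left-out prediction error is at least as large in magnitude as the corresponding ordinary residual, so PRESS cannot be smaller than the residual sum of squares and XV r² does not exceed r².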

GFA works by generating a random population of candidate solutions to a problem, scoring the relative quality of each solution, and carrying forward the fittest solutions, or analogues of them generated through mutation and crossover, so as to iteratively generate (and finally converge on) new, fitter solutions. In this study, the GFA analysis was done with the following parameters:

Population size
Initial equation length
Final equation length
Number of generations
Bootstrap r²: the squared correlation coefficient calculated during the validation procedure
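The GFA cycle described above (random population, scoring, selection, crossover and mutation) is a genetic algorithm. The sketch below shows that loop over bit-strings that switch candidate descriptors on or off. It is a generic illustration, not Discovery Studio's GFA implementation. The toy objective, population size, generation count and function names are hypothetical; in GFA the score would instead be the Friedman lack-of-fit of the regression model built from the selected terms, where lower is better.

```python
import random


def evolve(score, n_bits, population=20, generations=200, mutation_prob=0.1, seed=0):
    """Minimise `score(bits)` with a simple genetic algorithm; returns the best bit-list found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=score)                 # lower score = fitter (as with a LOF score)
        parents = pop[: population // 2]    # carry the fitter half forward unchanged
        children = []
        while len(parents) + len(children) < population:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with probability `mutation_prob`.
            child = [1 - b if rng.random() < mutation_prob else b for b in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=score)


# Hypothetical toy objective: prefer selecting exactly descriptors 0, 2 and 5.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_score(bits):
    return sum(b != t for b, t in zip(bits, TARGET))


best = evolve(toy_score, n_bits=len(TARGET))
print(best, toy_score(best))
```

Keeping the fitter half unchanged (elitism) guarantees that the best score never gets worse from one generation to the next.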

Thirty compounds, covering the widest range of IC50 values (0.078 to 118 uM), were included in the training set to generate the initial QSAR model. The predictive character of the QSAR model was further assessed using test molecules: to judge its predictive ability for new drug candidates, the IC50 values of both the test and training sets were evaluated.
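The training-set activities are quoted as IC50 values in uM, while the GFA tables report pIC50. Assuming the usual definition pIC50 = -log10(IC50 in mol/L), the conversion from micromolar values is pIC50 = 6 - log10(IC50 in uM), as in this short sketch (the function name is hypothetical).

```python
import math


def pic50_from_um(ic50_um):
    """pIC50 = -log10(IC50 in M); 1 uM = 1e-6 M, so pIC50 = 6 - log10(IC50 in uM)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)


# End points of the training-set IC50 range quoted in the text (0.078 and 118 uM).
for value in (0.078, 118.0):
    print(f"IC50 = {value} uM -> pIC50 = {pic50_from_um(value):.2f}")
```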

Table showing GFA parameters

Number of rows in model     30
Population                  40
Maximum generations         50000
Initial terms per equation  20
Scoring function            Friedman LOF
Mutation probability        0.1

The GFA method searches the space of possible QSAR models, using lack-of-fit (LOF) scores to estimate the fitness of each model. This search leads to the discovery of predictive QSAR equations.
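The Friedman lack-of-fit score penalises a model's error by its size, so that adding terms is only rewarded if it reduces the error enough. The sketch below uses the commonly cited form LOF = SSE / (1 - (c + d·p)/M)², where c is the number of terms, p the number of descriptors, M the number of training compounds and d a smoothing parameter. The default d = 1.0 is an assumption, and some implementations normalise the error term differently (which does not change how models are ranked).

```python
def friedman_lof(sse, n_terms, n_descriptors, n_samples, smoothness=1.0):
    """Friedman lack-of-fit: SSE / (1 - (c + d*p) / M) ** 2.

    sse           -- sum of squared residuals of the fitted model
    n_terms       -- c, number of terms in the model (excluding the constant)
    n_descriptors -- p, number of descriptors used in the model terms
    n_samples     -- M, number of training compounds
    smoothness    -- d, smoothing parameter (value assumed here)
    """
    denom = 1.0 - (n_terms + smoothness * n_descriptors) / n_samples
    if denom <= 0:
        raise ValueError("model too large for the number of samples")
    return sse / denom ** 2


# Same error, different model sizes (30 compounds, as in the training set): the larger model scores worse.
print(friedman_lof(1.0, 2, 2, 30), friedman_lof(1.0, 6, 6, 30))
```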

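$$
\text{qtr2\_1} = 8.5055 - 1.5055\,\text{Count}\langle\text{ECFP\_6:672362763}\rangle - 1.5755\,\text{Count}\langle\text{ECFP\_6:65758642}\rangle + 3.9675\,\text{Count}\langle\text{ECFP\_6:12965448167}\rangle + 0.20089\,\text{Count}\langle\text{ECFP\_6:18844118037}\rangle + 2.667\,\text{Count}\langle\text{ECFP\_6:1.5633445 59}\rangle\ \text{O\_Count}
$$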

In this equation, positive coefficients indicate that the presence of the corresponding group at that point increases the activity of the molecule, and negative coefficients indicate the presence of ionic groups that reduce the activity.
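To show how such an equation is applied, the sketch below evaluates its intercept and the four clearly legible ECFP_6 terms for a given set of fingerprint-feature counts. The final term of the printed equation is not legible in the source, so it is deliberately left out. The numbers produced are therefore only a partial illustration of the model, and the counts and the function name are hypothetical.

```python
INTERCEPT = 8.5055
# Coefficients of the legible Count<ECFP_6:...> terms; keys are the ECFP_6 feature IDs.
COEFFICIENTS = {
    "672362763": -1.5055,
    "65758642": -1.5755,
    "12965448167": 3.9675,
    "18844118037": 0.20089,
}


def partial_qsar(counts):
    """Intercept plus sum(coefficient * count) over the legible terms; missing IDs count as 0."""
    return INTERCEPT + sum(coef * counts.get(fid, 0) for fid, coef in COEFFICIENTS.items())


# Hypothetical molecules: one containing a negative-coefficient feature, one a positive one.
print(partial_qsar({"672362763": 1}))    # negative coefficient lowers the prediction
print(partial_qsar({"12965448167": 1}))  # positive coefficient raises it
```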

Table showing the validation statistics for the model

Friedman LOF                                  0.03102
R-squared                                     0.9771
Adjusted R-squared                            0.9719
R-squared (predicted)                         -3.3864
RMS residual error                            0.132
Significance of regression (S.O.R.) p-value   2.842e-17

Friedman LOF is the Friedman lack-of-fit score; the S.O.R. p-value is the p-value for significance of regression.
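Adjusted R² penalises R² for the number of terms in a model. With the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n compounds and p descriptor terms, it can be computed as below. The text does not state p for this model, so the term count in the example is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_terms - 1 <= 0:
        raise ValueError("need more samples than terms + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_terms - 1)


# 30 training compounds and R^2 = 0.9771 as reported; 5 terms is an assumed model size.
print(round(adjusted_r2(0.9771, 30, 5), 4))
```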

Table showing the experimental and predicted values of training set compounds using GFA

Name                Exp. pIC50 (Tr)  GFAT Model_1 (Tr)  Prediction error (Tr)
Scafold3 molecule1  9.59             9.35007            0.298943
Molecule 2          6.93             6.93               0.403709
Molecule 6          8.52             8.4972             0.320761
Molecule 7          7                7                  0.403709
Molecule 11         8.55             8.49723            0.320761
Graph Showing correlation between experimental and predicted activities by QSAR

equation using GFA method


Test Set

The purpose of QSAR is not only to reproduce the biological activity of the training set but also to predict the activity of the test set molecules. Using the equation obtained from the training set, a series of molecules of known activity, the test set, was introduced into the study table so that their biological activity could be predicted. The prediction accuracy for the test set molecules exceeded 80%.

Table showing the experimental and predicted values of test set compounds using GFA

Name         Exp. pIC50 (Ts)  GFAT Model_1 (Ts)  Prediction error (Ts)
Molecule 8   7.05             7                  0.387026
Molecule 12  7.99             8.50547            0.307614

Graph Showing correlation between experimental and predicted activities by QSAR

equation using GFA method for test set.

For the QSAR equation generated with the GFA method, the observed r² and XV r² values lie within an acceptable range, and there is good correlation between the experimental and the GFA-predicted activities, that is, between the experimental IC50 values and the computationally predicted IC50 values. Since the predictive ability of the equation is good, it can be used to develop new analogues.


7. CONCLUSION:

In the in silico studies of beta-secretase 1 (BACE1), QSAR, pharmacophore modelling and docking were used, and these methods gave good results.

The QSAR study on the training set compounds gave a good r² of 0.9771 with four outliers, and the GFA graph with its fit line shows good correlation between the compounds and their activities. In terms of predictive value, the best quantitative pharmacophore model consisted of three feature types: hydrogen bond acceptor, hydrogen bond donor and hydrophobic aliphatic. This HypoGen model, further validated using a set of BACE1 inhibitors, gave a correlation value of 0.9363. The pharmacophore studies thus identified three kinds of interaction region: hydrogen bond acceptor, hydrophobic aliphatic and hydrogen bond donor.

The in silico modelling helped guide lead optimization and led to the generation of a highly potent series of BACE1 inhibitors with good drug-like properties, which is the subject of another communication. Further fine-tuning and optimization of this potent class of BACE1 inhibitors could lead to new therapeutic agents.

The combined approach of analogue-based and structure-based drug design allowed us to gain insight into predicting enhanced activity and to explore the docking interactions between the amino acid residues of BACE1 and the ligand. Good ligands may not act as good drugs. The prime objective of this project, to verify techniques reported in various journals using computer-aided drug design, has thus been achieved. The results obtained can be used to develop new ligand molecules, predict their activities in silico and confirm them against experimental values. The reported results can therefore be employed in the rational design of novel and potent BACE1 inhibitors.


8. ABBREVIATIONS

BACE Beta-site of APP-cleaving enzyme
GLY Glycine
HIS Histidine
LYS Lysine
MET Methionine
ASN Asparagine
CADD Computer-aided drug design
CNS Central nervous system
HDL High-density lipoprotein
ASP Aspartic acid
LF LigandFit
CHARMm Chemistry at Harvard Macromolecular Mechanics
QM Quantum mechanics
HYPO Hypothesis
MD Molecular dynamics
SD file Structure-data file
uM Micromolar
nM Nanomolar
% Percent
IC50 Half maximal inhibitory concentration
R² Squared correlation coefficient
XVR2 Cross-validated squared correlation coefficient
PRESS Predicted residual error sum of squares
LOF Lack of fit
CSD Cambridge Structural Database
MLR Multiple linear regression
HBD Hydrogen bond donor
HBA Hydrogen bond acceptor
HY Hydrophobic
PDB Protein Data Bank
SBDD Structure-based drug design
ABGD Analog-based drug design
RMS Root mean square
HTS High-throughput screening
DNA Deoxyribonucleic acid
NMR Nuclear magnetic resonance
QSAR Quantitative structure-activity relationship
SAR Structure-activity relationship
ADMET Absorption, distribution, metabolism, excretion and toxicity

Table showing legends used

9. REFERENCES

1. A good book over all, and chapter 7 in particular, is

G. L. Patrick "An Introduction to Medicinal Chemistry" Oxford (1995)

2. A more detailed description of computational techniques is

A. R. Leach "Molecular Modelling Principles and Applications" Longman (1996)

3. A recent review is

L. M. Balbes, S. W. Mascarella and D. B. Boyd, in "Reviews in Computational

Chemistry, Vol. 5" K. B. Lipkowitz, D. B. Boyd, Eds., VCH, 337 (1994)

4. A. Glucksmann, Cell deaths in normal vertebrate ontogeny, Biol. Rev 26 (1951),

pp. 59–86.

5. An introduction to computational techniques is

G. H. Grant, W. G. Richards "Computational Chemistry" Oxford (1995)

6. An introduction to De Novo techniques is

S. Borman Chemical and Engineering News 70 (12), 18 (1992)

7. An introduction to structure-based techniques is

I. D. Kuntz, E. C. Meng, B. K. Shoichet Acct. Chem. Res. 27 (5), 117 (1994)

8. A. Ashkenazi and V. M. Dixit, Death receptors: signaling and modulation, Science 281 (1998), pp. 1305–1308.

9. B. Hogan, R. Beddington, F. Costantini and E. Lacy, Manipulating the Mouse

Embryo (Second Edition), Cold Spring Harbor Laboratory Press, Cold Spring

Harbor, NY (1994).

10. B. Kallen, Cell degeneration during normal ontogenesis of the rabbit brain, J.

Anat 89 (1955), pp. 153–161.

11. beta-site of APP-cleaving enzyme From Wikipedia, the free encyclopedia

12. Clinical testing, http://rarediseases.info.nih.gov/ord/ct-info-patient.html and http://rarediseases.info.nih.gov/ord/ct-about.html


13. Cohen, N. Claude (1996). Guidebook on Molecular Modeling in Drug Design.

Boston: Academic Press. ISBN 012178245x.

14. Drug design From Wikipedia, the free encyclopedia

15. Guner, Osman F. (2000). Pharmacophore Perception, Development, and use in

Drug Design. La Jolla, Calif: International University Line. ISBN 0-9636817-6-1.

16. Leach, Andrew R.; Harren Jhoti (2007). Structure-based Drug Discovery. Berlin:

Springer. ISBN 1-4020-4406-2.

17. Madsen, Ulf; Krogsgaard-Larsen, Povl; Liljefors, Tommy (2002). Textbook of

Drug Design and Discovery. Washington, DC: Taylor & Francis. ISBN 0-415-

28288-8.

18. Schneider G, Fechner U (August 2005). "Computer-based de novo design of

drug-like molecules". Nat Rev Drug Discov 4 (8): 649–63. doi:10.1038/nrd1799.

PMID 16056391.

19. Wang R, Gao Y, Lai L (2000). "LigBuilder: A Multi-Purpose Program for Structure-Based Drug Design". Journal of Molecular Modeling 6: 498–516. doi:10.1007/s0089400060498.
