CHARACTERIZATION AND PREDICTION OF DRUG BINDING SITES IN PROTEINS
Yariv Brosh & Alex Fardman
Advisor: Dr. Yanay Ofran
• Proteins – organic compounds that constitute the basic functional and computational unit in the cell. They are able to bind other molecules specifically and tightly.
• Pocket – The region of the protein responsible for binding.
• Ligand – Substance that is able to bind to a biomolecule.
• Drug – Substance that alters normal body function.
Background
• Most drugs achieve their effects by binding to a protein at a specific binding site and modifying its activity.
• One may want a drug that binds to a specific location in a protein to prevent side effects.
• Identifying those binding sites in proteins experimentally is time & resource consuming.
Our goal
Find a way to predict whether or not a drug will bind to a protein.
This could significantly shorten drug development time.
• Collecting data – Pocket creation
• Choosing attributes & analysis of pockets accordingly
• Machine Learning
Methods
Attributes & pocket analysis
•Count the number of each amino acid.
•Charge at physiological pH.
•Shape matching =
•Connectivity =
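The first two attributes above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the pocket is assumed to be available as a string of one-letter residue codes, and the charge is approximated by formal side-chain charges near pH 7.4 (Asp and Glu as -1, Lys and Arg as +1, histidine treated as neutral, termini ignored). The names `amino_acid_counts`, `net_charge` and the example pocket are hypothetical.

```python
from collections import Counter

# Standard one-letter codes for the 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified formal side-chain charges near physiological pH (~7.4).
# Assumption: histidine is treated as neutral and termini are ignored.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def amino_acid_counts(pocket_residues):
    """Return a count for each of the 20 amino acids, in a fixed order."""
    counts = Counter(pocket_residues)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


def net_charge(pocket_residues):
    """Sum the simplified side-chain charges over the pocket residues."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in pocket_residues)


# Hypothetical pocket, given as one-letter residue codes.
pocket = "DEKRHGGA"
print(amino_acid_counts(pocket)["G"])  # 2
print(net_charge(pocket))              # 0
```

Keeping the counts in a fixed residue order makes it straightforward to turn them into a fixed-length attribute vector for a learning algorithm.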
Accessibility calculation
With :
•Accessibility is calculated by simulating a water molecule rolling over the protein surface.
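The rolling-probe idea can be approximated numerically. The sketch below uses the Shrake-Rupley point-sampling scheme, which is a swapped-in stand-in and not necessarily what the tool used in this project does: each atom's sphere is expanded by the probe radius, test points are spread over it, and the points not covered by any neighbouring expanded sphere count as accessible. The 1.4 Å probe radius (a common value for water), the number of test points, and the function names are assumptions.

```python
import math


def sphere_points(n):
    """Spread n points roughly evenly on a unit sphere (golden-spiral method)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def accessible_areas(atoms, probe_radius=1.4, n_points=200):
    """Approximate the solvent-accessible area of each atom.

    atoms: list of (x, y, z, radius) tuples, in angstroms.
    Returns one area per atom, in square angstroms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe_radius  # sphere traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is buried if it falls inside any other
            # atom's probe-expanded sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                < (rj + probe_radius) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

This brute-force version compares every atom with every other atom, which is fine for a pocket-sized set of atoms; a whole protein would normally call for a neighbour list.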
Attributes & pocket analysis
With :
•Accessibility difference of protein atoms before binding the ligand and after.
•Accessibility difference of ligand atoms before binding to the pocket and after.
With HBPLUS:
•Number of hydrogen bonds between ligand and pocket.
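The accessibility-difference attributes listed above reduce to a subtraction once per-atom accessibility is known for the free and the bound state. A minimal sketch, assuming those values are already available as dictionaries keyed by an atom identifier; the function name, atom labels and numbers are made up for illustration.

```python
def accessibility_difference(free, bound):
    """Total accessible area lost on binding, summed over atoms.

    free, bound: dicts mapping an atom identifier to its accessible
    area (square angstroms) before and after binding, respectively.
    """
    return sum(free[atom] - bound[atom] for atom in free)


# Hypothetical per-atom values for two pocket atoms.
pocket_free = {"CA": 12.0, "OG": 20.5}
pocket_bound = {"CA": 10.0, "OG": 4.5}
print(accessibility_difference(pocket_free, pocket_bound))  # 18.0
```

The same function applies to the ligand atoms, using their accessibility before and after binding to the pocket.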
Machine learning
With WEKA – using LibSVM:
Set        True    False
Training   200     1,000-10,000
Testing    85      544-9,544
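WEKA reads data sets in its ARFF text format, so the pocket attributes have to be written out as one row per candidate binding site with a true/false class label. The slides do not say how this export was done; the sketch below is one possible way, and the relation name, attribute names and rows are hypothetical.

```python
def to_arff(relation, attribute_names, rows):
    """Build an ARFF document with numeric attributes and a true/false class.

    rows: iterable of (features, is_binding_site) pairs, where features
    is a sequence of numbers in the same order as attribute_names.
    """
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {true,false}")
    lines.append("")
    lines.append("@data")
    for features, is_binding_site in rows:
        values = [str(v) for v in features]
        values.append("true" if is_binding_site else "false")
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and rows, for illustration only.
text = to_arff(
    "pockets",
    ["net_charge", "h_bonds"],
    [((-1, 3), True), ((2, 0), False)],
)
print(text)
```

Separate training and testing files can be produced by calling the function once per set.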
Data
Results
[Figure: Precision/Recall change in the Positive Set. X-axis: Learning Set Size. Y-axis: Precision/Recall Value. Series: recall, precision.]
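For reference, precision and recall for a class are computed from the counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch follows; the counts in the example are made up and are not the project's results.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts, for illustration only.
precision, recall = precision_recall(tp=8, fp=2, fn=2)
print(precision, recall)  # 0.8 0.8
```

The same formulas give the values for the negative set when "non-binding" is treated as the class of interest.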
[Figure: Recall/Precision Change in the Negative Set. X-axis: Learning Set Size. Y-axis: Recall/Precision Value. Series: recall, precision.]
[Figure: ROC Graph. X-axis: False Positive Rate. Y-axis: True Positive Rate. Series: TP, Logarithmic (TP) trend line.]
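A ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The slides do not say how the curve was produced; the sketch below shows the general construction, assuming each test example has a classifier score (higher meaning more likely a real binding site), that scores are distinct, and that both classes are present. The function name and example data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for a real binding site, 0 otherwise.
    One point is produced per example, walking from the highest score down.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# Made-up scores and labels, for illustration only.
print(roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```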
Conclusions
•We were able to distinguish between real & non-biological binding sites without using computationally expensive energy functions or evolutionary conservation.
•It is not possible to distinguish between binding sites with PatchDock alone.
•Using the combination of simple and computationally “cheap” tools such as SVM, PatchDock and the algorithms for pocket analysis mentioned earlier, it is possible to give a good prediction regarding the nature of the binding site.
•The advantage of the method is its simplicity: taking the best docking conformations and comparing them with characteristics of real and non-biological binding sites. (No need to compare entire proteins.)
•The few negative binding sites classified as positives may be real binding sites. (This needs to be checked experimentally.)
The method can be improved and refined:
•More attributes
•More drugs and proteins
•Analysis of attribute significance
•Bigger learning set
•Bigger positive set in relation to the negative set in the learning set (to help the learning algorithm)
Implications
•The tool can be used to check possible side effects during drug development.
•Drug repurposing: find new targets for existing drugs.
•Can significantly shorten the drug toxicity check during development.