Multiple Active Site Correction for Docking and Virtual Screening
Guy Vigers and James P. Rizzi
Array BioPharma Inc, 3200 Walnut St., Boulder, CO 80304

1) Introduction

Experimental structures of biologically interesting macromolecules can be hard to obtain, but they are essential for successful structure-based drug design (SBDD). One of the largest hurdles in SBDD is the "docking problem": predicting the bound conformation of a ligand in an active site, and then estimating the binding energy of that conformation.

We have compared several docking programs, both for the accuracy of docking and for the ranking of potential ligands. Several programs reproduce the bound conformation of a single ligand well, but no program currently available ranks multiple potential ligands well. We present a simple correction, the Multiple Active Site Correction, which greatly improves the usefulness and significance of docking scores.

2) Selection of test set

Fifteen co-crystal structures were selected from the PDB. All were solved at a resolution of about 2.0 Å, and together they represent diverse proteins and ligands. The test set is listed below.

3) Docking Programs

The programs we tested include AutoDock, Dock, FlexX, Fred, Glide, Gold and QXP/Flo. FlexX, Gold and Fred all performed well: each placed all 15 ligands in their cognate active sites within 2 Å RMSD of the experimental conformation.

The results shown in Panels 5 to 7 were generated with the following parameters:

FlexX: default settings with the Chemscore scoring function.
Gold: 3x speedup, 4/8 internal VdW potential, Gold scoring function.
Fred: default settings with the SS scoring function.
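The 2 Å RMSD success criterion above can be sketched in Python. This is an illustrative sketch, not code from any of the docking programs. `pose_rmsd` and `is_docking_success` are hypothetical names. The sketch assumes both poses list the same ligand atoms in the same order, in the receptor's coordinate frame, so no superposition or symmetry correction is applied.

```python
import math

# A 3-D atomic coordinate in Å.
Coord = tuple[float, float, float]


def pose_rmsd(docked: list[Coord], reference: list[Coord]) -> float:
    """Root-mean-square deviation between a docked and an experimental pose.

    Assumes both poses list the same ligand atoms in the same order and share
    the receptor's coordinate frame, so they are compared in place without
    superposition. Symmetry-equivalent atoms are not handled.
    """
    if not docked or len(docked) != len(reference):
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(docked, reference)
    )
    return math.sqrt(squared / len(docked))


def is_docking_success(docked: list[Coord], reference: list[Coord],
                       cutoff: float = 2.0) -> bool:
    """True if the pose is within `cutoff` Å RMSD (treated as inclusive, an assumption)."""
    return pose_rmsd(docked, reference) <= cutoff


if __name__ == "__main__":
    experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(x, y, z + 1.0) for x, y, z in experimental]  # every atom shifted 1 Å
    print(pose_rmsd(docked, experimental))           # 1.0
    print(is_docking_success(docked, experimental))  # True
```

The poses are compared in place rather than after optimal superposition. In docking, the pose has to be correct relative to the protein, not only in its internal geometry.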
4) MASC-scoring

For a ligand i docked into each of N active sites j, one can calculate standard statistics as follows:

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} S_{ij}, \qquad \sigma_i^2 = \frac{1}{N-1}\sum_{j=1}^{N}\left(S_{ij} - \mu_i\right)^2$$

Here $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the scores $S_{ij}$ for compound i across all active sites j. The docking scores can then be corrected as follows:

$$S'_{ij} = \frac{S_{ij} - \mu_i}{\sigma_i}$$

Here $S'_{ij}$ is the modified score for compound i in active site j, and $S_{ij}$ is the original score. We have termed this corrected score the Multiple Active Site Correction score, or MASC-score.

5) Docking results before correction

All 15 ligands were docked into all 15 active sites using FlexX, Gold and Fred. Active sites run across the page and ligands run down the page. The best-scoring compound for each active site is highlighted. Correctly identified (cognate) ligands are colored green and incorrect ones red.

None of the docking programs can consistently identify the correct ligand for each active site.

6) What went wrong?

This problem is not caused by incorrect docking: all 15 ligands dock into their cognate active sites within 2 Å of the experimental conformations (data not shown). Instead, current scoring functions appear to suffer from molecule-dependent biases: some molecules score better than others regardless of the active site involved. For instance, Gold scores ligand 1POC very well across many active sites. The other docking programs show different, but equally severe, biases.

7) Results After MASC-scoring

The molecule-dependent biases can be reduced with MASC-scoring, as described in Panel 4. Here, the data from Panel 5 are shown after MASC-scoring. The biases appear to be greatly reduced, and identification of the cognate ligand for each active site improves accordingly. For FlexX, correctly identified ligands rise from 3/15 to 11/15; for Gold, from 4/15 to 11/15. Similar improvements are seen in virtual screening (database enrichment) experiments; see Application to Virtual Screening.
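A minimal Python sketch of the Panel 4 correction follows. It assumes each ligand's raw scores are stored as a list over the same ordered set of active sites. `masc_scores` is a hypothetical helper name. The sketch uses the sample standard deviation (N - 1 denominator) from the formula. The correction keeps each program's sign convention, so for FlexX a more negative MASC-score is still better.

```python
import statistics


def masc_scores(scores: dict[str, list[float]]) -> dict[str, list[float]]:
    """Apply the Multiple Active Site Correction to a ligand-by-site score table.

    `scores` maps each ligand to its raw docking scores over the same ordered
    list of N >= 2 active sites. Each score S_ij becomes (S_ij - mu_i) / sigma_i,
    where mu_i and sigma_i are the mean and sample standard deviation
    (N - 1 denominator) of ligand i's scores across all sites.
    """
    corrected: dict[str, list[float]] = {}
    for ligand, row in scores.items():
        mu = statistics.mean(row)
        sigma = statistics.stdev(row)  # sample SD, divides by N - 1
        if sigma == 0:
            raise ValueError(f"{ligand}: scores are identical at every site")
        corrected[ligand] = [(s - mu) / sigma for s in row]
    return corrected


# FlexX (Chemscore) raw scores of ligand abe_conc_1 across the 15 sites (Panel 5).
ABE_FLEXX = [-24.47, -17.10, -13.16, -24.96, -14.35, -19.16, -10.62, -14.73,
             -20.97, -14.89, -16.62, -15.82, -13.38, -15.73, -11.89]

if __name__ == "__main__":
    row = masc_scores({"abe_conc_1": ABE_FLEXX})["abe_conc_1"]
    print(round(row[0], 2))  # -1.88
```

Applied to the FlexX row for abe_conc_1 from Panel 5, the first corrected value rounds to -1.88. That is the abe_site_1 entry of the FlexX MASC table in Panel 7.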
Discussion

The best docking programs can now dock a single ligand into an active site with a high probability of success, but they perform poorly when ranking multiple potential ligands. This is not because the programs fail to reproduce the bound conformation of the ligand. It is because of ligand-dependent biases in the scoring functions. We have described a simple statistical correction, the MASC-score, which greatly reduces this problem. The MASC-score works by calculating docking scores for each ligand against a series of control sites. Each score is then expressed relative to that ligand's mean and standard deviation over those sites. We have shown that this correction works well with three different programs (FlexX, Gold and Fred) and several different scoring functions.

The MASC-score can also be read as a measure of statistical significance, in units of standard deviations from the mean. It can therefore be used to assess how meaningful a docking result is, and may help when combining multiple scoring techniques.

Clearly, there is still much work to be done on scoring functions for docking and virtual screening, but we hope that the MASC-scoring method will aid in this work.

Acknowledgements

We thank Ken Brameld for the Fred experiments and the whole Computational Chemistry group at Array for stimulating discussions.

Application to Virtual Screening

As a test of MASC-scoring in virtual screening, 30 known PTP-1b ligands and 30 known p38 ligands were mixed with 600 decoy molecules from the MDDR and docked into the active site of p38. The rate of retrieval of the known ligands is shown in the enrichment plots below.
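A retrieval curve like the ones in these plots can be computed in two steps. First, rank every docked molecule by score. Then, for each k, count how many known ligands of one class fall in the top k. The Uniform line is the count expected from a random ordering. The Python sketch below assumes a list of (score, label) pairs with illustrative labels such as "p38", "ptp" and "decoy". `enrichment_curve` and `uniform_curve` are hypothetical names.

```python
from itertools import accumulate


def enrichment_curve(hits: list[tuple[float, str]], label: str,
                     higher_is_better: bool = True) -> list[int]:
    """Cumulative number of `label` molecules among the top-k ranked molecules.

    `hits` holds one (score, class label) pair per docked molecule. Element
    k - 1 of the result is the number of `label` molecules among the k
    best-scoring ones.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=higher_is_better)
    return list(accumulate(1 if lab == label else 0 for _, lab in ranked))


def uniform_curve(n_label: int, n_total: int) -> list[float]:
    """Expected cumulative count if the molecules were ranked at random."""
    return [n_label * k / n_total for k in range(1, n_total + 1)]


if __name__ == "__main__":
    hits = [(5.0, "ptp"), (4.0, "p38"), (3.0, "decoy"), (1.0, "p38")]
    print(enrichment_curve(hits, "p38"))  # [0, 1, 1, 2]
    print(uniform_curve(2, 4))            # [0.5, 1.0, 1.5, 2.0]
```

For Gold raw scores and their MASC-scores, higher is better. For a score where lower is better, pass `higher_is_better=False`.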
Panel 5 tables. Rows are ligands (for example abe_conc_1) and columns are active sites (for example abe_site_1), shown by their three-letter codes.

FlexX (Chemscore), raw scores. FlexX results: 3/15 endogenous ligands are correctly identified (more negative scores are better).

ligand        abe     aq    azm    cbx    dfr     dq    epb    glq    hyt    mrk    phf    phv    poc    srj    tpp
abe_conc_1 -24.47 -17.10 -13.16 -24.96 -14.35 -19.16 -10.62 -14.73 -20.97 -14.89 -16.62 -15.82 -13.38 -15.73 -11.89
aq_conc_1  -22.56 -41.39 -21.15 -22.43 -25.56 -16.05 -31.56 -36.81 -22.45 -18.91 -18.60 -24.41 -25.23 -24.78 -23.71
azm_conc_1 -20.90 -15.43 -30.26 -30.68 -20.96 -25.82 -18.09 -25.47 -38.12 -22.52 -20.71 -23.96 -26.55 -21.96 -20.09
cbx_conc_1 -19.55 -20.49 -31.84 -42.02 -27.86 -30.50 -20.12 -22.40 -42.28 -20.39 -26.30 -20.68 -23.48 -23.25 -16.95
dfr_conc_1 -22.37 -27.00 -23.55 -38.35 -32.05 -29.08 -29.52 -26.34 -43.46 -33.52 -18.86 -22.63 -26.28 -27.31 -22.37
dq_conc_1  -13.44 -18.94 -22.94 -33.28 -21.38 -34.02 -16.70 -19.43 -24.17 -23.16 -16.72 -19.07 -21.80 -16.46 -14.63
epb_conc_1 -24.97 -36.15 -36.86 -26.59 -25.61 -36.03 -46.75 -33.83 -36.61 -35.11 -30.22 -26.54 -35.98 -30.82 -25.12
glq_conc_1 -15.37 -21.88 -25.87 -27.30 -29.60 -26.48 -19.46 -28.99 -42.05 -25.97 -23.77 -27.52 -34.32 -25.99 -25.63
hyt_conc_1 -16.98 -19.45 -30.60 -37.78 -23.00 -28.03 -21.78 -23.96 -42.21 -18.54 -28.57 -22.46 -24.46 -20.90 -15.71
mrk_conc_1 -20.02 -17.48 -21.31 -23.89 -18.61 -22.69 -20.63 -20.10 -27.05 -22.03 -21.36 -18.75 -17.22 -17.64 -15.35
phf_conc_1 -16.45 -29.90 -25.79 -27.79 -21.86 -25.93 -23.14 -21.88 -28.96 -24.95 -28.44 -19.00 -19.95 -28.53 -16.38
phv_conc_1 -39.32 -46.67 -35.00 -46.08 -41.28 -22.82 -57.39 -35.55 -59.16 -46.48 -28.94 -56.95 -35.90 -37.90 -28.53
poc_conc_1 -21.75 -30.27 -36.25 -27.56 -33.45 -19.62 -22.88 -32.49 -40.68 -31.07 -25.14 -29.05 -39.60 -21.65 -28.74
srj_conc_1 -26.88 -29.22 -23.02 -31.98 -28.11 -26.05 -41.66 -29.04 -37.90 -33.95 -28.68 -29.83 -31.57 -42.63 -26.82
tpp_conc_1 -14.07 -23.13 -24.76 -25.54 -20.31 -24.89 -18.53 -21.34 -28.63 -22.32 -19.68 -25.35 -19.36 -25.17 -25.63

Gold, raw scores. Gold results: 4/15 endogenous ligands are correctly identified (more positive scores are better).

ligand        abe     aq    azm    cbx    dfr     dq    epb    glq    hyt    mrk    phf    phv    poc    srj    tpp
abe_conc_1   3.08   2.02   2.34   2.27   1.92   2.42   1.95   1.93   2.30   2.27   2.32   2.19   2.45   2.30   1.98
aq_conc_1    0.00   6.16   2.17   2.52   3.01   3.15   0.72   2.72   2.51   4.19   0.01   2.97   3.12   3.05   2.87
azm_conc_1   1.56   2.32   2.51   2.64   2.34   2.95   2.38   2.32   2.54   2.59   2.72   2.29   2.71   2.56   2.26
cbx_conc_1   2.56   2.99   2.73   5.08   3.14   3.67   2.88   2.74   3.16   3.50   3.46   2.76   3.22   3.21   2.53
dfr_conc_1   0.01   4.33   3.67   5.19   5.35   4.15   4.02   4.17   5.83   4.58   0.46   3.98   3.80   4.85   3.66
dq_conc_1    2.28   2.59   2.92   4.26   2.82   4.35   2.47   2.52   2.53   3.20   3.42   2.33   2.89   3.13   2.60
epb_conc_1   0.09   3.34   2.60   2.75   3.31   3.48   3.78   2.73   2.58   2.67   0.20   2.84   2.82   3.13   2.98
glq_conc_1   0.02   4.36   4.55   4.49   5.47   5.26   4.51   4.35   4.85   5.13   1.74   5.13   6.32   5.65   4.04
hyt_conc_1   2.68   2.99   2.89   4.08   2.97   3.64   2.86   2.73   3.36   3.37   3.17   2.70   3.56   3.15   2.65
mrk_conc_1   0.73   2.44   2.42   2.42   2.64   2.90   2.53   2.58   2.52   3.21   2.29   2.69   2.89   3.09   2.29
phf_conc_1   1.76   2.16   1.88   1.96   2.11   2.32   2.02   1.99   1.97   2.21   2.46   1.96   2.08   2.16   2.26
phv_conc_1   0.00   5.48   3.61   3.40   5.26   5.13   5.24   4.09   5.48   4.24   0.00   8.30   5.46   4.69   5.50
poc_conc_1   0.00   4.67   3.96   6.41   5.34   3.52   4.79   5.23   4.40   5.81   0.14   4.70   8.78   6.26   4.02
srj_conc_1   0.08   3.31   2.63   3.97   3.32   2.60   3.49   2.64   2.70   3.45   1.18   2.89   3.11   4.95   2.80
tpp_conc_1   0.94   2.81   2.71   4.10   2.82   3.65   2.63   2.91   2.60   3.26   3.05   3.16   3.14   3.01   2.54

Fred, raw scores. Fred results: 5/15 endogenous ligands are correctly identified. Ligands and sites are labeled abe_1, aq_1 and so on, and there is no 1POC entry. Some rows list fewer than 14 scores.

Columns: abe_1 aq_1 azm_1 cbx_1 dfr_1 dq_1 epb_1 glq_1 hyt_1 mrk_1 phf_1 phv_1 srj_1 tpp_1
abe_1: -21.34 -5.12 -14.94 -23.15 -9.88 -13.40 -1.76 -5.80 -19.45 -11.70 -8.13 -10.16 -18.84 -7.15
aq_1: -32.03 -3.63 -16.63 -17.36 -6.14 -18.70 -11.93 -27.36 -19.42 -17.14 -8.23
azm_1: -2.35 -7.95 -21.22 -34.35 -7.07 -15.11 -8.05 -11.87 -29.50 -12.85 -16.04 -10.24 -10.10 -9.71
cbx_1: -17.03 -13.23 -19.17 -40.41 -15.96 -25.49 -12.98 -18.02 -29.18 -19.31 -21.24 -16.56 -28.82 -20.72
dfr_1: -11.71 -9.68 -9.94 -11.95 -1.69 -8.83 -10.85 -14.01 -4.71
dq_1: -12.26 -17.03 -33.56 -10.80 -22.12 -13.37 -14.69 -32.03 -19.36 -20.17 -11.36 -14.52 -12.67
epb_1: -24.01 -21.93 -26.76 -21.45 -31.22 -26.49 -23.62 -21.62 -24.16 -23.34 -32.04 -20.19
glq_1: -13.59 -11.14 -5.04 -5.64 -19.83 -3.50 -7.34 -15.54 -16.04 -7.87
hyt_1: -22.61 -11.31 -14.59 -35.99 -15.42 -20.69 -15.87 -18.42 -31.21 -23.41 -19.62 -21.03 -23.79 -20.15
mrk_1: -6.70 -7.71 -16.38 -8.85 -9.30 -6.24 -5.90 -13.55 -14.70 -7.49 -9.25 -10.71 -11.35
phf_1: -17.30 -16.02 -17.07 -19.53 -14.95 -22.50 -16.67 -16.22 -16.56 -20.64 -21.20 -17.28 -20.66 -16.07
phv_1: -20.02 -18.31 -16.16 -8.05 -19.58 -10.67
srj_1: -16.61 -20.76 -24.58 -19.33 -19.76 -17.75 -19.19 -23.54 -26.25 -6.21 -17.32 -32.20 -18.96
tpp_1: -12.85 -14.82 -25.32 -13.87 -24.55 -10.62 -7.53 -26.74 -17.29 -8.87 -10.72 -21.79 -27.09

Panel 6 figures.

[Figure: Ligand 1poc docked into protein 1poc. The experimental conformation is shown in purple.]
[Figure: Ligand 1poc docked into protein 1cbx. The experimental conformation of the true 1cbx ligand is shown in purple.]
[Figure: Ligand 1poc docked into protein 1glq. The experimental conformation of the true 1glq ligand is shown in purple.]
[Figure: Ligand 1poc docked into protein 1srj. The experimental conformation of the true 1srj ligand is shown in purple.]

Panel 7 tables: MASC-scores, in the same layout as Panel 5.

FlexX MASC-scores. FlexX results: 11/15 endogenous ligands are correctly identified.

ligand        abe     aq    azm    cbx    dfr     dq    epb    glq    hyt    mrk    phf    phv    poc    srj    tpp
abe_conc_1  -1.88  -0.14   0.79  -1.99   0.51  -0.62   1.40   0.42  -1.05   0.39  -0.02   0.17   0.74   0.19   1.10
aq_conc_1    0.37  -2.41   0.57   0.38  -0.08   1.32  -0.96  -1.73   0.38   0.90   0.95   0.09  -0.03   0.04   0.20
azm_conc_1   0.56   1.52  -1.08  -1.15   0.55  -0.30   1.05  -0.24  -2.46   0.28   0.60   0.02  -0.43   0.38   0.70
cbx_conc_1   0.81   0.69  -0.76  -2.07  -0.25  -0.59   0.74   0.44  -2.10   0.70  -0.05   0.66   0.31   0.34   1.14
dfr_conc_1   0.89   0.18   0.71  -1.56  -0.59  -0.14  -0.21   0.28  -2.34  -0.82   1.43   0.85   0.29   0.13   0.89
dq_conc_1    1.27   0.36  -0.31  -2.03  -0.05  -2.16   0.73   0.27  -0.52  -0.35   0.73   0.33  -0.12   0.77   1.07
epb_conc_1   1.23  -0.60  -0.72   0.96   1.12  -0.58  -2.34  -0.22  -0.68  -0.43   0.37   0.97  -0.57   0.27   1.20
glq_conc_1   1.85   0.78   0.13  -0.10  -0.48   0.03   1.18  -0.38  -2.51   0.12   0.48  -0.14  -1.25   0.11   0.17
hyt_conc_1   1.07   0.74  -0.76  -1.72   0.26  -0.41   0.43   0.13  -2.32   0.86  -0.48   0.34   0.07   0.55   1.24
mrk_conc_1   0.09   0.94  -0.35  -1.22   0.56  -0.81  -0.12   0.06  -2.28  -0.59  -0.36   0.51   1.03   0.89   1.66
phf_conc_1   1.65  -1.32  -0.41  -0.85   0.46  -0.44   0.17   0.45  -1.11  -0.22  -0.99   1.09   0.88  -1.01   1.66
phv_conc_1   0.17  -0.50   0.57  -0.45  -0.01   1.68  -1.48   0.52  -1.64  -0.48   1.12  -1.44   0.48   0.30   1.16
poc_conc_1   1.17  -0.14  -1.07   0.28  -0.63   1.50   1.00  -0.49  -1.75  -0.27   0.65   0.05  -1.58   1.19   0.09
srj_conc_1   0.75   0.34   1.43  -0.15   0.54   0.90  -1.85   0.37  -1.19  -0.49   0.44   0.23  -0.07  -2.02   0.76
tpp_conc_1   2.29  -0.15  -0.59  -0.80   0.61  -0.62   1.09   0.33  -1.63   0.07   0.78  -0.74   0.87  -0.70  -0.82

Gold MASC-scores. Gold results: 11/15 endogenous ligands are correctly identified.

ligand        abe     aq    azm    cbx    dfr     dq    epb    glq    hyt    mrk    phf    phv    poc    srj    tpp
abe_conc_1   2.82  -0.78   0.30   0.07  -1.13   0.59  -1.02  -1.09   0.19   0.08   0.24  -0.21   0.68   0.18  -0.91
aq_conc_1   -1.69   2.29  -0.29  -0.06   0.26   0.35  -1.22   0.07  -0.07   1.02  -1.68   0.23   0.33   0.28   0.17
azm_conc_1  -2.82  -0.41   0.20   0.61  -0.35   1.60  -0.21  -0.40   0.30   0.46   0.88  -0.49   0.86   0.37  -0.60
cbx_conc_1  -0.98  -0.30  -0.70   3.03  -0.06   0.78  -0.47  -0.69  -0.02   0.51   0.45  -0.66   0.07   0.06  -1.02
dfr_conc_1  -2.40   0.29  -0.12   0.82   0.92   0.17   0.09   0.19   1.22   0.44  -2.12   0.07  -0.04   0.61  -0.13
dq_conc_1   -1.06  -0.57  -0.06   2.05  -0.22   2.19  -0.76  -0.68  -0.67   0.39   0.73  -0.97  -0.10   0.27  -0.55
epb_conc_1  -2.38   0.68  -0.02   0.13   0.64   0.81   1.09   0.10  -0.04   0.05  -2.27   0.21   0.19   0.48   0.34
glq_conc_1  -2.77  -0.02   0.10   0.06   0.68   0.55   0.08  -0.03   0.29   0.47  -1.68   0.47   1.22   0.80  -0.22
hyt_conc_1  -1.06  -0.31  -0.56   2.31  -0.37   1.25  -0.63  -0.94   0.58   0.61   0.12  -1.01   1.06   0.07  -1.13
mrk_conc_1  -3.15  -0.11  -0.16  -0.16   0.23   0.68   0.04   0.13   0.02   1.25  -0.39   0.32   0.68   1.03  -0.39
phf_conc_1  -1.83   0.38  -1.12  -0.69   0.15   1.30  -0.34  -0.55  -0.66   0.67   2.05  -0.71  -0.01   0.43   0.95
phv_conc_1  -2.08   0.51  -0.37  -0.47   0.41   0.35   0.40  -0.14   0.51  -0.07  -2.08   1.85   0.50   0.14   0.53
poc_conc_1  -2.04   0.06  -0.26   0.84   0.36  -0.46   0.11   0.31  -0.06   0.58  -1.98   0.07   1.91   0.78  -0.23
srj_conc_1  -2.50   0.39  -0.22   0.98   0.40  -0.24   0.55  -0.21  -0.16   0.51  -1.51   0.02   0.21   1.85  -0.07
tpp_conc_1  -2.87  -0.11  -0.26   1.78  -0.10   1.12  -0.39   0.04  -0.42   0.55   0.24   0.40   0.37   0.17  -0.52

Fred MASC-scores. Fred results: 11/15 endogenous ligands are correctly identified. Some rows list fewer than 14 scores.

Columns: abe_1 aq_1 azm_1 cbx_1 dfr_1 dq_1 epb_1 glq_1 hyt_1 mrk_1 phf_1 phv_1 srj_1 tpp_1
abe_1: -1.39 1.08 -0.42 -1.67 0.35 -0.18 1.59 0.97 -1.10 0.08 0.62 0.31 -1.01 0.77
aq_1: -1.84 1.47 -0.05 -0.13 1.18 -0.29 0.50 -1.30 -0.37 -0.11 0.93
azm_1: 1.32 0.69 -0.81 -2.29 0.79 -0.12 0.67 0.24 -1.75 0.13 -0.23 0.43 0.44 0.49
cbx_1: 0.57 1.08 0.28 -2.56 0.71 -0.56 1.11 0.44 -1.06 0.27 0.01 0.63 -1.01 0.08
dfr_1: -0.64 -0.11 -0.18 -0.70 1.98 0.11 -0.41 -1.24 1.19
dq_1: 0.77 0.13 -2.09 0.97 -0.55 0.62 0.44 -1.89 -0.18 -0.29 0.89 0.47 0.72
epb_1: 0.19 0.75 -0.54 0.87 -1.72 -0.47 0.30 0.83 0.15 0.37 -1.94 1.21
glq_1: -0.55 -0.11 1.00 0.89 -1.69 1.28 0.58 -0.91 -1.00 0.49
hyt_1: -0.25 1.50 0.99 -2.31 0.86 0.05 0.79 0.40 -1.57 -0.37 0.21 0.00 -0.43 0.13
mrk_1: 0.95 0.64 -1.96 0.30 0.17 1.09 1.19 -1.11 -1.45 0.71 0.18 -0.26 -0.45
phf_1: 0.32 0.86 0.41 -0.63 1.31 -1.88 0.58 0.77 0.63 -1.10 -1.33 0.33 -1.11 0.84
phv_1: -0.91 -0.57 -0.14 1.49 -0.83 0.96
srj_1: 0.59 -0.09 -0.73 0.14 0.07 0.40 0.17 -0.56 -1.00 2.31 0.48 -1.99 0.20
tpp_1: 0.59 0.32 -1.15 0.45 -1.04 0.90 1.34 -1.35 -0.03 1.15 0.89 -0.66 -1.40

Enrichment plots (Application to Virtual Screening). Each plot shows the number of inhibitors retrieved (y axis, 0 to 30) against the total number of molecules screened (x axis, 0 to 600), for the series ptp_inh, p38_inh and Uniform.

[Figure: Database Enrichment, raw scores.] Raw Gold scores: the PTP-1b ligands are retrieved first, even though we are screening against the p38 active site!

[Figure: Database Enrichment, MASC-scores.] After MASC-scoring: the p38 ligands are now retrieved first.

[Figure: Database Enrichment, kinase controls.] MASC-scores corrected against 15 kinases rather than against 15 diverse active sites. The p38 ligands are still retrieved first, but the discrimination is actually reduced. Molecules that score well against p38 may also score well against other kinases, which gives a higher value of $\mu_i$ and lowers the MASC-score. A standard set of diverse active sites therefore gives better MASC-scoring than a set of active sites similar to the target of interest.
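The counts in the captions (for example 3/15 and 11/15 for FlexX) come from checking, for every active site, whether its best-scoring ligand is the cognate one. The Python sketch below performs that check, with the two FlexX tables transcribed as data. `count_cognate_hits` is a hypothetical name. `lower_is_better` must match the program: True for FlexX, False for Gold.

```python
# Ligands (rows) and active sites (columns) share this order, so ligand i is
# the cognate ligand of site i:
# abe, aq, azm, cbx, dfr, dq, epb, glq, hyt, mrk, phf, phv, poc, srj, tpp

# FlexX (Chemscore) raw scores from Panel 5; more negative is better.
FLEXX_RAW = [
    [-24.47, -17.10, -13.16, -24.96, -14.35, -19.16, -10.62, -14.73, -20.97, -14.89, -16.62, -15.82, -13.38, -15.73, -11.89],
    [-22.56, -41.39, -21.15, -22.43, -25.56, -16.05, -31.56, -36.81, -22.45, -18.91, -18.60, -24.41, -25.23, -24.78, -23.71],
    [-20.90, -15.43, -30.26, -30.68, -20.96, -25.82, -18.09, -25.47, -38.12, -22.52, -20.71, -23.96, -26.55, -21.96, -20.09],
    [-19.55, -20.49, -31.84, -42.02, -27.86, -30.50, -20.12, -22.40, -42.28, -20.39, -26.30, -20.68, -23.48, -23.25, -16.95],
    [-22.37, -27.00, -23.55, -38.35, -32.05, -29.08, -29.52, -26.34, -43.46, -33.52, -18.86, -22.63, -26.28, -27.31, -22.37],
    [-13.44, -18.94, -22.94, -33.28, -21.38, -34.02, -16.70, -19.43, -24.17, -23.16, -16.72, -19.07, -21.80, -16.46, -14.63],
    [-24.97, -36.15, -36.86, -26.59, -25.61, -36.03, -46.75, -33.83, -36.61, -35.11, -30.22, -26.54, -35.98, -30.82, -25.12],
    [-15.37, -21.88, -25.87, -27.30, -29.60, -26.48, -19.46, -28.99, -42.05, -25.97, -23.77, -27.52, -34.32, -25.99, -25.63],
    [-16.98, -19.45, -30.60, -37.78, -23.00, -28.03, -21.78, -23.96, -42.21, -18.54, -28.57, -22.46, -24.46, -20.90, -15.71],
    [-20.02, -17.48, -21.31, -23.89, -18.61, -22.69, -20.63, -20.10, -27.05, -22.03, -21.36, -18.75, -17.22, -17.64, -15.35],
    [-16.45, -29.90, -25.79, -27.79, -21.86, -25.93, -23.14, -21.88, -28.96, -24.95, -28.44, -19.00, -19.95, -28.53, -16.38],
    [-39.32, -46.67, -35.00, -46.08, -41.28, -22.82, -57.39, -35.55, -59.16, -46.48, -28.94, -56.95, -35.90, -37.90, -28.53],
    [-21.75, -30.27, -36.25, -27.56, -33.45, -19.62, -22.88, -32.49, -40.68, -31.07, -25.14, -29.05, -39.60, -21.65, -28.74],
    [-26.88, -29.22, -23.02, -31.98, -28.11, -26.05, -41.66, -29.04, -37.90, -33.95, -28.68, -29.83, -31.57, -42.63, -26.82],
    [-14.07, -23.13, -24.76, -25.54, -20.31, -24.89, -18.53, -21.34, -28.63, -22.32, -19.68, -25.35, -19.36, -25.17, -25.63],
]

# FlexX MASC-scores from Panel 7; more negative is still better.
FLEXX_MASC = [
    [-1.88, -0.14, 0.79, -1.99, 0.51, -0.62, 1.40, 0.42, -1.05, 0.39, -0.02, 0.17, 0.74, 0.19, 1.10],
    [0.37, -2.41, 0.57, 0.38, -0.08, 1.32, -0.96, -1.73, 0.38, 0.90, 0.95, 0.09, -0.03, 0.04, 0.20],
    [0.56, 1.52, -1.08, -1.15, 0.55, -0.30, 1.05, -0.24, -2.46, 0.28, 0.60, 0.02, -0.43, 0.38, 0.70],
    [0.81, 0.69, -0.76, -2.07, -0.25, -0.59, 0.74, 0.44, -2.10, 0.70, -0.05, 0.66, 0.31, 0.34, 1.14],
    [0.89, 0.18, 0.71, -1.56, -0.59, -0.14, -0.21, 0.28, -2.34, -0.82, 1.43, 0.85, 0.29, 0.13, 0.89],
    [1.27, 0.36, -0.31, -2.03, -0.05, -2.16, 0.73, 0.27, -0.52, -0.35, 0.73, 0.33, -0.12, 0.77, 1.07],
    [1.23, -0.60, -0.72, 0.96, 1.12, -0.58, -2.34, -0.22, -0.68, -0.43, 0.37, 0.97, -0.57, 0.27, 1.20],
    [1.85, 0.78, 0.13, -0.10, -0.48, 0.03, 1.18, -0.38, -2.51, 0.12, 0.48, -0.14, -1.25, 0.11, 0.17],
    [1.07, 0.74, -0.76, -1.72, 0.26, -0.41, 0.43, 0.13, -2.32, 0.86, -0.48, 0.34, 0.07, 0.55, 1.24],
    [0.09, 0.94, -0.35, -1.22, 0.56, -0.81, -0.12, 0.06, -2.28, -0.59, -0.36, 0.51, 1.03, 0.89, 1.66],
    [1.65, -1.32, -0.41, -0.85, 0.46, -0.44, 0.17, 0.45, -1.11, -0.22, -0.99, 1.09, 0.88, -1.01, 1.66],
    [0.17, -0.50, 0.57, -0.45, -0.01, 1.68, -1.48, 0.52, -1.64, -0.48, 1.12, -1.44, 0.48, 0.30, 1.16],
    [1.17, -0.14, -1.07, 0.28, -0.63, 1.50, 1.00, -0.49, -1.75, -0.27, 0.65, 0.05, -1.58, 1.19, 0.09],
    [0.75, 0.34, 1.43, -0.15, 0.54, 0.90, -1.85, 0.37, -1.19, -0.49, 0.44, 0.23, -0.07, -2.02, 0.76],
    [2.29, -0.15, -0.59, -0.80, 0.61, -0.62, 1.09, 0.33, -1.63, 0.07, 0.78, -0.74, 0.87, -0.70, -0.82],
]


def count_cognate_hits(matrix: list[list[float]], lower_is_better: bool = True) -> int:
    """Count active sites whose best-scoring ligand is the cognate (same-index) one."""
    n = len(matrix)
    hits = 0
    for j in range(n):
        column = [matrix[i][j] for i in range(n)]
        best = min(column) if lower_is_better else max(column)
        if column.index(best) == j:  # ties go to the first ligand listed
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_cognate_hits(FLEXX_RAW), count_cognate_hits(FLEXX_MASC))  # 3 11
```

The same function applied to the Gold tables needs `lower_is_better=False`. The Fred tables cannot be checked this way, because some of their rows are incomplete.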
Test set (PDB code, protein, resolution):

1ABE, Arabinose-binding protein, 1.7 Å
1AQ1, CDK2, 2.0 Å
1AZM, Carbonic anhydrase, 2.0 Å
1CBX, Carboxypeptidase, 2.0 Å
1D1Q, Yeast low-molecular-weight protein-tyrosine phosphatase, 1.7 Å
1EPB, Retinoic acid binding protein, 2.2 Å
1GLQ, Glutathione S-transferase, 1.8 Å
1HYT, Thermolysin, 1.7 Å
1MRK, Ribosome-inactivating protein, 1.6 Å
1PHF, Cytochrome P450cam, 1.6 Å
1POC, Phospholipase A2, 2.0 Å
1SRJ, Streptavidin, 1.8 Å
1TPP, Trypsin, 1.4 Å
4DFR, DHFR, 1.7 Å
4PHV, HIV protease, 2.1 Å

Some of this work has been published in J. Med. Chem. 2004, 47(1), 80-89.

MASC-scoring can be incorporated into a relational database and used in virtual screening or in de novo structure-based drug design. We have implemented such a system in MySQL, as follows:

```sql
# Information about the protein active site:
# identifier, name, center (origin, x y z) and radius
create table site_table (
  site_id   INT UNSIGNED NOT NULL auto_increment,
  site_name varchar(50) NOT NULL,
  origin    varchar(50),
  radius    real,
  PRIMARY KEY (site_id)
);

# Information about a molecule
create table mol_table (
  mol_id   INT UNSIGNED NOT NULL,
  sln      varchar(255),
  block    INT UNSIGNED,
  est_mean real,
  est_sd   real,
  PRIMARY KEY (mol_id)
);

# Information about a molecule's score, split into control and target tables.
# Foreign keys are declared as table constraints: many MySQL versions ignore
# inline column REFERENCES clauses, which also need a column list.
create table target_score_table (
  mol_id  INT UNSIGNED NOT NULL,
  site_id INT UNSIGNED NOT NULL,
  score   real,
  est_chi real,
  chi     real,
  INDEX (mol_id),
  INDEX (site_id),
  FOREIGN KEY (mol_id) REFERENCES mol_table (mol_id),
  FOREIGN KEY (site_id) REFERENCES site_table (site_id)
);

create table control_score_table (
  mol_id  INT UNSIGNED NOT NULL,
  site_id INT UNSIGNED NOT NULL,
  score   real,
  INDEX (mol_id),
  INDEX (site_id),
  FOREIGN KEY (mol_id) REFERENCES mol_table (mol_id),
  FOREIGN KEY (site_id) REFERENCES site_table (site_id)
);
```

Upload: others

Post on 14-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: M u ltip le A ctive Site C o rrectio n fo r D o ckin g an ...son051000/comp/VigersRizzi.pdfC learly, th ere is still m u ch w o rk to b e d o n e o n sco rin g fu n ctio n s fo r d

1) Introduction 2) Selection of test set 3) Docking Programs. 4) MASC-scoring

5) Docking results before correction

7) Results After MASC-scoring

6) What went wrong?

Application to Virtual Screening.

Discussion

Multiple Active Site Correction for Docking and Virtual ScreeningGuy Vigers and James P. Rizzi

Array BioPharma Inc, 3200 Walnut St., Boulder, CO 80304

AA

B C D

AN

Experimental structures of biologically-interesting macromolecules can be hard to obtain, but are essential for successful structure-based drug design (SBDD). One of the largest hurdles in SBDD is the "docking problem", where one attempts to predict the bound conformation of a ligand in an active site, and then to estimate the binding energy of the conformation.

We have compared several docking programs, both for accuracy of docking and for the ranking of potential ligands. We have found that while several programs provide good reproduction of the bound conformation of a single ligand, no program currently available gives good ranking of multiple potential ligands. We present a simple correction, the Multiple Active Site Correction, which greatly improves the usefulness and significance of the docking scores.

Fifteen co-crystal structures were selected from the PDB. All had been solved with resolution of~2.0A and represent diverse proteins and ligands. The test set is shown below:

Programs which we have tested include AutoDock, Dock, FlexX, Fred, Glide, Gold and QXP/Flo. FlexX, Gold and Fred all performed well, in that they all placed all 15 ligands in their cognate active sites within 2A RMSD of the experimental conformation.

Program parameters used to generate the results shown below (Panels 5 -7) were as follows:

FlexX: Default settings with the Chemscore scoring function.

Gold: 3x speedup, 4/8 internal VdW potential, Gold scoring function.

Fred: Default settings with the SS scoring function.

For a ligand i docked across multiple active sites j one can calculate standard statisticsas follows:µi = j(Sij)/N j=1,N(i)

2 = j(Sij - µi)2/(N-1) j=1,N

where µi and i are the mean and standard deviation of the scores Sij for compound i across all active sites j.

The docking scores can then be corrected as follows:Sij' = (Sij - µi)/i where Sij' is the modified score for compound i in active site j and Sij is the original score.

We have termed this corrected score the Multiple-Active-Site-Correction score or MASC-score.

All 15 ligands were docked into all 15 active sites, using FlexX, Gold and Fred. Active-sites run across the page and ligands run down the page. The best-scoring compound is highlighted for each active site. Correctly identified (cognate) ligands are colored green, while incorrect ones are colored red.

None of the docking programs are able consistently to identify the correct ligandfor the active sites.

This problem is not because the ligands are docked incorrectly: All 15 ligands are docked into their cognate active sites within 2A of the experimental conformations (data not shown).However, current scoring functions appear to suffer from molecule-dependent biases: Some molecules as better than others, regardless of the active site involved. For instance, Gold scores the ligand1POC very well across many active sites. The other docking programs show different, but equally severe, biases.

The molecule-dependent biases can be reduced using MASC-scoring, as described in Panel 4). Here, the data from Panel 5 are shown after MASC-scoring. The molecule-dependent biases appear to be greatly reduced, resulting in better identification of the cognate ligands for each active site. Similar improvements are seen in virtual screening/database enrichment experiments (see sidebar).

Acknowledgements While the best docking programs can now dock a single ligand into an active site with a high probability of success, they perform poorly when ranking multiple potential ligands. This problem occurs, not because the docking programs fail to reproduce the bound conformation of the ligand, but because of ligand-dependent biases in the scoring functions. We have described a simple statistical correction, the MASC-score, which greatly reduces this problem. The MASC-score operates by calculating docking scores for each ligand against a series of control sites. We have shown that this correction works well with three different programs (FlexX, Gold and Fred) and multiple different scoring functions.

The MASC-score can also be thought of as a measure of statistical significance, in units of Standard-Deviations about the Mean. It can therefore be used to assess the meaningfullness of a docking result, and possibly aid in the combination of multiple scoring techniques.

Clearly, there is still much work to be done on scoring functions for docking and virtual screening, but we hope that the MASC-scoring method will aid in this work.

As a test of MASC-scoring in virtual screening, 30 known PTP-1b ligands and 30 known p38 ligands were mixed with 600 decoy molecules from the MDDR, and docked into the active site of p38. The rate of retrieval of the known ligands is shown below:

We thank Ken Brameld for the Fred experiments and the whole Computational Chemistry group at Array for stimulating discussions.

abe_site_1

aq_site_1

azm_site_1

cbx_site_1

dfr_site_1

dq_site_1

epb_site_1

glq_site_1

hyt_site_1

mrk_site_1

phf_site_1

phv_site_1

poc_site_1

srj_site_1

tpp_site_1

abe_conc_1 -24.47 -17.10 -13.16 -24.96 -14.35 -19.16 -10.62 -14.73 -20.97 -14.89 -16.62 -15.82 -13.38 -15.73 -11.89aq_conc_1 -22.56 -41.39 -21.15 -22.43 -25.56 -16.05 -31.56 -36.81 -22.45 -18.91 -18.60 -24.41 -25.23 -24.78 -23.71azm_conc_1 -20.90 -15.43 -30.26 -30.68 -20.96 -25.82 -18.09 -25.47 -38.12 -22.52 -20.71 -23.96 -26.55 -21.96 -20.09cbx_conc_1 -19.55 -20.49 -31.84 -42.02 -27.86 -30.50 -20.12 -22.40 -42.28 -20.39 -26.30 -20.68 -23.48 -23.25 -16.95dfr_conc_1 -22.37 -27.00 -23.55 -38.35 -32.05 -29.08 -29.52 -26.34 -43.46 -33.52 -18.86 -22.63 -26.28 -27.31 -22.37dq_conc_1 -13.44 -18.94 -22.94 -33.28 -21.38 -34.02 -16.70 -19.43 -24.17 -23.16 -16.72 -19.07 -21.80 -16.46 -14.63epb_conc_1 -24.97 -36.15 -36.86 -26.59 -25.61 -36.03 -46.75 -33.83 -36.61 -35.11 -30.22 -26.54 -35.98 -30.82 -25.12glq_conc_1 -15.37 -21.88 -25.87 -27.30 -29.60 -26.48 -19.46 -28.99 -42.05 -25.97 -23.77 -27.52 -34.32 -25.99 -25.63hyt_conc_1 -16.98 -19.45 -30.60 -37.78 -23.00 -28.03 -21.78 -23.96 -42.21 -18.54 -28.57 -22.46 -24.46 -20.90 -15.71mrk_conc_1 -20.02 -17.48 -21.31 -23.89 -18.61 -22.69 -20.63 -20.10 -27.05 -22.03 -21.36 -18.75 -17.22 -17.64 -15.35phf_conc_1 -16.45 -29.90 -25.79 -27.79 -21.86 -25.93 -23.14 -21.88 -28.96 -24.95 -28.44 -19.00 -19.95 -28.53 -16.38phv_conc_1 -39.32 -46.67 -35.00 -46.08 -41.28 -22.82 -57.39 -35.55 -59.16 -46.48 -28.94 -56.95 -35.90 -37.90 -28.53poc_conc_1 -21.75 -30.27 -36.25 -27.56 -33.45 -19.62 -22.88 -32.49 -40.68 -31.07 -25.14 -29.05 -39.60 -21.65 -28.74srj_conc_1 -26.88 -29.22 -23.02 -31.98 -28.11 -26.05 -41.66 -29.04 -37.90 -33.95 -28.68 -29.83 -31.57 -42.63 -26.82tpp_conc_1 -14.07 -23.13 -24.76 -25.54 -20.31 -24.89 -18.53 -21.34 -28.63 -22.32 -19.68 -25.35 -19.36 -25.17 -25.63

abe_site_1

aq_site_1

azm_site_1

cbx_site_1

dfr_site_1

dq_site_1

epb_site_1

glq_site_1

hyt_site_1

mrk_site_1

phf_site_1

phv_site_1

poc_site_1

srj_site_1

tpp_site_1

abe_conc_1 3.08 2.02 2.34 2.27 1.92 2.42 1.95 1.93 2.30 2.27 2.32 2.19 2.45 2.30 1.98aq_conc_1 0.00 6.16 2.17 2.52 3.01 3.15 0.72 2.72 2.51 4.19 0.01 2.97 3.12 3.05 2.87azm_conc_1 1.56 2.32 2.51 2.64 2.34 2.95 2.38 2.32 2.54 2.59 2.72 2.29 2.71 2.56 2.26cbx_conc_1 2.56 2.99 2.73 5.08 3.14 3.67 2.88 2.74 3.16 3.50 3.46 2.76 3.22 3.21 2.53dfr_conc_1 0.01 4.33 3.67 5.19 5.35 4.15 4.02 4.17 5.83 4.58 0.46 3.98 3.80 4.85 3.66dq_conc_1 2.28 2.59 2.92 4.26 2.82 4.35 2.47 2.52 2.53 3.20 3.42 2.33 2.89 3.13 2.60epb_conc_1 0.09 3.34 2.60 2.75 3.31 3.48 3.78 2.73 2.58 2.67 0.20 2.84 2.82 3.13 2.98glq_conc_1 0.02 4.36 4.55 4.49 5.47 5.26 4.51 4.35 4.85 5.13 1.74 5.13 6.32 5.65 4.04hyt_conc_1 2.68 2.99 2.89 4.08 2.97 3.64 2.86 2.73 3.36 3.37 3.17 2.70 3.56 3.15 2.65mrk_conc_1 0.73 2.44 2.42 2.42 2.64 2.90 2.53 2.58 2.52 3.21 2.29 2.69 2.89 3.09 2.29phf_conc_1 1.76 2.16 1.88 1.96 2.11 2.32 2.02 1.99 1.97 2.21 2.46 1.96 2.08 2.16 2.26phv_conc_1 0.00 5.48 3.61 3.40 5.26 5.13 5.24 4.09 5.48 4.24 0.00 8.30 5.46 4.69 5.50poc_conc_1 0.00 4.67 3.96 6.41 5.34 3.52 4.79 5.23 4.40 5.81 0.14 4.70 8.78 6.26 4.02srj_conc_1 0.08 3.31 2.63 3.97 3.32 2.60 3.49 2.64 2.70 3.45 1.18 2.89 3.11 4.95 2.80tpp_conc_1 0.94 2.81 2.71 4.10 2.82 3.65 2.63 2.91 2.60 3.26 3.05 3.16 3.14 3.01 2.54

abe_1 aq_1 azm_1 cbx_1 dfr_1 dq_1 epb_1 glq_1 hyt_1 mrk_1 phf_1 phv_1 srj_1 tpp_1abe_1 -21.34 -5.12 -14.94 -23.15 -9.88 -13.40 -1.76 -5.80 -19.45 -11.70 -8.13 -10.16 -18.84 -7.15aq_1 -32.03 -3.63 -16.63 -17.36 -6.14 -18.70 -11.93 -27.36 -19.42 -17.14 -8.23azm_1 -2.35 -7.95 -21.22 -34.35 -7.07 -15.11 -8.05 -11.87 -29.50 -12.85 -16.04 -10.24 -10.10 -9.71cbx_1 -17.03 -13.23 -19.17 -40.41 -15.96 -25.49 -12.98 -18.02 -29.18 -19.31 -21.24 -16.56 -28.82 -20.72dfr_1 -11.71 -9.68 -9.94 -11.95 -1.69 -8.83 -10.85 -14.01 -4.71dq_1 -12.26 -17.03 -33.56 -10.80 -22.12 -13.37 -14.69 -32.03 -19.36 -20.17 -11.36 -14.52 -12.67epb_1 -24.01 -21.93 -26.76 -21.45 -31.22 -26.49 -23.62 -21.62 -24.16 -23.34 -32.04 -20.19glq_1 -13.59 -11.14 -5.04 -5.64 -19.83 -3.50 -7.34 -15.54 -16.04 -7.87hyt_1 -22.61 -11.31 -14.59 -35.99 -15.42 -20.69 -15.87 -18.42 -31.21 -23.41 -19.62 -21.03 -23.79 -20.15mrk_1 -6.70 -7.71 -16.38 -8.85 -9.30 -6.24 -5.90 -13.55 -14.70 -7.49 -9.25 -10.71 -11.35phf_1 -17.30 -16.02 -17.07 -19.53 -14.95 -22.50 -16.67 -16.22 -16.56 -20.64 -21.20 -17.28 -20.66 -16.07phv_1 -20.02 -18.31 -16.16 -8.05 -19.58 -10.67srj_1 -16.61 -20.76 -24.58 -19.33 -19.76 -17.75 -19.19 -23.54 -26.25 -6.21 -17.32 -32.20 -18.96tpp_1 -12.85 -14.82 -25.32 -13.87 -24.55 -10.62 -7.53 -26.74 -17.29 -8.87 -10.72 -21.79 -27.09

abe_site_1

aq_site_1

azm_site_1

cbx_site_1

dfr_site_1

dq_site_1

epb_site_1

glq_site_1

hyt_site_1

mrk_site_1

phf_site_1

phv_site_1

poc_site_1

srj_site_1

tpp_site_1

abe_conc_1 -1.88 -0.14 0.79 -1.99 0.51 -0.62 1.40 0.42 -1.05 0.39 -0.02 0.17 0.74 0.19 1.10aq_conc_1 0.37 -2.41 0.57 0.38 -0.08 1.32 -0.96 -1.73 0.38 0.90 0.95 0.09 -0.03 0.04 0.20azm_conc_1 0.56 1.52 -1.08 -1.15 0.55 -0.30 1.05 -0.24 -2.46 0.28 0.60 0.02 -0.43 0.38 0.70cbx_conc_1 0.81 0.69 -0.76 -2.07 -0.25 -0.59 0.74 0.44 -2.10 0.70 -0.05 0.66 0.31 0.34 1.14dfr_conc_1 0.89 0.18 0.71 -1.56 -0.59 -0.14 -0.21 0.28 -2.34 -0.82 1.43 0.85 0.29 0.13 0.89dq_conc_1 1.27 0.36 -0.31 -2.03 -0.05 -2.16 0.73 0.27 -0.52 -0.35 0.73 0.33 -0.12 0.77 1.07epb_conc_1 1.23 -0.60 -0.72 0.96 1.12 -0.58 -2.34 -0.22 -0.68 -0.43 0.37 0.97 -0.57 0.27 1.20glq_conc_1 1.85 0.78 0.13 -0.10 -0.48 0.03 1.18 -0.38 -2.51 0.12 0.48 -0.14 -1.25 0.11 0.17hyt_conc_1 1.07 0.74 -0.76 -1.72 0.26 -0.41 0.43 0.13 -2.32 0.86 -0.48 0.34 0.07 0.55 1.24mrk_conc_1 0.09 0.94 -0.35 -1.22 0.56 -0.81 -0.12 0.06 -2.28 -0.59 -0.36 0.51 1.03 0.89 1.66phf_conc_1 1.65 -1.32 -0.41 -0.85 0.46 -0.44 0.17 0.45 -1.11 -0.22 -0.99 1.09 0.88 -1.01 1.66phv_conc_1 0.17 -0.50 0.57 -0.45 -0.01 1.68 -1.48 0.52 -1.64 -0.48 1.12 -1.44 0.48 0.30 1.16poc_conc_1 1.17 -0.14 -1.07 0.28 -0.63 1.50 1.00 -0.49 -1.75 -0.27 0.65 0.05 -1.58 1.19 0.09srj_conc_1 0.75 0.34 1.43 -0.15 0.54 0.90 -1.85 0.37 -1.19 -0.49 0.44 0.23 -0.07 -2.02 0.76tpp_conc_1 2.29 -0.15 -0.59 -0.80 0.61 -0.62 1.09 0.33 -1.63 0.07 0.78 -0.74 0.87 -0.70 -0.82

abe_site_1 aq_site_1 azm_site_1 cbx_site_1 dfr_site_1 dq_site_1 epb_site_1 glq_site_1 hyt_site_1 mrk_site_1 phf_site_1 phv_site_1 poc_site_1 srj_site_1 tpp_site_1
abe_conc_1 2.82 -0.78 0.30 0.07 -1.13 0.59 -1.02 -1.09 0.19 0.08 0.24 -0.21 0.68 0.18 -0.91
aq_conc_1 -1.69 2.29 -0.29 -0.06 0.26 0.35 -1.22 0.07 -0.07 1.02 -1.68 0.23 0.33 0.28 0.17
azm_conc_1 -2.82 -0.41 0.20 0.61 -0.35 1.60 -0.21 -0.40 0.30 0.46 0.88 -0.49 0.86 0.37 -0.60
cbx_conc_1 -0.98 -0.30 -0.70 3.03 -0.06 0.78 -0.47 -0.69 -0.02 0.51 0.45 -0.66 0.07 0.06 -1.02
dfr_conc_1 -2.40 0.29 -0.12 0.82 0.92 0.17 0.09 0.19 1.22 0.44 -2.12 0.07 -0.04 0.61 -0.13
dq_conc_1 -1.06 -0.57 -0.06 2.05 -0.22 2.19 -0.76 -0.68 -0.67 0.39 0.73 -0.97 -0.10 0.27 -0.55
epb_conc_1 -2.38 0.68 -0.02 0.13 0.64 0.81 1.09 0.10 -0.04 0.05 -2.27 0.21 0.19 0.48 0.34
glq_conc_1 -2.77 -0.02 0.10 0.06 0.68 0.55 0.08 -0.03 0.29 0.47 -1.68 0.47 1.22 0.80 -0.22
hyt_conc_1 -1.06 -0.31 -0.56 2.31 -0.37 1.25 -0.63 -0.94 0.58 0.61 0.12 -1.01 1.06 0.07 -1.13
mrk_conc_1 -3.15 -0.11 -0.16 -0.16 0.23 0.68 0.04 0.13 0.02 1.25 -0.39 0.32 0.68 1.03 -0.39
phf_conc_1 -1.83 0.38 -1.12 -0.69 0.15 1.30 -0.34 -0.55 -0.66 0.67 2.05 -0.71 -0.01 0.43 0.95
phv_conc_1 -2.08 0.51 -0.37 -0.47 0.41 0.35 0.40 -0.14 0.51 -0.07 -2.08 1.85 0.50 0.14 0.53
poc_conc_1 -2.04 0.06 -0.26 0.84 0.36 -0.46 0.11 0.31 -0.06 0.58 -1.98 0.07 1.91 0.78 -0.23
srj_conc_1 -2.50 0.39 -0.22 0.98 0.40 -0.24 0.55 -0.21 -0.16 0.51 -1.51 0.02 0.21 1.85 -0.07
tpp_conc_1 -2.87 -0.11 -0.26 1.78 -0.10 1.12 -0.39 0.04 -0.42 0.55 0.24 0.40 0.37 0.17 -0.52

abe_1 aq_1 azm_1 cbx_1 dfr_1 dq_1 epb_1 glq_1 hyt_1 mrk_1 phf_1 phv_1 srj_1 tpp_1
abe_1 -1.39 1.08 -0.42 -1.67 0.35 -0.18 1.59 0.97 -1.10 0.08 0.62 0.31 -1.01 0.77
aq_1 -1.84 1.47 -0.05 -0.13 1.18 -0.29 0.50 -1.30 -0.37 -0.11 0.93
azm_1 1.32 0.69 -0.81 -2.29 0.79 -0.12 0.67 0.24 -1.75 0.13 -0.23 0.43 0.44 0.49
cbx_1 0.57 1.08 0.28 -2.56 0.71 -0.56 1.11 0.44 -1.06 0.27 0.01 0.63 -1.01 0.08
dfr_1 -0.64 -0.11 -0.18 -0.70 1.98 0.11 -0.41 -1.24 1.19
dq_1 0.77 0.13 -2.09 0.97 -0.55 0.62 0.44 -1.89 -0.18 -0.29 0.89 0.47 0.72
epb_1 0.19 0.75 -0.54 0.87 -1.72 -0.47 0.30 0.83 0.15 0.37 -1.94 1.21
glq_1 -0.55 -0.11 1.00 0.89 -1.69 1.28 0.58 -0.91 -1.00 0.49
hyt_1 -0.25 1.50 0.99 -2.31 0.86 0.05 0.79 0.40 -1.57 -0.37 0.21 0.00 -0.43 0.13
mrk_1 0.95 0.64 -1.96 0.30 0.17 1.09 1.19 -1.11 -1.45 0.71 0.18 -0.26 -0.45
phf_1 0.32 0.86 0.41 -0.63 1.31 -1.88 0.58 0.77 0.63 -1.10 -1.33 0.33 -1.11 0.84
phv_1 -0.91 -0.57 -0.14 1.49 -0.83 0.96
srj_1 0.59 -0.09 -0.73 0.14 0.07 0.40 0.17 -0.56 -1.00 2.31 0.48 -1.99 0.20
tpp_1 0.59 0.32 -1.15 0.45 -1.04 0.90 1.34 -1.35 -0.03 1.15 0.89 -0.66 -1.40
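A minimal Python sketch of the per-molecule correction, assuming that the MASC-score of molecule i in site j is (S_ij − µi)/σi, where µi and σi are the mean and sample standard deviation of molecule i's raw scores over all sites. Applied to the abe_1 row of the first score table in this section, it reproduces the abe_1 row of the table above to two decimal places. The function name `masc_row` is hypothetical.

```python
from statistics import mean, stdev


def masc_row(raw_scores):
    """Convert one molecule's raw docking scores (one per site) into MASC-scores.

    Assumes MASC_ij = (S_ij - mu_i) / sigma_i, where mu_i and sigma_i are the
    mean and sample standard deviation (n - 1 denominator) of this molecule's
    scores across the sites.
    """
    mu = mean(raw_scores)
    sigma = stdev(raw_scores)  # sample standard deviation
    return [(s - mu) / sigma for s in raw_scores]


# Raw scores of ligand abe_1 in sites abe_1 ... tpp_1, copied from the raw table.
abe_raw = [-21.34, -5.12, -14.94, -23.15, -9.88, -13.40, -1.76,
           -5.80, -19.45, -11.70, -8.13, -10.16, -18.84, -7.15]

abe_masc = masc_row(abe_raw)
print([round(z, 2) for z in abe_masc])
```

Because each row is shifted and scaled by its own statistics, a molecule that scores well everywhere (a large-magnitude µi) no longer dominates every site.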

Ligand 1poc docked into protein 1poc. The experimental conformation is shown in purple.

Ligand 1poc docked into protein 1cbx. The experimental conformation of the true 1cbx ligand is shown in purple.

Ligand 1poc docked into protein 1glq. The experimental conformation of the true 1glq ligand is shown in purple.

Ligand 1poc docked into protein 1srj. The experimental conformation of the true 1srj ligand is shown in purple.

Database Enrichment, raw scores (plot: Number of inhibitors, 0 to 30, vs. Total Number, 0 to 600; series: ptp_inh, p38_inh, Uniform)

Database Enrichment, MASC-scores (plot: Number of inhibitors, 0 to 30, vs. Total Number, 0 to 600; series: ptp_inh, p38_inh, Uniform)

Database Enrichment, kinase controls (plot: Number of inhibitors, 0 to 30, vs. Total Number, 0 to 600; series: ptp_inh, p38_inh, Uniform)

After MASC-scoring: p38 ligands are now retrieved first.

Raw Gold scores: The PTP-1b ligands are retrieved first, even though we are screening against the p38 active site!
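The Database Enrichment panels plot the number of known inhibitors found against the total number of molecules taken from the top of the ranked list. A minimal sketch of such a curve, assuming each molecule is given as a (score, is_inhibitor) pair, that higher scores are better (as with Gold), and that the "Uniform" series is the count expected from a random ordering; `enrichment_curve` and the toy library are hypothetical.

```python
def enrichment_curve(scored, higher_is_better=True):
    """Return (hits, uniform): cumulative inhibitor counts down the ranked list.

    scored is a list of (score, is_inhibitor) pairs. hits[k] is the number of
    inhibitors among the top k + 1 molecules; uniform[k] is the number expected
    if the inhibitors were spread evenly through the list (random ranking).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=higher_is_better)
    total = len(ranked)
    n_inhibitors = sum(1 for _, active in ranked if active)
    hits, uniform, found = [], [], 0
    for k, (_, active) in enumerate(ranked, start=1):
        found += int(active)
        hits.append(found)
        uniform.append(n_inhibitors * k / total)
    return hits, uniform


# Hypothetical library: 2 inhibitors among 6 molecules, Gold-style scores.
library = [(55.0, True), (40.0, False), (62.0, False),
           (48.0, True), (30.0, False), (35.0, False)]
hits, uniform = enrichment_curve(library)
print(hits)
print(uniform)
```

A curve that rises above the uniform line early means the target's own inhibitors are being retrieved first.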

MASC-scores corrected against 15 kinases rather than against 15 diverse active sites. The p38 ligands are still retrieved first, but the discrimination is actually reduced. This is because molecules that score well against p38 may also score well against other kinases, which raises their mean control-site score µi and so lowers their MASC-score. A standard set of diverse active sites therefore gives better MASC-scoring than a set of active sites similar to the target of interest.
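To see why kinase controls weaken the discrimination, consider one molecule with a fixed score in the p38 site and two alternative control sets. This sketch assumes MASC = (S − µi)/σi with µi and σi computed over the control sites only, and uses made-up Gold-style scores (higher is better); the function `masc` is hypothetical.

```python
from statistics import mean, stdev


def masc(target_score, control_scores):
    """MASC-score of one molecule at the target site, normalized by its control-site scores."""
    return (target_score - mean(control_scores)) / stdev(control_scores)


target = 60.0                                      # hypothetical score in the p38 site
diverse_controls = [30.0, 35.0, 40.0, 45.0, 50.0]  # unrelated sites: mean 40
kinase_controls = [50.0, 55.0, 60.0, 55.0, 50.0]   # related kinases: mean 54

print(round(masc(target, diverse_controls), 2))  # 2.53
print(round(masc(target, kinase_controls), 2))   # 1.43
```

The raw p38 score is identical in both cases; only the higher µi from the kinase controls shrinks the corrected score.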

FlexX results. 3/15 endogenous ligands are correctly identified (more negative scores are better).

Gold results. 4/15 endogenous ligands are correctly identified (more positive scores are better).

Fred results. 5/15 endogenous ligands are correctly identified.

FlexX results. 11/15 endogenous ligands are correctly identified.

Gold results. 11/15 endogenous ligands are correctly identified.

FlexX results. 11/15 endogenous ligands are correctly identified.
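The "N/15 correctly identified" counts can be computed column by column: for each active site, find the best-scoring ligand and check whether it is that site's own ligand. A minimal sketch, assuming rows (ligands) and columns (sites) are listed in the same order and the matrix has no missing cells; `count_correct` and the toy matrix are hypothetical. Read this way, the two 15-site tables in this section (rows *_conc_1, columns *_site_1) each give 11 of 15, the first with lower scores better and the second with higher scores better.

```python
def count_correct(scores, lower_is_better=True):
    """Count sites whose best-scoring ligand is the site's own (cognate) ligand.

    scores[i][j] is the score of ligand i docked into site j. Ligand j is
    assumed to be the cognate ligand of site j (rows and columns share one order).
    """
    n_sites = len(scores[0])
    correct = 0
    for j in range(n_sites):
        column = [row[j] for row in scores]
        best = min(column) if lower_is_better else max(column)
        if column.index(best) == j:  # first occurrence wins on ties
            correct += 1
    return correct


# Hypothetical 3 x 3 matrix (rows: ligands, columns: sites), lower is better.
toy = [
    [-1.9, 0.4, 0.7],
    [0.8, -1.2, -1.5],
    [1.1, 0.8, 0.8],
]
print(count_correct(toy))  # site 3 is won by ligand 2, so 2 of 3 are correct
```

Counting per site is also the only reading in which MASC can change the result: rescaling each ligand's row by its own mean and spread never changes which site that ligand scores best in.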

Some of this work has been published in J. Med. Chem. 2004 47(1) 80-89.

MASC-scoring can be incorporated into a relational database and used in virtual screening or in de novo structure-based drug design. We have implemented such a system in MySQL, as follows:

# Information about the protein active site:
# identifier, name, center in x, y and z
create table site_table (
  site_id INT UNSIGNED NOT NULL auto_increment,
  site_name varchar(50) NOT NULL,
  origin varchar(50),
  radius real,
  PRIMARY KEY (site_id)
);

# Information about a molecule
create table mol_table (
  mol_id INT UNSIGNED NOT NULL,
  sln varchar(255),
  block INT UNSIGNED,
  est_mean real,
  est_sd real,
  PRIMARY KEY (mol_id)
);

# Information about a molecule's score,
# split into control and target score tables
create table target_score_table (
  mol_id INT UNSIGNED NOT NULL references mol_table,
  site_id INT UNSIGNED NOT NULL references site_table,
  score real,
  est_chi real,
  chi real,
  INDEX (mol_id),
  INDEX (site_id)
);

create table control_score_table (
  mol_id INT UNSIGNED NOT NULL references mol_table,
  site_id INT UNSIGNED NOT NULL references site_table,
  score real,
  INDEX (mol_id),
  INDEX (site_id)
);
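The schema implies a two-step calculation: estimate each molecule's mean and spread from its control-site scores, then convert its target-site scores into MASC-scores. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with the tables cut down to the columns used. It assumes est_mean and est_sd hold the mean and sample standard deviation of a molecule's control scores and that chi holds its MASC-score at the target site; those column meanings are our reading of the names, not something the schema states. The scores and site IDs are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite builds may lack sqrt(); register Python's so the SQL below runs everywhere.
conn.create_function("sqrt", 1, math.sqrt)

# Simplified versions of mol_table, control_score_table and target_score_table.
conn.executescript("""
CREATE TABLE mol_table (mol_id INTEGER PRIMARY KEY, est_mean REAL, est_sd REAL);
CREATE TABLE control_score_table (mol_id INTEGER, site_id INTEGER, score REAL);
CREATE TABLE target_score_table (mol_id INTEGER, site_id INTEGER, score REAL, chi REAL);
""")

controls = [(1, site, s) for site, s in enumerate([10.0, 20.0, 30.0])]
controls += [(2, site, s) for site, s in enumerate([35.0, 40.0, 45.0])]
conn.executemany("INSERT INTO control_score_table VALUES (?, ?, ?)", controls)
conn.executemany("INSERT INTO mol_table (mol_id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO target_score_table (mol_id, site_id, score) VALUES (?, ?, ?)",
    [(1, 100, 40.0), (2, 100, 40.0)])

# Step 1: per-molecule mean and sample standard deviation over the control sites.
conn.execute("""
UPDATE mol_table SET
  est_mean = (SELECT AVG(score) FROM control_score_table c
              WHERE c.mol_id = mol_table.mol_id),
  est_sd = (SELECT sqrt((SUM(score * score) - SUM(score) * SUM(score) / COUNT(*))
                        / (COUNT(*) - 1))
            FROM control_score_table c WHERE c.mol_id = mol_table.mol_id)
""")

# Step 2: MASC-score of each target-site score, stored in chi.
conn.execute("""
UPDATE target_score_table SET chi =
  (score - (SELECT est_mean FROM mol_table m WHERE m.mol_id = target_score_table.mol_id))
  / (SELECT est_sd FROM mol_table m WHERE m.mol_id = target_score_table.mol_id)
""")

# Rank molecules for the target site by MASC-score (higher is better here).
query = "SELECT mol_id, chi FROM target_score_table WHERE site_id = 100 ORDER BY chi DESC"
for mol_id, chi in conn.execute(query):
    print(mol_id, round(chi, 2))
```

Both molecules have the same raw target score, but molecule 1 ranks first because its control scores are lower.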

1ABE, Arabinose-binding protein, 1.7 Å
1AQ1, CDK2, 2.0 Å
1AZM, Carbonic anhydrase, 2.0 Å
1CBX, Carboxypeptidase, 2.0 Å
1D1Q, Yeast low-molecular-weight protein tyrosine phosphatase, 1.7 Å
1EPB, Retinoic acid-binding protein, 2.2 Å
1GLQ, Glutathione S-transferase, 1.8 Å
1HYT, Thermolysin, 1.7 Å
1MRK, Ribosome-inactivating protein, 1.6 Å
1PHF, Cytochrome P450cam, 1.6 Å
1POC, Phospholipase A2, 2.0 Å
1SRJ, Streptavidin, 1.8 Å
1TPP, Trypsin, 1.4 Å
4DFR, DHFR, 1.7 Å
4PHV, HIV protease, 2.1 Å