-
Steven Muskal, Ph.D., Chief Technology Officer, Sertanty, Inc.
A Systems Approach to Prioritizing the Synthesis of New Compositions of Matter
-
Discovery results from efficient collaboration
Chemistry, Biology, Informatics (modeling)
What can be made
What should be made
-
A systems approach requires efficient “cause-effect” feedback
That which is synthetically “feasible”:
• In-house expertise
• Starting material availability
• Cost/time
That which is medicinally relevant, efficacious, and safe:
• Potency
• Selectivity
• Bioavailability
• Low toxicity
• Cost effective
What can be made?
What should be made?
-
ChIP – Chemical Intelligence Platform
• Forward, prospective exploration of existing and newly coupled synthetic strategies
• “Mixing-n-matching” synthetic protocols to generate novel, synthetically accessible molecules
• Pre- and post-synthesis filtering and prioritization
• Synthesis strategies guided towards novel and medicinally relevant molecules using ePotency, eSelectivity, and eADMEtox models
“What can you make?” “What should you make?”
-
“What can you make?”
-
Working towards “what can you make?”
Step 1: Defining chemical transformations mathematically
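$C = T_1(BB_A, BB_B)$
[Depiction: building blocks A and B react to give C; structures not recoverable from the text.]
Building blocks $BB_A$, $BB_B$, and $BB_D$, together with reaction-specific “introspective” filters, define the scope and applicability of each transformation.
$E = T_2(C, BB_D) = T_2(T_1(BB_A, BB_B), BB_D)$
[Depiction: C reacts with building block D to give E; structures not recoverable from the text.]
Reaction conditions (e.g., reagents, catalysts, solvents, temperature, concentration, time) are to be incorporated later.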
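The composition E = T2(T1(BB_A, BB_B), BB_D) can be read as ordinary function composition over sets of building blocks. The Python sketch below is a toy illustration under stated assumptions: building blocks are plain string labels, the per-slot predicates stand in for the reaction-specific “introspective” filters, and each "product" is only a label recording its inputs (no chemistry is applied). The class name `Transformation` and all labels are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence

@dataclass
class Transformation:
    """Hypothetical two-reactant transformation T(BB_1, BB_2)."""
    name: str
    # One predicate per reactant slot, standing in for "introspective" filters.
    filters: Sequence[Callable[[str], bool]]

    def __call__(self, first: List[str], second: List[str]) -> List[str]:
        # Keep only building blocks that pass the filter for their slot.
        ok_first = [bb for bb in first if self.filters[0](bb)]
        ok_second = [bb for bb in second if self.filters[1](bb)]
        # Placeholder product label recording which inputs were combined.
        return [f"{self.name}({a},{b})" for a, b in product(ok_first, ok_second)]

# Toy filters keyed on label prefixes (assumed, not real chemistry).
t1 = Transformation("T1", [lambda s: s.startswith("A"), lambda s: s.startswith("B")])
t2 = Transformation("T2", [lambda s: s.startswith("T1("), lambda s: s.startswith("D")])

bb_a = ["A1", "A2", "X9"]   # X9 fails T1's first-slot filter
bb_b = ["B1"]
bb_d = ["D1", "D2"]

c = t1(bb_a, bb_b)          # C = T1(BB_A, BB_B)
e = t2(c, bb_d)             # E = T2(C, BB_D) = T2(T1(BB_A, BB_B), BB_D)
print(len(c), len(e))       # 2 4
print(e[0])                 # T2(T1(A1,B1),D1)
```

Because each stage sees only the filtered output of the previous one, the enumerated space is the product of the filtered list sizes at each stage (here 2 × 1, then 2 × 2).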
-
Subset 1: Nucleophilic Aromatic Substitution Reactions
ID  comments / reactants (reaction depictions not recoverable from the text)
7000  heteroaryl chloride; aliphatic primary or secondary amine
7001  heteroaryl chloride; aromatic primary amine
7002  heteroaryl chloride; aliphatic primary or secondary amine
7003  heteroaryl chloride; aromatic primary amine
7004  2-chloropyrimidine, but not 4-chloropyrimidine; aliphatic primary or secondary amine
7005  2-chloropyrimidine, but not 4-chloropyrimidine; aromatic primary amine
7006  heteroaryl chloride; aliphatic primary or secondary amine
7007  heteroaryl chloride; aromatic primary amine
7008  [F,Cl] nitroarene; aliphatic primary or secondary amine
7009  [F,Cl] nitroarene; aromatic primary amine
7010  [F,Cl] nitroarene; aliphatic primary or secondary amine
-
7011  [F,Cl] nitroarene; aromatic primary amine
7012  [F,Cl] dinitroarene; aliphatic primary or secondary amine
7013  [F,Cl] dinitroarene; aromatic primary amine
Subset 2: Pd-Catalyzed Aromatic Substitutions
ID  comments / reactants (reaction depictions not recoverable from the text)
7014  aryl bromide / iodide; aliphatic primary or secondary amine
7015  aryl bromide / iodide; aromatic primary or secondary amine
7018  aryl bromide / iodide; aryl boronic acid
Subset 3: Functional Group Transformations
ID  comments / reactants (reaction depictions not recoverable from the text)
7036  aryl nitro (depicted product: aryl amine)
7044  as drawn
7045  as drawn
-
Subset 4: Amine Acylation Reactions (Amides / Carbamates)
ID  comments / reactants (reaction depictions not recoverable from the text)
7046  carboxylic acids; aliphatic amines; alcohols and aromatic amines allowed; activated aryl halides allowed; no alkylators
7047  carboxylic acids; aromatic amines; not compatible with alcohols and related groups; no alkylators; activated aryl halides allowed
7048  carboxylic acid chlorides, chloroformates, chloroformamidines; aliphatic amines; not compatible with alcohols or aromatic amines; no alkylators; activated aryl halides allowed
7049  carboxylic acid chlorides, chloroformates, chloroformamidines; aromatic amines; not compatible with alcohols and related groups; no alkylators; activated aryl halides allowed
Subset 5: Amine Acylation Reactions (Ureas / Thioureas)
ID  comments / reactants (reaction depictions not recoverable from the text)
7062  aliphatic amines; aromatic amines and alcohols allowed; activated aryl halides allowed; not compatible with alkylators, other nucleophiles, other acylators, acidic compounds
7063  aromatic amines; alcohols allowed; activated aryl halides allowed; not compatible with alkylators, nucleophiles, other acylators, acidic compounds
7064  ammonia; precursors for thioimidazoles; aryl amines and alcohols allowed; no alkylators, other acylators, nucleophiles, or acids
7065  primary aryl amine; precursor for 2-amino-benzthioimidazole or 2-amino-benzimidazole, or related aryls; secondary aryl amines and alcohols allowed; no alkylators, other acylators, acids, or nucleophiles
7072  aryl primary or secondary amines; precursor for thioimidazoles with FmocNCS
7073  aliphatic primary or secondary amines; precursor for thioimidazoles with FmocNCS; alcohols and aromatic amines allowed
-
Subset 6: Formation of Diverse Heterocyclic Systems
ID  comments / reactants (reaction depictions not recoverable from the text)
7037  any 2-amino arylamide not in a ring; any carboxylic acid; excludes acylators, alkylators, nucleophiles, other acids, activated aryl halides, aryl bromides/iodides
7038  any 2-amino arylamide not in a ring; carboxylic acid chloride (no chloroformates, carbamoyl chlorides, etc.); excludes other acylators, alkylators, nucleophiles, acids, activated aryl halides, aryl bromides/iodides
7039  any 2-amino arylamide not in a ring; carboxylic acid methyl ester only (no other esters are allowed, which is too restrictive; reactivities of different esters are not differentiated); excludes acylators, alkylators, nucleophiles, acids, activated aryl halides, aryl bromides/iodides
7040  any 2-ureido aryl methyl carboxylate not in a ring; excludes acylators, alkylators, acids, nucleophiles, activated aryl halides, aryl bromides/iodides
7041  any 2-ureido aryl carboxylic acid not in a ring; excludes acylators, alkylators, acids, nucleophiles, activated aryl halides, aryl bromides/iodides
7066  any 2-hydroxyaryl-(NH)-amide, thioamide, urea, thiourea, etc.; excludes acylators, alkylators, acids, nucleophiles, activated aryl halides, aryl bromides/iodides
7067  any 2-mercaptoaryl-(NH)-amide, thioamide, urea, thiourea, etc.; excludes acylators, alkylators, acids, nucleophiles, activated aryl halides, aryl bromides/iodides
7068  any 2-aminoaryl-(NH)-amide, thioamide, urea, thiourea, etc. (not amido aryl amide); excludes acylators, alkylators, acids, nucleophiles, activated aryl halides, aryl bromides/iodides
7069  1,2-diamino aryl (amino-2-arylamine); carboxylic acid methyl ester only (no other esters are allowed, which is too restrictive; reactivities of different esters are not differentiated); excludes nucleophiles, acids, acylators, alkylators, activated aryl halides, aryl bromides/iodides, aldehydes, other esters
7070  1,2-diamino aryl; aldehyde; excludes nucleophiles, acids, acylators, alkylators, activated aryl halides, aryl bromides/iodides
-
7071  2-mercapto arylamine; aldehyde; excludes nucleophiles, acids, acylators, alkylators, activated aryl halides, aryl bromides/iodides, other aldehydes
Subset 7: Formation of Thioimidazoles
ID  comments / reactants (reaction depictions not recoverable from the text)
7060  unsubstituted thioamide; not compatible with ketones or aldehydes, alkyl bromides / iodides, sulfonyl esters, etc., acidic compounds, nucleophilic compounds
7061  unsubstituted thiourea; not compatible with ketones or aldehydes, alkyl bromides / iodides, sulfonyl esters, etc., acidic compounds, nucleophilic compounds
Subset 8: Standard Deprotection Steps
ID  comments / reactants (reaction depictions not recoverable from the text)
7079  Fmoc deprotection; any Fmoc amine or amide, aliphatic or aromatic
7080  Boc deprotection; any Boc amine or amide, aliphatic or aromatic
7081  t-Bu ether; no acid or carbamate
7082  t-Bu ester; not carbamate, etc.; R must be carbon
7083  any trityl ether/ester, thioether/thioester derivative
7084  any trityl amine/amide derivative, aromatic or aliphatic
-
Filtering, filtering, filtering…
Reaction SMIRKS with “introspective” filter:
[N;$([N;!H0](c1[#6,#7][#6,#7][#6,#7][#6,#7]c1[N;!H0;!H1])[A,a]);!$([N+]);!$(NC=,#[!#6]);!$(NC=,#[#6]);!$(N[!#6;!#1]):4]([c:11]1[c:10]([N;!H0;!H1:3]([H])[H])[a:7][a:5][a:6][a:8]1)([A,a:1])[H].[CH3]O[C;$(C(O[CH3])(=O)[A,a]);!$(C(=O)(O[CH3])[!#6]):9](=O)[A,a:2]>>[n:3]1[c:9]([n:4]([c:11]2[c:10]1[a:7][a:5][a:6][a:8]2)[A,a:1])[A,a:2]
[Reaction depiction r7069: 1,2-diaminoaryl + carboxylic acid methyl ester gives a fused imidazole (benzimidazole-type) product; structures not recoverable from the text.]
Required: 1,2-diaminoaryl (amino-2-arylamine); carboxylic acid methyl ester
Not allowed: nucleophiles, acids, acylators, alkylators, activated aryl halides, aryl bromides/iodides, aldehydes, any other esters (no differentiation between reactivities of different esters)
e.g., Building Block Required Filters (r7069.1, r7069.2):
c(c([NH2;v3])[a])([a])[N;!H0;!$([N+]);!$(NC=,#[!#6]);!$(NC=,#[#6]);!$(N[!#6])]
[C;$(C(=O)O[CH3]);!$(C(=O)(O[CH3])[!#6])]
e.g., Building Block Exclude Filters (r7069.1):
( [N;!H0;$(NC);!$([N+]);!$(NC=,#[!#6]);!$(NC=,#[#6]);!$(N[!#6]);!$(Nc)] ) OR ( [N;!H0;$(N[N;$(N[#6]);!$(NC=,#[!#6])]);!$(NC=,#[!#6])] ) OR ( C([NH2])=[NH] ) OR ( [S;$([SH]),$([S-])] )
( [S;$(S(=O)[OH])] ) OR ( [C;$(C(=O)[OH]);!$(C(=O)([OH])[!#6])] )
( [C;$(C(=[O,N,S])[O,S]C(=[O,N,S]))] ) OR ( [C;$(C(=[O,N,S])[F,Cl,Br])] ) OR ( [C;$(C(=[O,S])=N)] ) OR ( [S;$(S(=O)[Cl,Br,F,I])] )
( [C;$(C[Br,I]);!$(C=,#[A])] ) OR ( [C;$(COS(=O)(=O))] )
( [c;$(c1([Cl,Br,F,I])nc[n,c][c,n][c,n]1)] ) OR ( [c;$(c1([F,Cl])c([N+](=O)[O-])cccc1),$(c1([F,Cl])ccc([N+](=O)[O-])cc1)] )
[c;$(c[Br,I])]
[C;$(C(=O)O);!$(C(=O)(O)[!#6]);!$(C(=O)O[!#6]);!$(C(=O)([OH]));!$(C(=O)([O-]));!$(C(=O)OC=,#[!#6])]
[C;!H0;$(C=O);!$(C(=O)[!#6]);!$(C(=O)=[A])]
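The required and exclude lists above combine simply: a commercial building block is eligible for a reactant slot only if it matches every required filter and none of the exclude filters (the OR alternatives on each exclude line, and the lines themselves, all act as rejections). A minimal Python sketch of that boolean logic, assuming a pluggable matcher; a real implementation would run the SMARTS patterns through a cheminformatics toolkit's substructure search, which is replaced here by a toy substring test so the example runs with the standard library only. All function names are hypothetical.

```python
from typing import Callable, Iterable, List

# matches(molecule, pattern) -> bool; a SMARTS substructure search in practice.
Matcher = Callable[[str, str], bool]

def passes_filters(mol: str, required: List[str], exclude: List[str],
                   matches: Matcher) -> bool:
    """True if every required pattern matches and no exclude pattern does."""
    if not all(matches(mol, p) for p in required):
        return False
    return not any(matches(mol, p) for p in exclude)

def filter_building_blocks(mols: Iterable[str], required: List[str],
                           exclude: List[str], matches: Matcher) -> List[str]:
    """Hypothetical helper: keep the building blocks eligible for one reactant slot."""
    return [m for m in mols if passes_filters(m, required, exclude, matches)]

def toy_matches(mol: str, pattern: str) -> bool:
    # Stand-in matcher: plain substring test on SMILES, NOT a substructure search.
    return pattern in mol

candidates = ["COC(=O)c1ccccc1", "COC(=O)c1ccccc1Br", "OC(=O)c1ccccc1"]
kept = filter_building_blocks(candidates, required=["COC(=O)"],
                              exclude=["Br", "I"], matches=toy_matches)
print(kept)  # ['COC(=O)c1ccccc1']
```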
-
Mix-n-match concept
[Diagram: steps from existing Protocols 1, 2 and 3 (yielding products 1, 2 and 3, respectively) are recombined into a new sequence that yields a new product type.]
-
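Very powerful Java-based, XML-driven command-line enumerator (“CLE”)
[Diagram of CLE input and output files for a two-reaction chain (r7069, then r7003): reaction files r7069.smk and r7003.smk; SMILES files r7069.smi, r7069.1.smi, r7069.2.smi, r7003.1.smi, r7003.2.smi and r7069r7003.smi; filter files filters/required/r7003.1 and filters/exclude/r7003.1.]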
-
“What should you make?”
-
Observed Potency and Selectivity
[Figure: heat map of observed potency; rows are Sertanty KB molecule IDs, columns are the kinase targets PKA, PKC, PRKCA, PRKCA_DAG, PRKCB1, PRKCB1_DAG, PRKCD, PRKCE_DAG, PRKCG_DAG, PRKCH_DAG, CDC2, CDK1_B, CDK2, CDK2_A, CDK2_E, CDK4_D1, CDK4_D2, CDK5, CDK5_p25, CDK5_p35, GSK3A, GSK3B, MAPK14, ABL, CSK, Grb2_SH2, FYN, LCK, SRC, SYK, ZAP_70_SH2, RAF1, MAP2K, ADK, PIK4CA, PPK, PDK, EGFR, EGFR_Substrate, ERBB2, FGFR, TEK, FLT1, KDR, PDGFR, PDGFRB; cells binned 3-4, 4-5, 5-6, 6-7, 7-8, 8-9.]
Reported activities of top “kinase-active” molecules (excerpts)
-
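Ligand pharmacophoric potential
[Figure: pharmacophore features (labeled A, D, H, P, R) are combined into feature triplets, drawn as triangles, whose inter-feature distances are binned (bins shown include 2-4.5, 4.5-7, 7-10, 10-14 and 19-24); each triplet corresponds to a position in a bit string (…1 0 1 …11 0 100).]
Journal of Chemical Information and Computer Sciences, 1999, 39(3): 569-74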
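The encoding idea can be sketched as follows: every triplet of pharmacophore features contributes one key built from the feature types and the binned distances between them, and the set of populated keys plays the role of the bit string. This is a minimal sketch, not the published method: the bin edges below are illustrative (only the first ranges shown on the slide), feature labels are arbitrary strings, and 3-D feature coordinates are assumed to be given.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import List, Optional, Set, Tuple

# Illustrative bin edges: d falls in bin i if EDGES[i] <= d < EDGES[i + 1].
EDGES = (2.0, 4.5, 7.0, 10.0, 14.0)

Point = Tuple[float, float, float]
Feature = Tuple[str, Point]   # (feature type, 3-D coordinates)

def distance_bin(d: float) -> Optional[int]:
    """Bin index for distance d, or None if d lies outside the binned range."""
    i = bisect_right(EDGES, d) - 1
    return i if 0 <= i < len(EDGES) - 1 else None

def triplet_keys(features: List[Feature]) -> Set[tuple]:
    """Set of populated triplet keys (a sparse stand-in for the bit string)."""
    keys = set()
    for (fa, pa), (fb, pb), (fc, pc) in combinations(features, 3):
        # Pair each feature with the bin of the triangle edge opposite to it.
        bins = (distance_bin(math.dist(pb, pc)),
                distance_bin(math.dist(pa, pc)),
                distance_bin(math.dist(pa, pb)))
        if None in bins:
            continue  # skip triplets with an out-of-range distance
        # Sorting the (feature, bin) pairs makes the key independent of order.
        keys.add(tuple(sorted(zip((fa, fb, fc), bins))))
    return keys

feats = [("A", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("H", (0.0, 5.0, 0.0))]
print(triplet_keys(feats))  # {(('A', 1), ('D', 1), ('H', 0))}
```

Keying on sorted (feature, opposite-edge bin) pairs is one simple way to make a triangle's key independent of the order in which the features are listed.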
-
Reducing fingerprint dimensionality
$X \cong T\,P$, where $X$ is the $N_{\text{compounds}} \times 10549$ fingerprint matrix, $T$ is $N_{\text{compounds}} \times A_{\text{dim}}$, and $P$ is $A_{\text{dim}} \times 10549$.
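The slide gives only the factorization X ≅ TP. As a sketch, assuming the rows of P are orthonormal loading vectors (as PCA would give; the slide does not name the method), the reduced coordinates are T = X Pᵀ and the rank-A_dim reconstruction is T P. Mean-centering, which PCA normally applies first, is omitted for brevity, and the toy matrices are illustrative.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def scores(x: Matrix, p: Matrix) -> Matrix:
    """T = X P^T: project N x M fingerprints onto A loading vectors (rows of P)."""
    return matmul(x, transpose(p))

def reconstruct(t: Matrix, p: Matrix) -> Matrix:
    """X_hat = T P: the rank-A approximation of X."""
    return matmul(t, p)

# Toy example: M = 4 fingerprint bits reduced to A = 2 dimensions.
s = 0.5 ** 0.5
P = [[s, s, 0.0, 0.0],   # assumed orthonormal loading rows
     [0.0, 0.0, s, s]]
X = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0, 0.0]]
T = scores(X, P)          # 3 x 2 reduced coordinates
X_hat = reconstruct(T, P) # 3 x 4 approximation of X
```

The first two toy fingerprints lie in the span of P and are reconstructed exactly; the third is not, so its reconstruction is only an approximation.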
-
Target classification using ligand pharmacophoric features (MDDR9104, 10 largest classes)
Classes: NMDA Receptor Antag, Leukotriene Antag, PAF Antag, Angiotensin II Blocker, Ca Channel Blocker, K Channel Activator, Substance P Antag, ACAT Inhibitor, Cyclooxygenase Inhib, Lipoxygenase Inhib
Journal of Chemical Information and Computer Sciences, 2000, 40(1): 117-125
-
eScreen N ObsMin ObsMax ObsAvg ObsSD PredMin PredMax PredAvg PredSD q2
ABL 170 3.77 9.10 5.85 1.16 3.77 9.11 5.87 0.99 0.85
ADK 225 5.00 10.46 7.65 1.12 5.36 9.99 7.65 0.95 0.85
CDC2 119 3.82 8.40 5.88 0.95 4.08 8.03 5.88 0.78 0.80
CDK1_B 764 3.74 8.30 6.09 0.89 3.96 7.97 6.10 0.79 0.87
CDK2 105 3.71 8.28 6.19 1.15 4.20 8.54 6.20 0.87 0.76
CDK2_A 455 4.09 8.08 6.30 0.69 4.17 7.74 6.31 0.60 0.83
CDK2_E 623 4.39 8.07 6.50 0.70 4.57 7.83 6.50 0.61 0.85
CDK4_D1 656 3.82 9.00 6.19 0.87 3.86 8.83 6.21 0.79 0.87
CDK4_D2 33 3.82 8.80 5.99 1.55 3.51 8.70 5.98 1.40 0.94
CDK5 474 3.80 8.21 6.62 0.93 3.90 8.75 6.63 0.84 0.89
CDK5_p25 417 4.00 8.15 6.78 0.75 4.67 8.14 6.79 0.66 0.83
CDK5_p35 24 4.40 7.70 5.82 0.87 4.35 6.61 5.80 0.61 0.73
CSK 439 3.87 9.00 6.36 1.09 3.27 8.84 6.39 0.94 0.84
EGFR 932 3.80 10.47 6.85 1.35 3.83 10.13 6.87 1.21 0.88
EGFR_Substrate 27 4.22 6.43 5.61 0.58 4.48 6.04 5.61 0.51 0.83
ERBB2 189 3.92 9.00 6.63 1.39 3.67 8.78 6.64 1.30 0.93
FGFR 368 4.19 8.70 5.90 0.92 4.31 7.62 5.91 0.80 0.85
FLT1 103 4.59 8.52 6.37 0.59 5.17 8.07 6.37 0.47 0.76
FYN 30 4.08 8.30 5.71 1.17 4.24 7.09 5.69 0.97 0.83
Grb2_SH2 34 4.17 8.70 6.48 1.26 4.21 8.42 6.51 1.01 0.94
GSK3A 77 5.64 8.40 7.02 0.59 6.26 8.15 7.00 0.48 0.89
GSK3B 243 3.89 8.55 6.03 0.92 4.56 8.13 6.04 0.81 0.86
KDR 281 3.94 9.00 6.39 1.05 3.96 8.69 6.41 0.90 0.82
LCK 199 3.75 9.40 6.02 1.25 3.86 8.58 6.03 1.09 0.85
MAP2K 195 5.32 8.96 6.80 0.75 5.31 8.19 6.80 0.61 0.72
MAPK14 485 4.07 9.96 6.56 0.95 4.46 9.58 6.58 0.83 0.84
PDGFR 65 4.19 8.00 5.79 0.99 4.49 7.65 5.82 0.77 0.83
PDGFRB 390 4.03 7.51 5.78 0.68 4.38 7.32 5.79 0.53 0.74
PDK 59 3.82 8.48 5.69 1.28 4.14 7.92 5.74 1.06 0.80
PIK4CA 51 3.70 7.15 4.67 0.78 3.71 6.73 4.68 0.69 0.88
PKA 127 3.74 7.59 5.20 0.88 3.86 7.06 5.21 0.76 0.82
PKC 372 3.80 8.62 6.32 1.15 3.86 8.65 6.33 1.02 0.89
PPK 33 4.00 8.52 7.18 1.00 4.46 8.29 7.20 0.84 0.87
PRKCA 163 3.93 8.55 5.82 0.99 4.13 8.28 5.82 0.80 0.83
PRKCA_DAG 106 4.01 8.00 6.00 0.90 4.25 7.84 6.02 0.72 0.73
PRKCB1 134 3.92 8.70 6.88 1.13 3.68 8.21 6.90 1.01 0.90
PRKCB1_DAG 114 3.91 8.55 6.35 1.07 4.67 8.37 6.37 0.78 0.74
PRKCD 33 4.41 8.30 6.12 1.22 4.54 7.66 6.15 1.09 0.89
PRKCE_DAG 104 4.01 8.00 5.83 0.90 4.13 7.92 5.85 0.64 0.74
PRKCG_DAG 71 5.14 8.30 6.54 0.88 5.18 7.89 6.58 0.64 0.76
PRKCH_DAG 72 4.46 8.70 7.32 0.96 5.42 8.70 7.37 0.70 0.72
RAF1 78 4.01 8.40 5.84 1.04 4.58 8.15 5.83 0.89 0.75
SRC 210 3.72 9.74 6.61 1.74 3.96 9.56 6.64 1.65 0.96
SYK 28 4.30 8.44 6.55 1.69 4.27 8.32 6.61 1.44 0.87
TEK 32 4.75 7.23 6.02 0.54 5.04 7.18 6.01 0.43 0.78
ZAP_70_SH2 43 4.03 6.00 5.06 0.50 4.03 5.98 5.07 0.43 0.85
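The q2 column is conventionally the cross-validated coefficient of determination, q² = 1 − PRESS / SS, where PRESS sums the squared differences between observed values and predictions made without each compound, and SS sums the squared deviations of the observed values from their mean. The slide does not state the cross-validation scheme, so the sketch below simply assumes the cross-validated predictions are already available; the input values are toy numbers.

```python
from statistics import fmean
from typing import Sequence

def q_squared(observed: Sequence[float], predicted_cv: Sequence[float]) -> float:
    """q2 = 1 - PRESS / SS, with PRESS taken from cross-validated predictions."""
    if len(observed) != len(predicted_cv) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted_cv))
    ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss

# PRESS = 0.5 and SS = 5.0 for these toy values, so q2 = 0.9.
print(q_squared([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0]))
```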
-
ePotency and eSelectivity
[Figure: heat map of eScreen scores; rows are Sertanty KB molecule IDs, columns are the kinase targets PKA, PKC, PRKCA, PRKCA_DAG, PRKCB1, PRKCB1_DAG, PRKCD, PRKCE_DAG, PRKCG_DAG, PRKCH_DAG, CDC2, CDK1_B, CDK2, CDK2_A, CDK2_E, CDK4_D1, CDK4_D2, CDK5, CDK5_p25, CDK5_p35, GSK3A, GSK3B, MAPK14, ABL, CSK, Grb2_SH2, FYN, LCK, SRC, SYK, ZAP_70_SH2, RAF1, MAP2K, ADK, PIK4CA, PPK, PDK, EGFR, EGFR_Substrate, ERBB2, FGFR, TEK, FLT1, KDR, PDGFR, PDGFRB; cells binned 3-4, 4-5, 5-6, 6-7, 7-8, 8-9.]
eScreen scores for top “kinase-active” molecules against each of the 46 eScreen targets
-
Working towards “what should you make?”: eTarget correlation using the top-5 most bioactive molecules for each target
(#mols: 201, #eScreen-targets: 46)
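The slide does not specify how eTarget correlation is computed. One plausible reading, sketched below as an assumption, is a Pearson correlation between the eScreen score columns of two targets, taken over the same set of molecules (here, the 201 selected molecules would be the rows). Target names in the example come from the deck; the score values are toy numbers.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def target_correlations(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations of per-target score columns aligned over the same molecules."""
    names = sorted(scores)
    return {(a, b): pearson(scores[a], scores[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

toy = {"ABL": [1.0, 2.0, 3.0], "EGFR": [3.0, 2.0, 1.0], "SRC": [1.0, 2.0, 4.0]}
for pair, r in target_correlations(toy).items():
    print(pair, round(r, 3))
```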
-
ChIP – Simulation-3
• Start with ~40 generic reactions with “introspective” filters
• Generate virtual protocols with a graph traversal algorithm based on the compatibility of generic representations
• Enumerate virtual protocols using commercially available starting materials as input
• Eliminate structures that overlap with the LUCIA Knowledgebase (KB)
• Apply Lipinski and structural filters (> 90):
  • MWT, HBA/HBD, ClogP, TPSA, rotBond
  • reactive functionalities such as alkylators / acylators, electrophiles, nucleophiles, etc.
  • undesired motifs (non-standard elements, >2 halo or >1 nitro per aryl, thioesters / ureas, unbranched chains, etc.)
→ Excerpted set: 28K novel compounds
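One way to read the protocol-generation step: treat each generic reaction as a node, draw an edge a → b when the generic product of a is compatible with a reactant slot of b, and enumerate paths that start at a reaction whose reactants can all be bought (a primer). The sketch below does a depth-first enumeration under those assumptions; the reaction IDs come from this deck, but the compatibility edges in the example are invented for illustration.

```python
from typing import Dict, List, Set

def virtual_protocols(compatible: Dict[str, Set[str]], primers: Set[str],
                      min_depth: int, max_depth: int) -> List[List[str]]:
    """Enumerate reaction sequences by depth-first traversal of a compatibility graph.

    compatible[a] holds the reactions whose reactant filters accept the product of a.
    Sequences start at a primer reaction and contain min_depth..max_depth steps.
    """
    results: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if len(path) >= min_depth:
            results.append(list(path))       # record a complete vProtocol
        if len(path) == max_depth:
            return
        for nxt in sorted(compatible.get(path[-1], ())):
            path.append(nxt)
            extend(path)
            path.pop()                        # backtrack

    for start in sorted(primers):
        extend([start])
    return results

# Hypothetical compatibility edges (reaction IDs from the deck, edges assumed).
graph = {"r7069": {"r7003", "r7005"}, "r7003": {"r7036"},
         "r7005": {"r7049"}, "r7049": {"r7069"}}
for p in virtual_protocols(graph, primers={"r7069"}, min_depth=3, max_depth=4):
    print("".join(p))
```

Reactions may repeat within a path (as in the gene r7069r7005r7049r7069 listed later in the deck), so max_depth is what keeps the enumeration finite.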
-
Simulation Results (sim-3)
Compound Novelty (ChIP sim-3 excerpt)
[Figure: histogram of maximal compound similarity (x-axis, 0 to 1) vs. frequency (y-axis, 0 to 18,000), with one series per reference set: KB "BioMol" (23K), MDDR (v2202.2) (65K), ACD (v2003.1) (45K), KB "BioMol+commercial" (850K).]
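The novelty plot bins, for each generated compound, its highest similarity to any compound in a reference set. The slide does not name the fingerprint or similarity measure, so this sketch assumes fingerprints given as sets of on-bits and Tanimoto similarity; the ten equal-width bins mirror the 0 to 1 x-axis.

```python
from typing import FrozenSet, Iterable, List

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def max_similarity(query: FrozenSet[int], reference: Iterable[FrozenSet[int]]) -> float:
    """Highest similarity of the query to any reference compound (0.0 if none)."""
    return max((tanimoto(query, r) for r in reference), default=0.0)

def histogram(values: List[float], n_bins: int = 10) -> List[int]:
    """Count values in equal-width bins over [0, 1]; 1.0 falls in the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts

refs = [frozenset({1, 2, 3}), frozenset({4, 5})]
queries = [frozenset({1, 2, 4}), frozenset({9}), frozenset({4, 5})]
print(histogram([max_similarity(q, refs) for q in queries]))
```

A maximal similarity of 1.0 flags a compound that duplicates the reference set, which is the overlap the KB-elimination step removes.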
-
Excerpted ePotency & eSelectivity (sim3)
[Structures not recoverable from the text.]
2270249: eABL 7.96; eSRC 6.83
2269216: eABL 7.75; eSRC 6.82
2280014: eABL 7.36; eSRC 7.12
2357124: eABL 7.51; eSRC 6.99
2267424: eABL 7.04; eSRC 4.94
-
Excerpted generated strategies (sim3)
[Reaction schemes and example product structures not recoverable from the text.]
Protocol-7011,7036,7005 (similar rxn sequence): Protocol-7006,7036,7005 and Protocol-7007,7036,7005
-
ChIP – next steps
• More detailed scope of reaction and reaction compatibility
• More efficient survey of vProtocol space
  vProtocol space is enormous: e.g., 40 basis-set rxns give ~105 million possibilities!
  $\sum_{i=1}^{\text{MaxRxnDepth}} \text{NumBasisSetRxns}^{\,i}$
• Additional ePotency, eSelectivity and eADMEtox eScreens
• Partnership experimental validation
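Taking MaxRxnDepth = 5 (the MaxDepth used in the Sim-4d run notes; the slide itself does not state the depth) reproduces the quoted figure: 40 + 40² + 40³ + 40⁴ + 40⁵ = 105,025,640, i.e., about 105 million. A short check of the sum:

```python
def vprotocol_space(num_basis_rxns: int, max_depth: int) -> int:
    """Count sequences of length 1..max_depth drawn (with repetition) from the basis set."""
    return sum(num_basis_rxns ** i for i in range(1, max_depth + 1))

print(vprotocol_space(40, 5))  # 105025640
```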
-
Leveraging GA strategies
“makelists” – A utility that generates sublists for each reactant in the reaction basis set by screening sets of commercially available molecules (e.g., ACD) through reaction-specific “required,” “exclude,” and “introspective” filters.
“makeprimers” – A utility that builds a list of reactions that can act as primers, i.e., reactions with sets of pre-filtered, commercially available starting materials for every reactant.
“makegene” – A utility that randomly selects from a “biased” list of named basis-set reaction SMIRKS and generates N variable-length “genes.” A gene describes a vProtocol (e.g., r7069r7003r7036). All genes must start with primer reactions.
“evalgene” – A user-tailored function, called by the genetic algorithm, that accepts a vProtocol gene, enumerates it, and evaluates its “viability” (expressed as a GA cost):
- Genes that are too short or too long receive the maximal cost.
- Genes that fail to enumerate final product molecules receive the maximal cost.
- Product molecules passing post-enumeration filters (e.g., highly reactive groups, undesired motifs) and Lipinski filters are scored with eScreens (e.g., ePotency, eSelectivity, eADMETox), novelty assessments, etc.
- The gene cost function is highly extensible.
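The makegene/evalgene pair described above can be sketched in a few lines. This is a hypothetical illustration, not Sertanty's code: enumeration, filtering and eScreen scoring are passed in as callables, infinite cost stands in for “maximal cost,” treating “no product passes the filters” like an enumeration failure is an assumption, and returning the best single product's cost is only one possible aggregation.

```python
import random
from typing import Callable, List, Sequence

MAX_COST = float("inf")

def make_gene(primers: Sequence[str], basis: Sequence[str], weights: Sequence[float],
              min_len: int, max_len: int, rng: random.Random) -> List[str]:
    """Random variable-length gene: one primer reaction, then biased picks from the basis set."""
    length = rng.randint(min_len, max_len)
    return [rng.choice(primers)] + rng.choices(basis, weights=weights, k=length - 1)

def eval_gene(gene: List[str], min_len: int, max_len: int,
              enumerate_products: Callable[[List[str]], List[str]],
              keep: Callable[[str], bool],
              cost: Callable[[str], float]) -> float:
    """GA cost of a vProtocol gene (lower is better)."""
    if not (min_len <= len(gene) <= max_len):
        return MAX_COST                     # gene too short or too long
    products = enumerate_products(gene)
    if not products:
        return MAX_COST                     # no final products enumerated
    survivors = [p for p in products if keep(p)]  # post-enumeration / Lipinski filters
    if not survivors:
        return MAX_COST                     # assumption: nothing passes = failure
    return min(cost(p) for p in survivors)  # best (lowest-cost) product

rng = random.Random(42)
gene = make_gene(["r7069", "r7003"], ["r7003", "r7005", "r7036", "r7049"],
                 [2.0, 1.0, 1.0, 1.0], min_len=3, max_len=5, rng=rng)
print("".join(gene))
```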
-
N: 20 AVG: 7.12 SD: 0.82 MIN: 5.05 MAX: 8.20
1: 8.20 Cc1c(N)cccc1c2nc3nc(Nc4ncnc5c(csc45)c6ccc(Br)cc6)ncc3[nH]2
PCN: 3.(2.(MFCD04973922,1.(MFCD00127953,MFCD00082564)))
2: 8.15 Cl.Cl.Cc1c(N)cccc1c2nc3cc(Nc4ncnc5n(cc(c6ccccc6)c45)c7ccc(Br)cc7)ccc3[nH]2
PCN: 3.(2.(MFCD04971641,1.(MFCD00016619,MFCD00082564)))
3: 8.07 Cl.Cl.Cc1c(N)cccc1c2nc3cc(Nc4ncnc5c(csc45)c6ccc(Br)cc6)ccc3[nH]2
PCN: 3.(2.(MFCD04973922,1.(MFCD00016619,MFCD00082564)))
…
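The PCN strings appear to encode each product's construction tree. Read that way (an assumption; the notation is not explained on the slide), “3.(2.(MFCD04973922,1.(MFCD00127953,MFCD00082564)))” means step 3 applied to the product of step 2, which combined MFCD04973922 with the product of step 1 (MFCD00127953 + MFCD00082564). A small recursive-descent parser for that reading, with hypothetical function names:

```python
from typing import List, Tuple, Union

Node = Union[str, Tuple[int, List["Node"]]]   # building-block ID or (step, inputs)

def parse_pcn(text: str) -> Node:
    """Parse strings like '3.(2.(A,1.(B,C)))' into nested (step, [inputs]) tuples."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def node() -> Node:
        nonlocal pos
        start = pos
        while peek() not in ("", ".", ",", "(", ")"):
            pos += 1
        token = text[start:pos]
        if peek() != ".":
            return token                      # building-block identifier
        expect(".")
        expect("(")
        children = [node()]
        while peek() == ",":
            expect(",")
            children.append(node())
        expect(")")
        return (int(token), children)         # step number and its inputs

    tree = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree

def building_blocks(tree: Node) -> List[str]:
    """All building-block IDs in the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [bb for child in tree[1] for bb in building_blocks(child)]

tree = parse_pcn("3.(2.(MFCD04973922,1.(MFCD00127953,MFCD00082564)))")
print(building_blocks(tree))
```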
Sim-4d (GA excerpt)
r7069r7003r7036 -23.019
r7069r7005r7049 -17.958
r7003r7049r7036 -16.915
r7069r7005r7049r7069 -13.648
r7003r7036r7002 -10.198
r7005r7049r7036 -8.476
r7006r7069r7005 -4.165
r7002r7003r7049 -3.689
Notes: bblockSample: 20; eScreenSample: 20; MinDepth: 3; MaxDepth: 5; Lipinski: mwt [250,750], charge [-2,2], Rotbonds [0,8], HBA [0,5], HBD [0,7]
PostEnum filters: 90; Score = 100 - 15*Max(eABL)
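The notes define a property window and a scoring rule. A minimal sketch, assuming the bracketed ranges are inclusive, that the property values are computed elsewhere, and that lower scores are better (consistent with the ranking above, where the best gene has the most negative value). The names WINDOWS, within_windows and sim4d_score are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Property windows from the Sim-4d notes (inclusive bounds assumed).
WINDOWS: Dict[str, Tuple[float, float]] = {
    "mwt": (250, 750), "charge": (-2, 2), "rotbonds": (0, 8), "hba": (0, 5), "hbd": (0, 7),
}

def within_windows(props: Dict[str, float]) -> bool:
    """True if every windowed property lies inside its range (props must have all keys)."""
    return all(lo <= props[k] <= hi for k, (lo, hi) in WINDOWS.items())

def sim4d_score(eabl_values: Sequence[float]) -> float:
    """Score = 100 - 15 * max(eABL); lower is better for the GA."""
    return 100 - 15 * max(eabl_values)

print(within_windows({"mwt": 400, "charge": 0, "rotbonds": 5, "hba": 4, "hbd": 2}))
print(round(sim4d_score([7.12, 8.20, 5.05]), 3))
```

With the top product's eABL of 8.20, the score is 100 − 15 × 8.20 ≈ −23.0, in line with the −23.019 listed for r7069r7003r7036.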
[Figure: synthesis route for the top-ranked product: r7069 (MFCD00127953 + MFCD00082564), then r7003 (with MFCD04973922), then r7036; structures not recoverable from the text.]
-
[Figure: synthesis route for gene r7069r7005r7049r7069 (r7069, r7005, r7049, r7069) using building blocks MFCD03410215, MFCD00127953, MFCD00068146, MFCD03424715 and MFCD00127862; structures not recoverable from the text.]
r7069r7005r7049r7069 eABL: 7.41 N: 20 AVG: 6.58 SD: 0.67 MIN: 5.37 MAX: 7.58
Sim-4d (GA excerpt)
-
ChIP strategic partnership program
• Early collaborative validation of the technology
• Participation in experimental and early proof of concept studies
• “Personalized” ChIP – developed around proprietary chemistries of collaborator
• Early access to ChIP technology
• Access to current LUCIA system (ChIP precursor)
-
Acknowledgements
Special Thanks:
National Institute of Standards and Technology (ATP program)
Stephan Schurer
Prashant Tyagi
Sertanty’s development team
Speaker contact information:
Steven Muskal, Ph.D., Chief Technology Officer
Sertanty, Inc.; 1735 N. First Street, Suite 102, San Jose, CA 95112
p: (760) 535-2885
www.sertanty.com
-
Supplemental Slides
-
Reaction chemistry and SAR archive
• >10,000 HTS reactions with experimental procedures (>1,500 associated with heterocyclic kinase inhibitors)
• >7,500 assay protocols with detailed information
• >45 kinase target SAR training sets
• >45 predictive eScreen™ models
• >90,000 SAR data points
• >4 million unique structures
• >300,000 unique kinase molecules
-
eHIA revisited
Observed vs. Predicted HIA (WPSA, QPlogPow, QPlogS, BIPCaco)
[Figure: % observed HIA (/100) on the y-axis vs. % cHIA (/100) on the x-axis, both from 0 to 1. Legend: CV (n=8; corr=0.86; ad=0.12); Pred (m=11; corr=0.97; ad=0.06); Train (o=67; corr=0.94; ad=0.07).]
Observed and predicted human intestinal absorption using a newly developed neural network model. The data set published in J. Chem. Inf. Comput. Sci. 1998, 38, 726-735 was used to generate a new, more computationally efficient model.
The GA-derived computational descriptors:
WPSA - the weakly polar component of the SASA (solvent-accessible surface area), i.e., halogens, P, and S
QPlogPow - the predicted log of the octanol/water partition coefficient
QPlogS - the predicted aqueous solubility, log S; S (in moles/liter) is the concentration of the solute in a saturated solution in equilibrium with the crystalline solid
BIPCaco - the predicted apparent Caco-2 cell permeability in nm/sec (Boehringer-Ingelheim scale)
-
eTox by RefSim
[Figure: reported pLD50 (y-axis, -3 to 6) vs. predicted pLD50 (x-axis, -1 to 5).]
A leave-one-out toxicity prediction performance simulation using the RefSim algorithm: 13,645 examples of oral rat LD50 data, extracted from the Registry of Toxic Effects of Chemical Substances (RTECS), were considered in a leave-one-out simulation using the “RefSim” algorithm. Journal of Chemical Information and Computer Sciences, 2003, 43(5): 1673-8
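The captions describe RefSim only by name. As a generic stand-in (explicitly not the RefSim algorithm, whose details are in the cited paper), a similarity-weighted nearest-neighbor predictor evaluated leave-one-out looks like this. Tanimoto similarity on bit sets, k = 3, and the similarity `cutoff` below which no prediction is made (compare the “NoPred” category and the “Refsim cutoff: 0.5” noted for the mutagenicity model) are all assumptions.

```python
from typing import FrozenSet, List, Optional, Sequence

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(query: FrozenSet[int], refs: Sequence[FrozenSet[int]],
                values: Sequence[float], k: int = 3,
                cutoff: float = 0.5) -> Optional[float]:
    """Similarity-weighted mean of the k most similar references at or above `cutoff`;
    None when no reference is similar enough (a "no prediction")."""
    sims = sorted(((tanimoto(query, r), v) for r, v in zip(refs, values)), reverse=True)
    top = [(s, v) for s, v in sims[:k] if s >= cutoff]
    if not top:
        return None
    return sum(s * v for s, v in top) / sum(s for s, _ in top)

def leave_one_out(fps: List[FrozenSet[int]], values: List[float],
                  **kw) -> List[Optional[float]]:
    """Predict each example from all the others."""
    return [knn_predict(fps[i], fps[:i] + fps[i + 1:], values[:i] + values[i + 1:], **kw)
            for i in range(len(fps))]

fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8})]
print(leave_one_out(fps, [2.0, 3.0, 5.0]))
```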
-
Mutagenicity Prediction by RefSim
[Figure: histogram of Pred-Observed (x-axis categories -1, 0, 1, NoPred) vs. frequency (y-axis, 0 to 70); bar heights 4, 66, 3 and 8, respectively.]
Data from: Handbook of Carcinogenic Potency and Genotoxicity Databases (Lois Swirsky Gold and Errol Zeiger, Eds.), CRC Press, 1997, pp. 694-724. The data files contain the experimental mutagenicity data as 1s or 0s: one indicates a mutagen, and zero indicates that the compound has no mutagenic effect on Salmonella typhimurium bacteria (Ames test). 742 reference set examples; 81 test set examples; RefSim cutoff: 0.5.
eMutagenicity by RefSim
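Reading the four histogram bars as counts for Pred-Observed = -1, 0, +1 and no prediction (they sum to the 81 test-set examples), the headline numbers follow from simple arithmetic. This mapping of bar heights to categories is a reading of the garbled figure, so treat it as an assumption.

```python
# Bar heights as read from the histogram (assumed mapping to categories).
counts = {-1: 4, 0: 66, 1: 3, "NoPred": 8}

total = sum(counts.values())            # all test-set examples
predicted = total - counts["NoPred"]    # examples that received a prediction
false_negatives = counts[-1]            # predicted 0 (non-mutagen), observed 1
false_positives = counts[1]             # predicted 1 (mutagen), observed 0
accuracy = counts[0] / predicted        # correct among predicted examples
coverage = predicted / total
print(total, predicted, round(accuracy, 3), round(coverage, 3))  # 81 73 0.904 0.901
```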
-
Sertanty Inc.
Roy Thiele-Sardiña, Chairman
Steve Muskal, Ph.D., Chief Technology Officer
Hung T. Tran, Chief Financial Officer
Kristina Kurz, Ph.D., Director, Business Development
MDL, Affymax / Glaxo, Libraria Inc.
Thieme Publishing, Libraria Inc.
Murdock & Associates, Healthtrac Inc. Tiburon Inc., Acterna, Tektronix, Libraria Inc.
Tasmania, Brocade, Sun Microsystems