
Supporting Information

A Universal Deep Learning Framework based

on Graph Neural Network for Virtual Co-

Crystal Screening

Yuanyuan Jiang a, Jiali Guo a, Yijing Liu b, Yanzhi Guo a,

Menglong Lia, Xuemei Pu a,*

a College of Chemistry, Sichuan University, Chengdu, 610064

b College of Computer Science, Sichuan University, Chengdu, 610064

* Corresponding Author

Xuemei Pu ([email protected])

1. Construction of several machine learning models as controls

The DNN constructed in our work contains six fully connected layers, as shown in Figure S1. Except for the final output layer, batch normalization1 and ReLU2 are applied in each layer.

Figure S1. Architecture of DNN.
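For illustration, a minimal PyTorch sketch of this architecture is given below. The layer widths and the input/output dimensions are assumptions chosen for readability, not the exact values used in our work; only the overall pattern (six fully connected layers, with batch normalization and ReLU everywhere except the output layer) follows the description above.

```python
import torch.nn as nn

def build_dnn(in_dim=2048, hidden_dims=(1024, 512, 256, 128, 64), n_classes=2):
    """Sketch of the 6-layer DNN control model (dimensions are assumptions)."""
    layers, prev = [], in_dim
    for h in hidden_dims:                      # five hidden fully connected layers
        layers += [nn.Linear(prev, h), nn.BatchNorm1d(h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, n_classes))  # sixth (output) layer: no BN/ReLU
    return nn.Sequential(*layers)
```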

NCI1 is a spatial-based graph convolutional network from Such et al.3, whose key components are three Graph-CNN layers and two Graph Embedding Pooling (GEP) layers, as depicted in the left panel of Figure S2. These two kinds of layers perform the message passing phase, and the readout phase is a flattening operation. Details of the Graph-CNN are given in Methods. Here we mainly introduce the GEP layer, as shown in the right panel of Figure S2.

Figure S2. Architecture of NCI1.

Like the pooling layers in conventional CNNs, the GEP layer is used to reduce the dimensions of the input, which eliminates redundant information and also improves computational efficiency. GEP transforms a graph with $N$ nodes into a graph with a given number of nodes $N'$. For this purpose, an embedding matrix $\boldsymbol{X}_{emb} \in \mathbb{R}^{N \times N'}$ is produced by a filter tensor $\boldsymbol{H}_{emb} \in \mathbb{R}^{N \times N \times C \times N'}$. The calculation of $\boldsymbol{X}_{emb}$ is similar to that of the multi-filter Graph-CNN (vide Methods), where the learnable filter $\boldsymbol{H}_{emb}$ is multiplied by the node features $\boldsymbol{X}_{in}$, as defined by equations (S1)-(S2):

๐‘ฟ๐‘’๐‘š๐‘

(๐‘›โ€ฒ)= โˆ‘ ๐‘ฏ๐‘’๐‘š๐‘

(๐‘,๐‘›โ€ฒ)๐‘ฟ๐‘–๐‘›

(๐‘)+ ๐‘

๐ถ

๐‘=1

(S1)

๐‘ฟ๐‘’๐‘š๐‘ = softmax(GConv๐‘’๐‘š๐‘(๐‘ฟ๐‘–๐‘›, ๐‘โ€ฒ) + ๐’ƒ) (S2)

where ๐‘ฏ๐‘’๐‘š๐‘(๐‘,๐‘›โ€ฒ)

โˆˆ โ„๐‘ร—๐‘ is a part of ๐‘ฏ๐‘’๐‘š๐‘. ๐‘ฟ๐‘’๐‘š๐‘(๐‘›โ€ฒ)

is a column of ๐‘ฟ๐‘’๐‘š๐‘ โˆˆ โ„๐‘ร—๐‘โ€ฒ. The

pooled graph data will be calculated by the next operations (vide equations (S3-S4)).

๐‘ฟ๐‘œ๐‘ข๐‘ก = ๐‘ฟ๐‘’๐‘š๐‘๐‘‡ ๐‘ฟ๐‘–๐‘› (S3)

๐‘จ๐‘œ๐‘ข๐‘ก = ๐‘ฟ๐‘’๐‘š๐‘๐‘‡ ๐‘จ๐‘–๐‘›๐‘ฟ๐‘’๐‘š๐‘ (S4)

where ๐‘จ๐‘–๐‘› โˆˆ โ„๐‘ร—๐‘ is adjacency matrix, ๐‘จ๐‘œ๐‘ข๐‘ก โˆˆ โ„๐‘โ€ฒร—๐‘โ€ฒ is pooled adjacency matrix.

๐‘ฟ๐‘œ๐‘ข๐‘ก โˆˆ โ„๐‘โ€ฒร—๐ถ is pooled node feature matrix. Finally, a pooled graph that has ๐‘จ๐‘œ๐‘ข๐‘ก

and ๐‘ฟ๐‘œ๐‘ข๐‘ก is produced by GEP.

enn-s2s, proposed by Gilmer et al.4, has two phases (a message passing phase and a readout phase), as shown in Figure S3. In Gilmer's work, enn-s2s is a regression model. Here, in order to extend its application to the classification prediction of cocrystal formation, we modified the architecture of enn-s2s by changing the dimension of the output layer. The message passing phase includes two functions, i.e., a message passing function and an update function. The message passing function is used to propagate node features, as reflected by equation (S5):

๐’™๐‘–๐‘ก = ๐‘พ๐’™๐‘–

๐‘กโˆ’1 + โˆ‘ ๐’™๐‘—๐‘กโˆ’1 โˆ™ ๐Œ๐‹๐(๐’†๐‘–,๐‘—)

๐‘—โˆˆ๐’ฉ(๐‘–)

(S5)

Where ๐’™๐‘–๐‘ก is the feature of node i in t-th time step, W is trainable weights, ๐’ฉ(๐‘–) is

the adjacent nodes of node i, ๐’†๐‘–,๐‘— is the feature of edge between node i and j. MLP is

multi-layer perceptron.

Figure S3. Architecture of enn-s2s.

The update function used to update the node features is a Gated Recurrent Unit (GRU)5, as described by equation (S6):

$\boldsymbol{h}_i^t = \mathrm{GRU}(\boldsymbol{h}_i^{t-1}, \boldsymbol{x}_i^t)$ (S6)

where $\boldsymbol{h}_i^t$ is the hidden state of node $i$ at the $t$-th time step.

For the readout phase, enn-s2s computes a feature vector for the whole graph based on the iterative content-based attention of Vinyals et al.6 (vide equations (S7)-(S10)):

$\boldsymbol{q}_t = \mathrm{LSTM}(\boldsymbol{q}_{t-1}^{*})$ (S7)

$\alpha_{i,t} = \dfrac{\exp(\boldsymbol{x}_i \cdot \boldsymbol{q}_t)}{\sum_{j \in \boldsymbol{G}} \exp(\boldsymbol{x}_j \cdot \boldsymbol{q}_t)}$ (S8)

$\boldsymbol{r}_t = \sum_{i=1}^{N} \alpha_{i,t} \boldsymbol{x}_i$ (S9)

$\boldsymbol{q}_t^{*} = \boldsymbol{q}_t \parallel \boldsymbol{r}_t$ (S10)

where $i$ indexes the node feature vectors $\boldsymbol{x}_i$, $\boldsymbol{q}_t$ is a query vector that allows us to read $\boldsymbol{r}_t$ from the memories at the $t$-th time step, $\alpha_{i,t}$ is the attention coefficient of node $i$ at the $t$-th time step, and LSTM is a Long Short-Term Memory network7 that computes a recurrent state. $\boldsymbol{G}$ is the graph to which nodes $i$ and $j$ belong, $N$ is the number of nodes in graph $\boldsymbol{G}$, and $\parallel$ denotes concatenation. $t$ is the step index, i.e., the number of times the state is computed; its maximum is 3 in this work. After the three steps, $\boldsymbol{q}_t^{*}$ is the feature vector for the whole graph, which is fed to a classifier consisting of two dense layers.
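The following PyTorch sketch illustrates this readout for a single graph, assuming the query is taken as the LSTM hidden state and the iteration is fixed at three steps; the hidden dimension and the zero initialization are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    """Sketch of the iterative content-based attention readout (equations S7-S10)."""
    def __init__(self, node_dim, n_steps=3):
        super().__init__()
        self.n_steps = n_steps
        self.lstm = nn.LSTMCell(2 * node_dim, node_dim)  # consumes q*_{t-1} = q || r

    def forward(self, x):
        # x: (N, node_dim) node features of one graph
        n, d = x.shape
        q_star = x.new_zeros(1, 2 * d)
        state = (x.new_zeros(1, d), x.new_zeros(1, d))
        for _ in range(self.n_steps):
            h, c = self.lstm(q_star, state)                          # S7: q_t = LSTM(q*_{t-1})
            state, q = (h, c), h
            alpha = torch.softmax(x @ q.squeeze(0), dim=0)           # S8: attention coefficients
            r = (alpha.unsqueeze(-1) * x).sum(dim=0, keepdim=True)   # S9: read vector r_t
            q_star = torch.cat([q, r], dim=-1)                       # S10: q*_t = q_t || r_t
        return q_star                                                # graph-level feature vector
```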

CCGNet-simple is proposed in this work in order to observe the impact of different feature integration operations. Its message passing phase consists of three Graph-CNN layers (vide Methods), and its readout function is multi-head global attention (vide Methods) with 10 heads. After the global attention, the global state U is fused into the graph embedding.

Figure S4. Architecture of CCGNet-simple.
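Because the exact forms of the Graph-CNN and the multi-head global attention are given in Methods, the sketch below only illustrates the readout step described above: a 10-head global attention over the node features followed by fusion of the global state U. The specific attention parameterization and the use of concatenation for the fusion are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class GlobalAttentionReadout(nn.Module):
    """Sketch of a multi-head global attention readout with fusion of the global state U."""
    def __init__(self, node_dim, n_heads=10):
        super().__init__()
        self.score = nn.Linear(node_dim, n_heads)  # one attention score per head and node

    def forward(self, x, u):
        # x: (N, node_dim) node features; u: (u_dim,) global state of the coformer pair
        alpha = torch.softmax(self.score(x), dim=0)   # (N, n_heads) attention weights over nodes
        heads = alpha.transpose(0, 1) @ x             # (n_heads, node_dim) per-head weighted sums
        graph_emb = heads.flatten()                   # concatenate the 10 heads
        return torch.cat([graph_emb, u])              # fuse the global state U into the embedding
```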

2. More examples for the attention visualization

Figure S5. Attention Visualization of BAFGEX.

Figure S6. Attention Visualization of VIHKUU.

Figure S7. Attention Visualization of BARMIM.

Figure S8. Attention Visualization of EFUCEP.

Figure S9. Attention Visualization of MAQZEK.

Table S3. Solvents involved in collecting cocrystal positive samples from the Cambridge Structural Database.

Toluene 4-Chlorotoluene diglyme

DMSO-d6 1,3,5-trichlorobenzene iodobenzene

trichloromethane-d gamma-Butyrolactone 1,1,2-trichloroethane

ethoxyethane DL-sec-Butyl acetate formic acid

methylamine iodomethane dimethyl sulfoxide

p-Xylene methanamide 3-methyl-1-butanol

1-butanol Tetrahydrofuran bromobenzene

cyclohexanone chlorobenzene dimethoxymethane

1H-pyrrole Ethyl formate 2-butanone

2-butanol isobutanol N-Ethylmorpholine

1,1,2,2-tetrachloroethane N, N, N', N'-Tetramethylethylenediamine propan-2-ol

1,4-dioxane Ethanol 2-methyl-2-propanol

2-methylpyridine 3-methylpyridine 2-butoxyethanol

diethylenetriamine 2-methoxyethanol dibromomethane

1-methyl-2-pyrrolidone N, N-dimethylacetamide 2,2'-Dichlorodiethyl ether

Methyl acetate cyclopentane benzyl alcohol

benzene hexadecane water-d2

nitromethane hexamethyldisiloxane Hexane

1-Chloro-2-Methylpropane acetic anhydride propanenitrile

acetamide acetic acid Ethylene glycol

Diethylene glycol Isopropyl acetate Isopropyl ether

tetrachloromethane acetone acetophenone

nitrobenzene propionic acid 1,2-Propanediol

pentane 1,1-Dichloroethane butane-1,4-diol

1,3-dimethylbenzene 1,2-dihydrostilbene N, N-diethylethanamine

tribromomethane 2-propoxyethanol 1,2-Dichloroethane

1-propanol water phenylamine

heptane trichloromethane pyridine

cyclohexene cyclohexane Methanol

1,2-dimethoxyethane 3-pentanone fluorobenzene

epichlorohydrin acetonitrile dichloromethane

methanedithione 1-Octanol butanedioic acid

N, N-dimethylformamide 1,2-ethanediamine 2,4-pentanedione

o-Xylene Propylene glycol monomethyl ether acetate 1,3,5-trimethylbenzene

2-phenylacetonitrile 2-Chlorotoluene 1,2-dichlorobenzene

isophorone morpholine nitric acid

quinoline benzonitrile ethyl acetate

benzene-d6

Table S4. Performances of various models with different feature compositions for the validation set of the 10-fold cross-validation.

Model PACC (%) NACC (%) BACC (%)
SVM 98.99 (±0.39) 87.55 (±2.72) 93.27 (±1.44)
RF 99.89 (±0.06) 91.00 (±2.70) 95.44 (±1.34)
DNN 99.53 (±0.29) 90.46 (±2.34) 95.00 (±1.07)
NCI1 99.01 (±0.50) 85.96 (±3.56) 92.49 (±1.63)
enn-s2s 98.44 (±0.45) 86.96 (±3.68) 92.70 (±1.76)
CCGNet-simple 99.46 (±0.45) 93.45 (±2.45) 96.46 (±1.05)
CCGNet 99.89 (±0.13) 96.98 (±2.20) 98.43 (±1.12)

Table S5. Refcodes of energetic cocrystals collected from the CSD for the out-of-distribution prediction.

ABTNBA01 ABUNIU AJAKOL ANCTNB APANBZ

BIYXAL BIZZAO BNZTNB BZATNB20 CAZTBZ01

CBZTNB CECPEF CEZFOF DIFZOK DUKBOC

DUKBUI DUKCAP ERAFAE FETYAE FONHOH

FONJAV FUFSOQ GEXMAZ GEXMED GEXMIH

GEXMON HECREM HETTIM HETTOS HETTUY

HIVGAW HUZSEA IZUZUZ IZUZUZ01 JABYIX

JABYOD JOCTAZ KIZVAQ KOBFIQ KUMYOI

LOKJIH LUTGUD MAAZNB NIBJUF NIBZAM

NIKLOL NILCET POCVIP POSREV PUBMUU20

PUBWEO PUTWEI PUTWIM PUTWOS PUTWUY

PUTXAF PUTXEJ PVVBFD01 PVVBKP01 PYRTNB

QAPNAZ QARQUY QINLEH QOSRUN QOWBEJ

REDCIM REDCUY REDDAF REDDEJ RENPUV

RULLUF RUYKUR RUYLAY RUYLEC SERZIB

SKTNIB SOQPAQ STINBZ SUGCAY TETTAQ

TIVJUF TOZMUS UGUNAN URIHUZ URIJEL

URIJUB URILAJ USEZID VAZBIJ VIGKIF

VIGKUR VIGLEC WEPGEG WEPTAP WOJWIB

WOJWOH WOJXEY XAHZAH XAJJUQ XEMCID

XIZCER YEDVAH ZEBJOH01 ZEGKIF10 ZEVNUL

ZEZGIW ZEZHET ZILMUF ZOPGOC ZUBNOB

ZUBNUH ZZZAGS10 YOJQOG YOJXIH YOJXON

NILCIX ZEBJOH WOSFOB PEHSUS XAQFUS

ZASWAT ZASWEX ZASWIB GOWHIL ROSMOD

ROSMIX JAQVOP UWUGAW JABYIX MANLEV

BOXTET WUGWAY WIFYAN WIFXUG IDENEM

ZEZGOC ZEZHAP ZEZHOD URIJAH URIJIP

URIMAK URILOX URIKOW URILEN URIKUC

URIKIQ URIJOV URIKEM URIKAI UTEJAG

MEPWIQ FOYSUJ

References

1. Ioffe, S.; Szegedy, C., Batch Normalization: Accelerating Deep Network Training

by Reducing Internal Covariate Shift. 2015.

2. Battaglia, P. W.; Hamrick, J. B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.;

Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R., Relational

inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261

2018.

3. Such, F. P.; Sah, S.; Dominguez, M. A.; Pillai, S.; Zhang, C.; Michael, A.; Cahill,

N. D.; Ptucha, R., Robust Spatial Filtering With Graph Convolutional Neural Networks.

IEEE J. Sel. Top. Signal Process. 2017, 11 (6), 884-896.

4. Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E., Neural message

passing for quantum chemistry. arXiv preprint arXiv:1704.01212 2017.

5. Cho, K.; Van Merrienboer, B.; Bahdanau, D.; Bengio, Y., On the Properties of

Neural Machine Translation: Encoder-Decoder Approaches. Computer Science 2014.

6. Vinyals, O.; Bengio, S.; Kudlur, M., Order Matters: Sequence to sequence for sets.

Computer Science 2015.

7. Hochreiter, S.; Schmidhuber, J., Long Short-Term Memory. Neural Computation

1997, 9 (8), 1735-1780.