
DOI: 10.4018/JNN.2017010103

Journal of Nanotoxicology and Nanomedicine
Volume 2 • Issue 1 • January-June 2017

Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

ABSTRACT

Buckminsterfullerene (C60) and its derivatives are currently used as promising nanomaterials for diagnostic and therapeutic agents. They are applied in the pharmaceutical industry due to their nanostructure characteristics, stability and hydrophobic character. Because C60 is sparingly soluble, its solubility has attracted enormous attention among carbon nanostructure investigators owing to its fundamental importance and practical interest in nanotechnology and the medical industry. In order to study the diverse roles of C60 and its derivatives, the dependence of fullerene solubility on the molecular structure of the solvent must be understood. The current study was dedicated to the exploration of the solubility of fullerene C60 in 156 organic solvents using simple, interpretable and predictive 1D and 2D descriptors employing the quantitative structure-property relationship (QSPR) technique. The authors employed a genetic algorithm followed by multiple linear regression analysis (GA-MLR) to generate the correlation models. The best performance is accomplished by the four-variable MLR model, with internal and external prediction coefficients of Q2 = 0.86 and R2pred = 0.89. The study identified vital properties and structural fragments that are particularly valuable for guiding future synthetic as well as prediction efforts. The model, generated with the highest number of organic solvents to date and with simple descriptors, can be reproduced quickly to predict the solubility of C60 in any new or existing organic solvent. This approach can be used as an efficient predictor of fullerene solubility in various organic solvents.

KEYWORDS

C60, Chemometrics, Fullerene, GA, MLR, QSPR, Solubility

Exploring Simple, Interpretable, and Predictive QSPR Model of Fullerene C60 Solubility in Organic Solvents

Lyudvig S. Petrosyan, Department of Physics, Jackson State University, Jackson, MS, USA

Supratik Kar, Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, USA

Jerzy Leszczynski, Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, USA

Bakhtiyor Rasulev, Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, USA

1. INTRODUCTION

Fullerene, a highly symmetrical cage-like molecule, has specific interactions with organic solvents, and knowledge of these interactions can provide significant information on the mechanisms of solute-solvent interactions. Fullerenes have well-defined rigid geometries, in contrast to other solutes whose shapes undergo conformational changes. Moreover, their intramolecular vibrational partition functions may undergo solvent-dependent changes (Prylutskyy et al., 2003). Due to the sparingly soluble nature of C60 in most organic solvents, the production cost of this nanomaterial is still high (Shunaev et al., 2015). Therefore, understanding fullerene solubility assists in the purification, extraction, bioavailability, reactivity, and risk assessment of fullerenes. This information is vital due to the ample application of carbon nanostructures, such as C60 and its derivatives, in diverse areas of nanotechnology, pharmaceuticals, cosmetics, medicinal chemistry, environmental applications (Cook et al., 2010; Bogatu, & Leszczynska, 2016) and materials science (Gharagheizi & Alamdari, 2008; Sivaraman et al., 2001).

The quantitative structure-property relationship (QSPR) approach represents a powerful tool for modeling and prediction of physicochemical properties. The QSPR method is built on a mathematical algorithm that provides a rational basis for establishing a predictive correlation model. Apart from providing a mathematical correlation, it also enables the exploration of chemical features encoded within parameters (descriptors) (Roy, Kar, & Das, 2015a; Toropova, 2016). Hence, a diverse set of descriptors plays a noteworthy role in the recognition as well as the analysis of the chemical basis of a studied phenomenon. Therefore, a reliable QSPR model can offer a time- and cost-effective estimate of C60 solubility values in organic solvents in the absence of experimental data.

A series of investigations predicting C60 solubility in organic solvents with QSPR models has been reported in the last 12 years. Liu et al. (2005) generated a linear model as well as a least-squares support vector machine (LSSVM) model for predicting the solubility of C60 in 128 and 122 organic solvents, respectively. Toropov et al. (2007, 2009) demonstrated two kinds of descriptor methods for predicting the solubility of C60 in different organic solvents. The same data set was used to build one-variable models, once with optimal descriptors calculated from the simplified molecular input line entry system (SMILES) (Toropov et al., 2007) and in another work from the International Chemical Identifier (InChI) (Toropov et al., 2009), both with high statistical quality. Petrova et al. (2011) showed that the GA-MLR technique in combination with quantum-chemical and topological descriptors yields reliable four-variable models for 122 organic solvents. A GA-MLR model was developed by Pourbasheer et al. (2011) to predict the fullerene solubility in 36 benzene derivatives. Ghasemi et al. (2013) proposed the first 3D-QSAR model employing VolSurf-based descriptors with the SPA-SVM (successive projection algorithm-support vector machine) method to predict C60 solubility in 132 organic solvents with acceptable statistical results. Recently, Xu et al. (2016) proposed a QSPR model for predicting the solubility of fullerene C60 in 156 diverse organic solvents with norm indexes.

In this regard, we aimed to find a simple, predictive, computationally time-efficient and mechanistically interpretable model to predict the solubility of C60 in the same set of organic solvents considered by Xu et al. (2016). In addition, the study intends to estimate the predictive potential of simple 1D and 2D descriptors for modeling the solubility of fullerene C60 in a large number of organic solvents.

2. MATERIALS AND METHODS

2.1. Data Set

The experimental solubility (S) data of C60 in 156 organic solvents (Table 1) were collected from two data sets: Beck and Mándi (1997) and Semenov et al. (2010). As the logarithmic values of molar fractions correspond to the free energy changes in the solvation process, solubility was expressed as logS rather than in weight units (e.g., mg/mL).

2.2. Descriptors Calculation

Solvent structures were prepared in the HyperChem 8 software package (HyperChem(TM)) and saved as .mol files. Thereafter, DRAGON 6 (DRAGON, TALETE srl, Italy) software was employed to generate a pool of descriptors to correlate with logS, followed by identification of the features that are most responsible for C60 solubility in organic solvents. A total of 319 descriptors were considered, generated from constitutional indices, topological indices, walk and path counts, connectivity indices, functional group counts, ETA indices, atom-centered fragments, atom-type E-state indices and molecular properties. Details about the descriptors are available at the following link: http://www.talete.mi.it/products/dragon_molecular_descriptor_list.pdf

2.3. Model Development

The initial data set was divided into training and test sets based on the experimental logS response values sorted from lower to higher. Then, from each group of four consecutive solvents in the sorted column, the second one was placed in the test set, and this process was carried out for the whole data set. The data set was therefore divided in a 3:1 ratio, with 117 and 39 solvents in the training and test sets, respectively. Then, a genetic algorithm (GA) was employed as the descriptor selection tool, as implemented in the Genetic Algorithm 1.4 software package (http://teqip.jdvu.ac.in/QSAR_Tools/). We applied GA to select only the best combinations of descriptors for building models with the highest predictive power for solubility. Multiple linear regression (MLR) analysis was then performed with the MLR Plus Validation GUI 1.2 software (http://teqip.jdvu.ac.in/QSAR_Tools/), followed by validation of the model using the test set compounds. Overall, the combined GA-MLR technique was utilized to select the appropriate descriptors and to generate different QSPR models, selecting the best models with the number of variables ranging from 1 to 6.
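The response-sorted division described above can be sketched in a few lines of Python. This is only an illustration of the splitting rule as stated in the text, not the authors' tool; the function name `sorted_response_split` and the demo values are hypothetical.

```python
def sorted_response_split(log_s, group_size=4, pick=2):
    """Split compound indices into training and test sets.

    Compounds are ranked by logS (low to high). Within each consecutive
    group of `group_size` ranked compounds, the `pick`-th one (1-based)
    goes to the test set and the others go to the training set.
    """
    order = sorted(range(len(log_s)), key=lambda i: log_s[i])
    train, test = [], []
    for rank, idx in enumerate(order):
        if rank % group_size == pick - 1:
            test.append(idx)
        else:
            train.append(idx)
    return train, test


# Hypothetical stand-in for the 156 experimental logS values.
demo_log_s = [-9.0 + 0.045 * i for i in range(156)]
train_idx, test_idx = sorted_response_split(demo_log_s)
print(len(train_idx), len(test_idx))  # 117 39
```

Because the test compounds are drawn evenly along the sorted response, both sets cover the same logS range, which is the usual motivation for this kind of split.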

2.4. Model Validation

A set of stringent statistical metrics was utilized to ensure the fitness of the in silico models through internal and external validation methodologies. The goodness-of-fit of the equation was checked by the coefficient of determination (R2), as well as by the following validation metrics: the leave-one-out cross-validation coefficient (Q2LOO) for internal validation and R2pred for external validation. The rm2 metrics (Roy et al., 2012; Roy & Kar, 2015a) and Golbraikh and Tropsha's criteria were also checked for each model (Roy & Kar, 2015a). We also checked the prediction quality of all the models in terms of the mean absolute error (MAE)-based criteria (Roy et al., 2016).
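As an illustration of the three headline metrics, the sketch below fits an ordinary least-squares MLR model in plain Python and computes R2, Q2LOO and R2pred using their commonly used definitions (Q2LOO = 1 - PRESS/SS_tot, and R2pred referenced to the training-set mean). The text does not spell out these formulas, so they are assumptions here, and the demo data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(x_rows, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, x_rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in x_rows]


def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1.0 - ss_res / sum((a - mean_y) ** 2 for a in y)


def q2_loo(x_rows, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(x_rows[:i] + x_rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, [x_rows[i]])[0]) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


def r2_pred(y_train, y_test, y_test_hat):
    """External R2pred, referenced to the mean of the training responses."""
    mean_train = sum(y_train) / len(y_train)
    num = sum((a - b) ** 2 for a, b in zip(y_test, y_test_hat))
    return 1.0 - num / sum((a - mean_train) ** 2 for a in y_test)


# Hypothetical one-descriptor demo data (not taken from the paper).
x_demo = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
coef = fit_mlr(x_demo, y_demo)
r2_train = r2(y_demo, predict(coef, x_demo))
q2 = q2_loo(x_demo, y_demo)
print(round(r2_train, 3), round(q2, 3))
```

With these definitions Q2LOO can never exceed R2 for a least-squares fit, so a large gap between the two is a simple warning sign of overfitting.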

The robustness of the models was checked with the Y-randomization technique, generating 100 random models by shuffling the dependent variable while maintaining the original independent variables. The cRp2 parameter (Roy & Kar, 2015a) was checked; it must be more than 0.5 for an acceptable QSPR model. As stated in the OECD principles, a reliable and transparent QSPR model should have a defined applicability domain (AD) (Roy & Kar, 2015b). Thus, the AD of the best QSPR model was checked employing two techniques: a) the standardization-based technique (Roy, Kar, & Ambure, 2015b) and b) the Euclidean distance approach (http://teqip.jdvu.ac.in/QSAR_Tools/). The principle of the standardization-based technique is as follows: for a normal distribution, 99.7% of the population will remain within the range mean ± 3 standard deviations (SD). Thus, mean ± 3 SD represents the zone to which most of the training set compounds belong. Any compound outside this zone is significantly different from the majority of the compounds. Any test compound outside this limit is considered an outlier. The Euclidean distance approach is based on distance scores calculated with Euclidean distance norms. First, the normalized mean distance scores of the training set compounds are calculated; these values range from 0 to 1 (0 = least diverse, 1 = most diverse training set compound). Then the normalized mean distance scores of the test set compounds are calculated, and test compounds with a score outside the 0 to 1 range are said to be outside the applicability domain. This can also be checked with a scatter plot (normalized mean distance vs. respective activity/property) including both the training and test sets.
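Both applicability-domain checks can be sketched as follows. This is a simplified reading of the description above, not a reimplementation of the cited tools: the standardization check here flags a test compound as soon as any descriptor lies beyond mean ± 3 SD of the training set, and the Euclidean check rescales mean distances with the training-set minimum and maximum. The function names and the two-descriptor demo data are hypothetical.

```python
import math
import statistics


def standardization_flags(train, test, limit=3.0):
    """True for test compounds with any descriptor beyond mean +/- limit*SD
    of the training set (simplified version of the standardization idea)."""
    cols = list(zip(*train))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    flags = []
    for row in test:
        z = [abs(v - m) / s for v, m, s in zip(row, means, sds)]
        flags.append(max(z) > limit)
    return flags


def mean_distances(rows, reference):
    """Mean Euclidean distance from each row to all reference compounds."""
    return [sum(math.dist(r, q) for q in reference) / len(reference) for r in rows]


def euclidean_flags(train, test):
    """Normalize mean distances with the training min/max; a test score
    outside [0, 1] is treated as outside the applicability domain."""
    d_train = mean_distances(train, train)
    lo, hi = min(d_train), max(d_train)
    scores = [(d - lo) / (hi - lo) for d in mean_distances(test, train)]
    return [not (0.0 <= s <= 1.0) for s in scores], scores


# Hypothetical two-descriptor demo data (not taken from the paper).
train_demo = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
test_demo = [[0.5, 0.4], [10.0, 10.0]]
std_flags = standardization_flags(train_demo, test_demo)
euc_flags, euc_scores = euclidean_flags(train_demo, test_demo)
print(std_flags, euc_flags)  # [False, True] [False, True]
```

The two checks need not agree on real data, because one looks at each descriptor separately and the other at the overall distance in descriptor space.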


Table 1. Experimental and predicted (calculated according to equation (4)) fullerene C60 solubility for studied organic solvents

ID. CAS no. Solvent name logS (Exp) logS (Cal) Residual

Training set

1 109-66-0 Pentane -6.10 -5.80 -0.30

2 110-54-3 Hexane -5.10 -5.41 0.31

4 26635-64-3 Isooctane -5.20 -4.71 -0.49

5 124-18-5 Decane -4.70 -4.32 -0.38

6 112-40-3 Dodecane -3.50 -3.92 0.42

7 493-01-6 cis-Decahydronaphthalene -3.30 -3.59 0.29

8 493-02-7 trans-Decahydronaphthalene -3.50 -3.59 0.09

10 542-18-7 Cyclohexylchloride -4.10 -3.99 -0.11

12 626-62-0 Cyclohexyliodide -2.80 -3.03 0.23

14 110-83-8 Cyclohexene -3.80 -4.31 0.51

15 108-87-2 Methylcyclohexane -4.50 -4.34 -0.16

17 75-09-2 Dichloromethane -4.60 -5.58 0.98

18 56-23-5 Carbon tetrachloride -4.40 -4.24 -0.16

19 74-95-3 Dibromomethane -4.50 -4.34 -0.16

22 74-97-5 Bromochloromethane -4.20 -4.93 0.73

24 75-03-6 Iodoethane -4.50 -4.54 0.04

25 79-34-5 1,1,2,2-Tetrachloroethane -3.10 -3.39 0.29

27 71-55-6 1,1,1-Trichloroethane -4.70 -4.69 -0.01

28 540-54-5 1-Chloropropane -5.60 -5.65 0.05

30 75-29-6 2-Chloropropane -5.90 -5.58 -0.32

31 75-26-3 2-Bromopropane -5.40 -4.96 -0.44

32 75-30-9 2-Iodopropane -4.80 -4.16 -0.64

33 78-87-5 1,2-Dichloropropane -4.90 -4.67 -0.23

34 142-28-9 1,3-Dichloropropane -4.80 -4.98 0.18

35 78-75-1 1,2-Dibromopropane -4.30 -3.66 -0.64

36 627-31-6 1,3-Diiodopropane -3.40 -3.14 -0.26

40 513-38-2 1-Iodo-2-methylpropane -4.30 -3.87 -0.43

41 507-19-7 2-Bromo-2-methylpropane -5.00 -5.14 0.14

43 127-18-4 Tetrachloroethylene -3.80 -3.24 -0.56

44 513-37-1 1-Chloro-2-methylpropene -4.50 -4.95 0.45

45 71-43-2 Benzene -4.00 -3.82 -0.18

46 95-47-6 1,2-Dimethylbenzene -2.90 -3.53 0.63

47 108-38-3 1,3-Dimethylbenzene -3.30 -3.61 0.31

49 95-63-6 1,2,4-Trimethylbenzene -2.50 -3.38 0.88

50 108-67-8 1,3,5-Trimethylbenzene -3.50 -3.50 0.00

51 527-53-7 1,2,3,5-Tetramethylbenzene -2.40 -3.25 0.85

52 119-64-2 Tetralin -2.50 -2.58 0.08

53 103-65-1 n-Propylbenzene -3.50 -3.67 0.17

55 104-51-8 n-Butylbenzene -3.40 -3.55 0.15

57 462-06-6 Fluorobenzene -4.10 -3.87 -0.23

58 108-90-7 Chlorobenzene -3.00 -3.41 0.41

59 108-86-1 Bromobenzene -3.30 -3.01 -0.29

60 95-50-1 1,2-Dichlorobenzene -2.40 -2.90 0.50

62 694-80-4 1-Bromo-2-chloro-benzene -2.40 -2.55 0.15

64 120-82-1 1,2,4-Trichlorobenzene -2.80 -2.57 -0.23

65 100-42-5 Styrene -3.20 -3.53 0.33

68 100-66-3 Anisole -3.10 -3.98 0.88

69 100-52-7 Benzaldehyde -4.20 -3.69 -0.51

70 103-71-9 Phenylisocyanate -3.40 -3.61 0.21

72 108-98-5 Thiophenol -3.00 -3.53 0.53


73 100-39-0 Benzylbromide -3.10 -3.04 -0.06

74 30583-33-6 Trichlorotoluene -3.00 -2.62 -0.38

75 90-12-0 1-Methylnaphthalene -2.20 -2.24 0.04

76 28804-88-8 Dimethylnaphthalene -2.10 -2.19 0.09

77 605-02-7 1-Phenylnaphthalene -1.90 -1.34 -0.56

80 71-41-0 1-Pentanol -5.30 -5.64 0.34

81 67-64-1 Acetone -7.00 -6.26 -0.74

82 68-12-2 N,N-Dimethylformamide -5.30 -5.86 0.56

83 110-01-0 Tetrahydrothiophene -5.40 -4.51 -0.89

84 110-02-1 Thiophene -4.40 -3.99 -0.41

85 554-14-3 2-Methylthiophene -3.00 -3.92 0.92

86 872-50-4 N-methyl-2-pyrrolidone -3.90 -4.39 0.49

88 91-22-5 Quinoline -2.90 -2.31 -0.59

89 62-53-3 Aniline -3.90 -3.98 0.08

90 100-61-8 N-methylaniline -3.80 -3.86 0.06

94 110-82-7 Cyclohexane -5.30 -4.53 -0.77

95 591-49-1 1-Methyl-1-cyclohexene -3.80 -4.15 0.35

96 2207-01-4 cis-1,2-Dimethylcyclohexane -4.60 -4.05 -0.55

97 1678-91-7 Ethylcyclohexane -4.30 -4.24 -0.06

98 67-66-3 Chloroform -4.80 -4.57 -0.23

99 106-93-4 1,2-Dibromoethane -4.20 -4.02 -0.18

100 106-94-5 1-Bromopropane -5.20 -5.03 -0.17

101 109-64-8 1,3-Dibromopropane -4.20 -4.15 -0.05

102 78-77-3 1-Bromo-2-methylpropane -4.90 -4.59 -0.31

103 507-20-0 2-Chloro-2-methylpropane -5.70 -5.66 -0.04

104 558-17-8 2-Iodo-2-methylpropane -4.40 -4.49 0.09

105 79-01-6 Trichloroethylene -3.80 -4.06 0.26

106 108-88-3 Toluene -3.40 -3.74 0.34

107 106-42-3 1,4-Dimethylbenzene -3.30 -3.51 0.21

108 488-23-3 1,2,3,4-Tetramethylbenzene -2.90 -3.16 0.26

110 135-98-8 sec-Butylbenzene -3.60 -3.41 -0.19

111 591-50-4 Iodobenzene -3.50 -2.47 -1.03

112 541-73-1 1,3-Dichlorobenzene -3.40 -3.06 -0.34

113 583-53-9 1,2-Dibromobenzene -2.60 -2.20 -0.40

114 88-72-2 2-Nitrotoluene -3.40 -3.55 0.15

115 100-44-7 Benzylchloride -3.40 -3.40 0.00

117 2586-62-1 1-Bromo-2-methylnaphthalene -2.10 -1.72 -0.38

118 71-23-8 1-Propanol -6.40 -6.53 0.13

119 111-27-3 1-Hexanol -5.10 -5.29 0.19

120 111-87-5 1-Octanol -5.00 -4.69 -0.31

121 107-13-1 Acrylonitrile -6.40 -5.98 -0.42

122 111-96-6 2-Methoxyethylether -5.20 -5.21 0.01

124 79-00-5 1,1,2-Trichloroethane -4.78 -4.22 -0.56

126 629-04-9 Bromoheptane -3.30 -4.87 1.57

127 111-83-1 Bromooctane -3.09 -3.84 0.75

128 112-71-0 1-Bromotetradecane -2.59 -2.89 0.30

129 112-89-0 1-Bromooctadecane -2.53 -2.51 -0.02

130 67-56-1 Methanol -8.87 -8.26 -0.61

132 112-30-1 Decan-1-ol -4.15 -4.24 0.09

133 112-42-5 Undecan-1-ol -3.99 -4.04 0.05


134 67-63-0 Propan-2-ol -6.65 -6.46 -0.19

135 78-92-2 Butan-2-ol -6.34 -5.90 -0.44

136 6032-29-7 Pentan-2-ol -5.57 -5.59 0.02

138 504-63-2 1,3-Propanediol -7.05 -6.36 -0.69

141 102-04-5 1,3-Diphenylacetone -3.40 -1.99 -1.41

143 2398-37-0 m-Bromoanisole -2.55 -3.15 0.60

144 98-07-7 1,1,1-Trichloromethylbenzene -3.02 -2.82 -0.20

145 573-98-8 1,2-Dimethylnaphthalene -2.12 -2.19 0.07

146 75-05-8 Acetonitrile -7.54 -6.83 -0.71

147 109-99-9 Tetrahydrofuran -5.17 -5.20 0.03

148 108-75-8 2,4,6-Trimethylpyridine -2.80 -3.63 0.83

150 79-09-4 Propionic acid -5.79 -6.02 0.23

151 107-92-6 Butanoic acid -5.74 -5.67 -0.07

152 109-52-4 Pentanoic acid -5.05 -5.26 0.21

153 142-62-1 Hexanoic acid -4.50 -4.94 0.44

154 111-14-8 Heptanoic acid -4.26 -4.66 0.40

156 112-05-0 Nonanoic acid -4.41 -4.17 -0.24

Test set

3 111-65-9 Octane -5.20 -4.80 -0.40

9 137-43-9 Cyclopentylbromide -4.20 -3.75 -0.45

11 108-85-0 Cyclohexylbromide -3.40 -3.58 0.18

13 5401-62-7 1,2-Dibromocyclohexane -2.60 -3.03 0.43

16 6876-23-9 trans-1,2-Dimethylcyclohexane -4.60 -4.05 -0.55

20 75-25-2 Bromoform -3.20 -3.03 -0.17

21 74-88-4 Iodomethane -4.20 -4.89 0.69

23 74-96-4 Bromoethane -5.20 -5.48 0.28

26 107-06-2 1,2-Dichloroethane -5.00 -5.13 0.13

29 107-08-4 1-Iodopropane -4.60 -4.23 -0.37

37 96-11-7 1,2,3-Tribromopropane -2.90 -2.92 0.02

38 96-18-4 1,2,3-Trichloropropane -4.00 -4.10 0.10

39 513-36-0 1-Chloro-2-methylpropane -5.40 -5.13 -0.27

42 540-49-8 1,2-Dibromoethylene -3.70 -3.83 0.13

48 526-73-8 1,2,3-Trimethylbenzene -3.10 -3.37 0.27

54 98-82-8 iso-Propylbenzene -3.60 -3.47 -0.13

56 98-06-6 tert-Butylbenzene -3.70 -3.77 0.07

61 108-36-1 1,3-Dibromobenzene -2.60 -2.49 -0.11

63 108-37-2 1-Bromo-3-chloro-benzene -3.00 -2.76 -0.24

66 98-95-3 Nitrobenzene -3.90 -3.68 -0.22

67 100-47-0 Benzonitrile -4.20 -3.64 -0.56

71 99-08-1 3-Nitrotoluene -3.40 -3.48 0.08

78 64-17-5 Ethanol -7.10 -6.82 -0.28

79 71-36-3 1-Butanol -5.90 -6.07 0.17

87 110-86-1 Pyridine -4.00 -4.01 0.01

91 121-69-7 N,N-Dimethylaniline -3.20 -3.54 0.34

92 4904-61-4 1,5,9-Cyclododecatriene -2.70 -2.71 0.01

93 629-59-4 Tetradecane -4.30 -3.59 -0.71

109 100-41-4 Ethylbenzene -3.40 -3.71 0.31

116 90-13-1 1-Chloronaphthalene -2.00 -2.03 0.03

123 111-84-2 Nonane -4.92 -4.55 -0.37

125 109-65-9 Bromobutane -3.74 -4.87 1.13

131 143-08-8 Nonan-1-ol -4.29 -4.45 0.16

3. RESULTS AND DISCUSSION

3.1. Statistical Performance and Validation Quality of QSAR Models

To find statistically acceptable as well as reliable models, we calculated different models by increasing the number of descriptors until the model attained the peak of the correlation coefficient for internal as well as external validation. The four-variable model (Equation (4)) yields the best predictive potential for the solubility of C60 in organic solvents, which is strongly supported by the comparative plot of the correlation coefficients for the individual models (Figure 1). Figure 1 shows that the external prediction quality dropped for the five- and six-descriptor models in comparison with the four-descriptor model, whereas the internal validation statistics are almost comparable for all three models. Thus, similar internal prediction quality and better external prediction can be achieved with the four-descriptor model than with the five- and six-descriptor models. In addition, as the number of descriptors increases, the complexity of the model increases and its interpretation becomes harder. Therefore, we have chosen Equation (4) as the best model over Equations (5) and (6). Collinearity among the modeled descriptors was checked through Pearson correlation. The statistical outcomes of the one- to six-variable models are given in Table 2. The developed model equations are as follows:

One-variable model:

$$\log(S) = 9.513(\pm 0.952) - 1.582(\pm 0.110) \times \mathrm{Psi\_e\_A} \tag{1}$$

Two-variable model:

$$\log(S) = -14.639(\pm 0.546) + 11.427(\pm 0.692) \times \mathrm{PDI} + 0.444(\pm 0.087) \times \mathrm{ON1} \tag{2}$$

Three-variable model:

$$\log(S) = -13.637(\pm 0.509) + 0.318(\pm 0.039) \times \mathrm{X0v} + 0.575(\pm 0.371) \times \mathrm{NsssN} + 9.556(\pm 0.692) \times \mathrm{PDI} \tag{3}$$

Four-variable model:

$$\log(S) = -16.199(\pm 0.760) + 0.423(\pm 0.056) \times \mathrm{X0sol} + 11.825(\pm 1.01) \times \mathrm{PDI} - 0.021(\pm 0.009) \times \mathrm{VAR} - 0.303(\pm 0.088) \times \mathrm{X4sol} \tag{4}$$
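For readers who want to apply the four-variable model, Equation (4) can be evaluated directly once the four DRAGON descriptors of a solvent are known. The sketch below only encodes the coefficients reported in Equation (4); the function names are hypothetical, the descriptor values in the demo call are placeholders rather than real DRAGON output, and the residual helper follows the convention of Table 1 (experimental minus calculated).

```python
# Coefficients of Equation (4) as reported in the text.
EQ4 = {
    "intercept": -16.199,
    "X0sol": 0.423,
    "PDI": 11.825,
    "VAR": -0.021,
    "X4sol": -0.303,
}


def predict_log_s(x0sol, pdi, var, x4sol):
    """Evaluate Equation (4) for one solvent."""
    return (
        EQ4["intercept"]
        + EQ4["X0sol"] * x0sol
        + EQ4["PDI"] * pdi
        + EQ4["VAR"] * var
        + EQ4["X4sol"] * x4sol
    )


def residual(log_s_exp, log_s_cal):
    """Residual as tabulated in Table 1: experimental minus calculated."""
    return log_s_exp - log_s_cal


# Placeholder descriptor values, for illustration only.
demo_prediction = predict_log_s(x0sol=5.0, pdi=0.9, var=10.0, x4sol=1.0)
print(round(demo_prediction, 3))

# Pentane row of Table 1: logS(Exp) = -6.10, logS(Cal) = -5.80.
print(round(residual(-6.10, -5.80), 2))  # -0.3
```

Predictions made this way are only meaningful for solvents that fall inside the applicability domain of the training set.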

Five-variable model:

$$\log(S) = -13.648(\pm 0.646) - 0.011(\pm 0.025) \times \mathrm{BBI} + 5.539(\pm 0.497) \times \frac{\sum\alpha}{N_V} - 0.016(\pm 0.003) \times \mathrm{SAacc} + 4.061(\pm 0.876) \times \sum\beta'_s + 2.205(\pm 0.233) \times \mathrm{piPC01} \tag{5}$$

Six-variable model:

$$\log(S) = -8.591(\pm 0.235) + 0.063(\pm 0.005) \times \mathrm{AMW} + 1.001(\pm 0.117) \times \mathrm{piPC02} + 0.681(\pm 0.188) \times \mathrm{Cl\text{-}086} + 0.329(\pm 0.315) \times \mathrm{NsssN} - 0.015(\pm 0.003) \times \mathrm{SAacc} + 0.653(\pm 0.093) \times \mathrm{MPC03} \tag{6}$$

Table 1. Continued (test set)

ID. CAS no. Solvent name logS (Exp) logS (Cal) Residual

137 584-02-1 Pentan-3-ol -5.36 -5.50 0.14

139 110-63-4 1,4-Butanediol -6.57 -5.87 -0.70

140 111-29-5 1,5-Pentanediol -6.19 -5.49 -0.70

142 104-92-7 p-Bromoanisole -2.54 -3.03 0.49

149 64-19-7 Acetic acid -6.27 -6.63 0.36

155 124-07-2 Octanoic acid -4.98 -4.40 -0.58

As the training set composition is fixed, there may be a bias in the selection of the descriptors. Therefore, we also performed a double cross-validation strategy (Roy & Ambure, 2016) to check whether our four-descriptor model is the best or whether a better model can be obtained. In the double cross-validation, the best model was obtained by consensus model predictions among three methods (method parameters and results are provided in the supplementary material) using the double cross-validation software available at http://teqip.jdvu.ac.in/QSAR_Tools/. When our previous best model is compared with the best model evolved from the double cross-validation, our previous four-descriptor GA-MLR model emerges as the best model based on regression-based as well as error-based metrics. The bias in prediction (Roy, Ambure & Aher, 2017) was also checked for the best model (Equation (4)), and the accepted results are as follows: Variance: 0.0087, Bias2: 0.160, Variance + Bias2: 0.169, mean square error (MSE): 0.159.

Figure 1. Comparative plot of correlation coefficient values for individual model with one to six variables

A graphical illustration of the experimental and predicted solubility (see Table 1) according to the best model (Equation (4)) for the training and test sets is shown in Figure 2. All models were developed with well explainable and transparent descriptors. The lack of chance correlation in the QSPR models is also well reflected by the value of cRp2, which is higher than the acceptable threshold value of 0.5 for all six models.
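The Y-randomization check behind the cRp2 values can be illustrated with a short sketch. The text does not give the formula, so the expression used here, cRp2 = R × sqrt(R2 − Rr2) with Rr2 the mean R2 of the randomized models, is the form commonly quoted for this metric and should be read as an assumption. For brevity the example uses a single-descriptor model, for which R2 equals the squared Pearson correlation; the data are hypothetical.

```python
import math
import random


def pearson_r2(x, y):
    """Squared Pearson correlation (equals R2 of a one-descriptor fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def c_rp2(x, y, n_random=100, seed=0):
    """Shuffle the response n_random times, keep the descriptor fixed,
    and compare the real R2 with the mean R2 of the randomized models."""
    rng = random.Random(seed)
    r2_model = pearson_r2(x, y)
    r2_random = []
    for _ in range(n_random):
        shuffled = y[:]
        rng.shuffle(shuffled)
        r2_random.append(pearson_r2(x, shuffled))
    mean_rr2 = sum(r2_random) / n_random
    return math.sqrt(r2_model) * math.sqrt(max(r2_model - mean_rr2, 0.0))


# Hypothetical data with a strong linear trend (not taken from the paper).
x_demo = [float(i) for i in range(30)]
y_demo = [2.0 * i + math.sin(i) for i in range(30)]
score = c_rp2(x_demo, y_demo)
print(score > 0.5)  # True
```

A model built on a chance correlation would give randomized R2 values close to the real one, which drives cRp2 towards zero.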

Thereafter, we performed the AD study with two different approaches to make sure that the best model is reliable and robust. With the Euclidean distance-based AD determination technique, no test compounds were found outside the AD. With the standardization-based AD approach, one compound from the test set (ID: 93, Tetradecane) was found to lie outside the domain defined by the training set (Figure 3). Thus, the collective outcomes of the AD studies indicate that, out of 39 solvents in the test set, only 1 solvent is outside the AD and its prediction is not reliable. The AD study shows that we can confidently predict the solubility of 97.4% of the test solvents.

Figure 2. Scatter plot for the best four-variable model

Table 2. Statistical outcome for the selected models with one to six descriptors

Internal validation: R2, Q2LOO, RMSEc, avg. rm2(LOO)Scaled, Δrm2(LOO)Scaled. External validation: R2pred, RMSEp, avg. rm2(test)Scaled, Δrm2(test)Scaled. Randomization: cRp2.

Model  R2    Q2LOO  RMSEc  avg. rm2(LOO)Scaled  Δrm2(LOO)Scaled  R2pred  RMSEp  avg. rm2(test)Scaled  Δrm2(test)Scaled  MAE-based criteria (a)  cRp2  GTC

1      0.64  0.63   0.762  0.49                 0.27             0.68    0.684  0.51                  0.26              Moderate                0.64  Pass

2      0.79  0.78   0.579  0.69                 0.17             0.81    0.526  0.66                  0.18              Good                    0.79  Pass

3      0.84  0.83   0.507  0.76                 0.14             0.89    0.398  0.81                  0.10              Good                    0.83  Pass

4      0.87  0.86   0.461  0.79                 0.12             0.89    0.400  0.82                  0.10              Good                    0.85  Pass

5      0.88  0.86   0.447  0.80                 0.11             0.87    0.439  0.82                  0.00              Good                    0.85  Pass

6      0.89  0.87   0.422  0.82                 0.10             0.86    0.416  0.84                  0.01              Good                    0.86  Pass

(a) Good predictions: MAE ≤ 0.1 × training set range and MAE + 3σ ≤ 0.2 × training set range. Bad predictions: MAE > 0.15 × training set range or MAE + 3σ > 0.25 × training set range. Here, MAE is the average absolute error and σ denotes the standard deviation of the absolute error values for the test set data. GTC: Golbraikh and Tropsha's criteria.
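The MAE-based rule in the footnote of Table 2 translates into a small classifier. The sketch below applies the two thresholds exactly as stated there and labels everything in between as "Moderate"; any additional preprocessing used by the original criteria is not reproduced, the sample standard deviation is an assumption, and the function name is hypothetical. The demo uses the absolute residuals of the first five test-set rows of Table 1 and the training-set logS range read from Table 1 (from -8.87 to -1.90).

```python
import statistics


def mae_category(abs_errors, training_range):
    """Classify prediction quality from absolute test-set errors."""
    mae = statistics.mean(abs_errors)
    sigma = statistics.stdev(abs_errors)  # sample SD, an assumption
    if mae <= 0.1 * training_range and mae + 3 * sigma <= 0.2 * training_range:
        return "Good"
    if mae > 0.15 * training_range or mae + 3 * sigma > 0.25 * training_range:
        return "Bad"
    return "Moderate"


# Training-set response range from Table 1: -1.90 - (-8.87) = 6.97 log units.
TRAINING_RANGE = 6.97

# Absolute residuals of the first five test-set rows in Table 1.
demo_errors = [0.40, 0.45, 0.18, 0.43, 0.55]
demo_label = mae_category(demo_errors, TRAINING_RANGE)
print(demo_label)  # Good
```

Because the thresholds scale with the training-set response range, the same absolute error can be acceptable for a wide-ranging property and unacceptable for a narrow one.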


3.2. Mechanistic Interpretation of the Best Model

The success and acceptability of any QSPR model rely on its mechanistic interpretation, as suggested in OECD principle 5. The best four-variable model is considered for interpretation in the present section. The contributions of the descriptors (Figure 4) to the solubility of C60 according to Equation (4) follow the order: PDI > X0sol > X4sol > VAR.

The most important feature of the best model is PDI, a molecular property defined as the packing density index (Todeschini, & Consonni, 2000). The PDI can be explained as the ratio between the McGowan volume (Vx) and the total surface area from P_VSA-like descriptors. The McGowan volume (Vx, mL/mol) is the actual volume of a mole when the molecules are not in motion (McGowan, 1978). It is proportional to the parachor (Vp), which is the molecular volume at temperatures at which surface tensions are equal, and is defined by the following equation:

$$V_p = \frac{M \gamma^{1/4}}{d_l - d_g} \tag{7}$$

where M is the molecular weight of the liquid, γ is its surface tension, d_l is its density, and d_g is the density of the vapor in equilibrium with the liquid. The P_VSA descriptors are defined as the amount of van der Waals surface area (VSA) having the computed property in a certain range (Labute, 2000). As suggested by Equation (4) and Figure 4, PDI has the highest positive contribution to the solubility of C60 in diverse organic solvents. The two- and three-variable models also identified PDI as an important property correlated with the solubility response. A higher packing density index value results in better solubility of C60 in the studied solvents.

Figure 3. Euclidean-AD plot for the best four variable model


X0sol and X4sol are the solvation connectivity indices of order 0 and order 4 (chi-0 and chi-4), which encode the solvation properties of the compound (Todeschini & Consonni, 2000). These descriptors are defined in order to model solvation entropy and dispersion interactions in solution, and they relate the characteristic dimension of the molecule to atomic parameters. Both descriptors can be defined by the following equation (Antipin et al., 1991). If the characteristic dimensions of the molecules are taken into account through atomic parameters, they are defined as:

$${}^{m}X^{sol} = \frac{1}{2^{m+1}} \sum_{k=1}^{K} \frac{\left(\prod_{a=1}^{n} L_a\right)_k}{\left(\prod_{a=1}^{n} \delta_a\right)_k^{1/2}} \tag{8}$$

where L_a is the principal quantum number (2 for C, N and O atoms; 3 for Si, S, Cl, etc.) of the a-th atom in the k-th subgraph; δ_a is the corresponding vertex degree; K is the total number of m-th order subgraphs; and n is the number of vertices in the subgraph.
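Equation (8) can be made concrete with a small hand calculation in code. The sketch below is a direct transcription of the formula for a hydrogen-depleted molecular graph and is not DRAGON's implementation. It assumes that the m-th order subgraphs are supplied by the caller as tuples of atom indices (single atoms for order 0, bonded pairs for order 1), and the function name `chi_sol` and the atom lists are illustrative.

```python
import math


def chi_sol(atoms, subgraphs, order):
    """Solvation connectivity index of a given order, following Equation (8).

    atoms     : list of (L, delta) pairs, i.e. principal quantum number and
                vertex degree of each non-hydrogen atom
    subgraphs : list of tuples of atom indices, each with order + 1 vertices
    """
    total = 0.0
    for sub in subgraphs:
        l_prod = math.prod(atoms[a][0] for a in sub)
        d_prod = math.prod(atoms[a][1] for a in sub)
        total += l_prod / math.sqrt(d_prod)
    return total / 2 ** (order + 1)


# Ethane, C-C: both carbons have L = 2 and vertex degree 1.
ethane = [(2, 1), (2, 1)]
x0_ethane = chi_sol(ethane, [(0,), (1,)], order=0)
x1_ethane = chi_sol(ethane, [(0, 1)], order=1)

# Chloroethane, C-C-Cl: the central carbon has degree 2, Cl has L = 3.
chloroethane = [(2, 1), (2, 2), (3, 1)]
x0_chloroethane = chi_sol(chloroethane, [(0,), (1,), (2,)], order=0)

print(x0_ethane, x1_ethane, round(x0_chloroethane, 4))  # 2.0 1.0 3.2071
```

The chloroethane value is larger than the ethane value mainly because the third-row chlorine atom enters with L = 3, which is how the index encodes atom size.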

Interestingly, X0sol has a much higher contribution to the solubility property than X4sol. Moreover, X0sol and X4sol have positive and negative contributions, respectively. The positive coefficient of X0sol indicates that an increase in the descriptor value results in an increase in solubility, and the negative coefficient of X4sol indicates that an increase in the descriptor value results in a decrease in the solubility of C60 in the considered solvent. Therefore, the solvation property encoded by the order-0 connectivity index favors the solubility of C60 in the studied organic solvents, unlike that encoded by the order-4 connectivity index. It is worth noting that a descriptor of the same class (X1sol) was selected as the best in one of the previous studies, performed by Petrova et al. (2011).

The least contributing descriptor in the model is VAR, a topological index describing the variation of the solvent structure (Todeschini & Consonni, 2000), which has a negative contribution to the solubility of C60 in organic solvents. As suggested by its negative coefficient, a lower descriptor value will help to increase the solubility of C60. Overall, among all the models generated, the four-variable model is a superior candidate for further use in C60 solubility predictions. It has good predictive power and transparent descriptors, and includes only easy-to-calculate ones.

Figure 4. Descriptors’ contribution plot for the four-variable model


3.3. Comparison with Developed Models

In order to evaluate our model's predictive ability, a comparative analysis was performed with existing models for C60 solubility in organic solvents (Liu et al., 2005; Toropov et al., 2007; Toropov et al., 2009; Petrova et al., 2011; Pourbasheer et al., 2011; Ghasemi, Salahinejad, & Rofouei, 2013; Xu et al., 2016), and the results are listed in Table 3. Among all the models, that of Xu et al. (2016) and our present study considered the highest number of solvents as data points, and a linear method is used for model generation in both cases. Although almost similar statistical data are obtained from both models, the current study outperforms the model of Xu et al. (2016) by using far fewer descriptors. The present model explains the solubility correlation well with only four interpretable descriptors for such a large and diverse data set, with highly acceptable statistical results.

4. CONCLUSION

This study shows that an application of the GA-MLR technique employing simple and transparent descriptors yields reliable, predictive and interpretable models. Although we have compared the outcomes of one- to six-variable models, the best performance is accomplished by the four-variable model. Interestingly, the weaker performance of the one- to three-variable models can be explained by the limited information provided by a small number of variables, whereas the five- and six-variable models bring no further gain because the correlation and prediction quality have already reached saturation. Among all the models developed for solubility prediction of C60 to date, the current model employs the highest number of organic solvents with the least number of descriptors, providing satisfactory prediction results along with a mechanistic interpretation. The developed models can be employed efficiently for future predictions of fullerene C60 solubility in various organic solvents, along with a deeper understanding of this phenomenon.
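The saturation argument can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' workflow: the data are synthetic, Q2 is assumed to be the leave-one-out value 1 - PRESS / SS_total, and an exhaustive subset search stands in for the genetic algorithm because the descriptor pool here is tiny. All function names are hypothetical.

```python
import random
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate intercept + sum(coefficient * descriptor) for one sample."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def q2_loo(rows, y):
    """Leave-one-out Q2: refit without each sample, then predict it."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coefs = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coefs, rows[i])) ** 2
    total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / total


def best_subset(rows, y, size):
    """Exhaustive search (a stand-in for GA) for the best subset of a given size."""
    scored = []
    for cols in combinations(range(len(rows[0])), size):
        sub = [[r[c] for c in cols] for r in rows]
        scored.append((q2_loo(sub, y), cols))
    return max(scored)


if __name__ == "__main__":
    # SYNTHETIC data: only the first three of five descriptors carry signal.
    rng = random.Random(0)
    rows = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(30)]
    y = [1.0 + 2.0 * r[0] - 1.5 * r[1] + 0.5 * r[2] + rng.gauss(0, 0.3) for r in rows]
    for size in range(1, 6):
        q2, cols = best_subset(rows, y, size)
        print(f"{size} descriptor(s): best subset {cols}, Q2(LOO) = {q2:.3f}")
```

On data of this kind, Q2 typically rises while informative descriptors are being added and then levels off or drops once only noise descriptors remain, which is the pattern the conclusion describes for the one- to six-variable models.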

CONFLICT OF INTEREST

The authors confirm that this article has no conflicts of interest.

ACKNOWLEDGMENT

L.S.P., S.K. and J.L. want to thank the National Science Foundation (NSF/CREST HRD-1547754, and NSF RISE HRD-1547836) for financial support. B.R. wants to acknowledge the support from the National Science Foundation under ND EPSCoR Award #IIA-1355466 and by the State of North Dakota.

Table 3. A comparative table for current and existing logS prediction models based on statistical parameters

| Work reference | Model | Data points | No. of variables | R2 | Q2 | R2pred |
|---|---|---|---|---|---|---|
| Liu et al. (2005) | Nonlinear model (LSSVM) | 128 | - | 0.91 | 0.91 | 0.90 |
| Toropov et al. (2007) | One-variable model (SMILES-based descriptor) | 122 | 1 | 0.86 | 0.85 | 0.88 |
| Toropov et al. (2009) | One-variable model (InChI-based descriptor) | 122 | 1 | 0.94 | 0.94 | 0.93 |
| Petrova et al. (2011) | GA-MLR | 122 | 4 | 0.86 | 0.84 | 0.90 |
| Pourbasheer et al. (2011) | GA-MLR | 36 | 5 | 0.87 | 0.80 | 0.81 |
| Ghasemi, Salahinejad, & Rofouei (2013) | SPA-SVM (VolSurf method) | 132 | 5 | 0.88 | 0.72 | 0.87 |
| Xu et al. (2016) | MLR | 156 | 10 | 0.88 | 0.89 | 0.93 |
| Present study | GA-MLR | 156 | 4 | 0.87 | 0.86 | 0.89 |

REFERENCES

Antipin, I. S., Arslanov, N. A., Palyulin, V. A., Konovalov, A. I., & Zefirov, N. S. (1991). Solvation topological index. Topological description of dispersion interaction (in Russian). Doklady Akademii Nauk SSSR, 316, 925–927.

Beck, M. T., & Mándi, G. (1997). Solubility of C60. Fullerenes, Nanotubes and Carbon Nanostructures, 5(2), 291–310. doi:10.1080/15363839708011993

Bogatu, C., & Leszczynska, D. (2016). Transformation of Nanomaterials in Environment: Surface Activation of SWCNTs during Disinfection of Water with Chlorine. Journal of Nanotoxicology and Nanomedicine, 1(1), 45–57. doi:10.4018/JNN.2016010104

Cook, S. M., Aker, W. G., Rasulev, B., Hwang, H. M., Leszczynski, J., Jenkins, J. J., & Shockley, V. (2010). Choosing safe dispersing media for C60 fullerenes by using cytotoxicity tests on the bacterium Escherichia coli. Journal of Hazardous Materials, 176(1-3), 367–373. doi:10.1016/j.jhazmat.2009.11.039 PMID:19962827

Gharagheizi, F. R., & Alamdari, F. (2008). A molecular-based model for prediction of solubility of C60 fullerene in various solvents. Fullerenes, Nanotubes and Carbon Nanostructures, 16(1), 40–57. doi:10.1080/15363830701779315

Ghasemi, J. B., Salahinejad, M., & Rofouei, M. K. (2013). Alignment independent 3D-QSAR modeling of fullerene (C60) solubility in different organic solvents. Fullerenes, Nanotubes and Carbon Nanostructures, 21(5), 367–380. doi:10.1080/1536383X.2011.629751

HyperChem(TM); 1115 NW 4th Street, Gainesville, Florida 32601, USA.

Labute, P. A. (2000). Widely applicable set of descriptors. Journal of Molecular Graphics & Modelling, 18(4-5), 464–477. doi:10.1016/S1093-3263(00)00068-1 PMID:11143563

Liu, H., Yao, X., Zhang, R., Liu, M., Hu, Z., & Fan, B. (2005). Accurate quantitative structure-property relationship model to predict the solubility of C60 in various solvents based on a novel approach using a least-squares support vector machine. The Journal of Physical Chemistry B, 109(43), 20565–20571. doi:10.1021/jp052223n PMID:16853662

McGowan, J. C. (1978). Estimates of the Properties of liquids. Journal of Applied Chemistry and Biotechnology, 28, 599–607. doi:10.1002/jctb.5700280902

Petrova, T., Rasulev, B. F., Toropov, A. A., Leszczynska, D., & Leszczynski, J. (2011). Improved model for fullerene C60 solubility in organic solvents based on quantum-chemical and topological descriptors. Journal of Nanoparticle Research, 13(8), 3235–3247. doi:10.1007/s11051-011-0238-x

Pourbasheer, E., Riahi, S., Ganjali, M., & Norouzi, R. P. (2011). Prediction of solubility of fullerene C60 in various organic solvents by genetic algorithm-multiple linear regression. Fullerenes, Nanotubes and Carbon Nanostructures, 19(7), 585–598. doi:10.1080/1536383X.2010.504952

Prylutskyy, Y. I., Yashchuk, V. M., Kushnir, K. M., Golub, A. A., Kudrenko, V. A., Prylutska, S. V., & Matyshevska, O. P. et al. (2003). Biophysical studies of fullerene-based composite for bionanotechnology. Materials Science and Engineering: C, 23(1-2), 109–111. doi:10.1016/S0928-4931(02)00244-8

Roy, K., & Ambure, P. (2016). The double cross-validation tool for MLR QSAR model development. Chemometrics and Intelligent Laboratory Systems, 159, 108–126. doi:10.1016/j.chemolab.2016.10.009

Roy, K., Ambure, P., & Aher, R. (2017). How important is to detect systematic error in predictions and understand statistical applicability domain of QSAR models? Chemometrics and Intelligent Laboratory Systems, 162, 44–54. doi:10.1016/j.chemolab.2017.01.010

Roy, K., Das, R. N., Ambure, P., & Aher, R. B. (2016). Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometrics and Intelligent Laboratory Systems, 152, 18–33. doi:10.1016/j.chemolab.2016.01.008

Roy, K., & Kar, S. (2015a). How to Judge Predictive Quality of Classification and Regression Based QSAR Models. In Z. U. Haq & J. Madura (Eds.), Frontiers of Computational Chemistry (pp. 71–120). Bentham; doi:10.2174/9781608059782115020005

Roy, K., & Kar, S. (2015b). Importance of applicability domain of QSAR models. Quantitative Structure–Activity Relationships in Drug Design, Predictive Toxicology, and Risk Assessment (pp. 180–211). PA: IGI Global; doi:10.4018/978-1-4666-8136-1.ch005

Roy, K., Kar, S., & Ambure, P. (2015b). On a simple approach for determining applicability domain of QSAR models. Chemometrics and Intelligent Laboratory Systems, 145, 22–29. doi:10.1016/j.chemolab.2015.04.013

Roy, K., Kar, S., & Das, R. N. (2015a). Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press.

Roy, K., Mitra, I., Kar, S., Ojha, P., Das, R. N., & Kabir, H. (2012). Comparative studies on some metrics for external validation of QSPR models. Journal of Chemical Information and Modeling, 52(2), 396–408. doi:10.1021/ci200520g PMID:22201416

Semenov, K. N., Charykov, N. A., Keskinov, V. A., Piartman, A. K., Blokhin, A. A., & Kopyrin, A. A. (2010). Solubility of light fullerenes in organic solvents. Journal of Chemical & Engineering Data, 55(1), 13–36. doi:10.1021/je900296s

Shunaev, V. V., Savostyanov, G. V., Slepchenkov, M. M., & Glukhova, O. E. (2015). Phenomenon of current occurrence during the motion of a C60 fullerene on substrate-supported graphene. RSC Advances, 5(105), 86337–86346. doi:10.1039/C5RA12202C

Sivaraman, N., Srinivasan, T., Vasudeva Rao, P., & Natarajan, R. (2001). QSPR modeling for solubility of fullerene (C60) in organic solvents. Journal of Chemical Information and Computer Sciences, 41(4), 1067–1074. doi:10.1021/ci010003a PMID:11500126

Talete. Dragon ver. 6. (n.d.). Retrieved from http://www.talete.mi.it/products/dragon_molecular_descriptors.htm

Todeschini, R., & Consonni, V. (2000). Handbook of molecular descriptors. Weinheim: Wiley-VCH. doi:10.1002/9783527613106

Toropov, A. A., Rasulev, B. F., Leszczynska, D., & Leszczynski, J. (2007). Additive SMILES based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents. Chemical Physics Letters, 444, 209–214. doi:10.1016/j.cplett.2007.07.024

Toropov, A. A., Toropova, A. P., Benfenati, E., Leszczynska, D., & Leszczynski, J. (2009). Additive InChI-based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents. Journal of Mathematical Chemistry, 46(4), 1232–1251. doi:10.1007/s10910-008-9514-0

Toropova, A. P., Achary, P. G. R., & Toropov, A. A. (2016). Quasi-SMILES for nano-QSAR prediction of toxic effect of Al2O3 nanoparticles. Journal of Nanotoxicology and Nanomedicine, 1(1), 17–28. doi:10.4018/JNN.2016010102

Xu, X., Yan, L., Jia, Q., Wang, Q., & Peisheng, M. (2016). Predicting solubility of fullerene C60 in diverse organic solvents using norm indexes. Journal of Molecular Liquids, 223, 603–610. doi:10.1016/j.molliq.2016.08.085

Lyudvig Petrosyan is a theoretical physicist in condensed matter and quantum physics. His current position is post-doctoral research associate and adjunct professor at the Physics department at Jackson State University, Mississippi, USA. He received his PhD degree in Condensed matter physics in 2002 from Yerevan State University, Armenia. Dr. Petrosyan has a diverse range of research experience in condensed matter physics; however, his main topics of interest are, from a broad point of view, electronic and optical properties of low dimensional nanostructures and the understanding of resonant tunneling effects in quantum nanostructures. He has over 15 years of experience as an educator in the US and Armenia. He has published more than 40 research and review articles, including 2 textbooks. During the last 2 years, one of Dr. Petrosyan's scientific interests has been computational statistics and machine learning methods, particularly materials informatics and cheminformatics, including structure-activity relationship studies, dealing with biological activity and physico-chemical properties prediction, including nanoparticles and polymers. In computational statistics Dr. Petrosyan has close collaboration with North Dakota State University (USA), Ernest Mario School of Pharmacy at Rutgers University (USA) and The Focus Foundation (USA).

Supratik Kar has been a post-doctoral research associate at the Interdisciplinary Center for Nanotoxicity at Jackson State University, Mississippi, USA in Prof. Jerzy Leszczynski's research group since April 2015. He completed his BPharm. (Gold Medallist) (2008) and MPharm. (Gold Medallist) (2010) degrees at Jadavpur University, securing first position in both degrees. He earned his PhD (2015) from the Department of Pharmaceutical Technology, Jadavpur University (Kolkata, India) under the guidance of Prof. Kunal Roy. He is a former visiting researcher at the University of Gdańsk (Gdansk, Poland) under the Marie Curie International Research Staff Exchange Scheme in Prof. Tomasz Puzyn's group (http://nanobridges.eu/supratik-kar-ju-in-university-of-gdansk-ug-in-gdansk/). He has eight years of experience in QSAR and chemometric modeling studies. He researches a range of topics in structure-activity relationship studies, dealing with biological activity prediction of natural compounds, organic compounds and toxicity prediction of various chemicals, including nanoparticles. He is actively associated with modeling of power conversion efficient solar cell design through molecular modeling and quantum studies. He has published 41 research and review articles and 5 book chapters to date (http://orcid.org/0000-0002-9411-2091). He has also coauthored QSAR related books entitled "Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment" (Elsevier, 2015) and "A Primer on QSAR/QSPR Modeling: Fundamental Concepts" (Springer, 2015). His current h-index is 16 and i-10 index is 18 according to Scopus. He serves as an associate editor of the International Journal of Quantitative Structure-Property Relationships (IJQSPR) [IGI-Global publishers]. He has acted as a reviewer for many reputed journals such as Molecular Diversity (Springer), Nanoscale (RSC), Science of the Total Environment (Elsevier), Structural Chemistry (Springer), Energies (MDPI), and Journal of Nanotoxicology and Nanomedicine (IGI).

Bakhtiyor Rasulev is a professor at the Department of Coatings and Polymeric Materials (North Dakota State University). He received his PhD degree in Chemistry in 2002 from Uzbek Academy of Sciences. Dr. Rasulev's scientific interests are in cheminformatics and structure-activity relationship studies, dealing with biological activity prediction, physico-chemical and toxicity prediction of chemicals, including nanoparticles and polymers. He is an author of many contributions devoted to QSAR modeling and quantum-chemical applications. Dr. Rasulev has close collaboration with the Instituto di Ricerche Farmacologiche Mario Negri (Italy), Jackson State University (USA), University of Zagreb (Croatia), Johns Hopkins University (USA) and others. His accomplishments have been widely recognized. He is a permanent reviewer of more than 20 peer-reviewed journals. Dr. Rasulev has received many scholarships and awards, including the Scholarship of Drew University (Residential School of Medicinal Chemistry, Madison, NJ), the Young Investigators Award from the Toxicological Division of ACS, an award from the CRDF Foundation, and a UNESCO Scholarship to visit the Institute of Desert Study of Ben-Gurion University, Israel.