smiles new 1
-
SMILES
Simplified Molecular Input Line Entry System (SMILES)
Widely used and computationally efficient
Uses atomic symbols and a set of intuitive rules
Uses hydrogen-suppressed molecular graphs (HSMG)
-
SMILES Bonds
SINGLE*      -
DOUBLE       =
TRIPLE       #
AROMATIC*    :
* can be omitted
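The bond table above can be sketched as a small lookup. This is only an illustration: the names `BOND_SYMBOLS` and `explicit_bonds` are hypothetical, and the scan assumes that anything inside square brackets is an atom specification (where `-` is a charge, not a bond).

```python
# Bond symbols from the slide; single and aromatic bonds are usually omitted.
BOND_SYMBOLS = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def explicit_bonds(smiles):
    """Return the names of bond symbols written explicitly in a SMILES string."""
    found = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True       # inside [...] a '-' is a charge, not a bond
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_SYMBOLS:
            found.append(BOND_SYMBOLS[ch])
    return found


print(explicit_bonds("C=C"))    # ['double']
print(explicit_bonds("C#N"))    # ['triple']
print(explicit_bonds("[OH-]"))  # []
```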
-
Butanols
2-Butanol
iso-Butanol
tert-Butanol
-
SMILES Branches
Represented by enclosure in parentheses
Can be nested or stacked
Examples:
CC(O)CC is 2-Butanol
OCC(C)C is iso-Butanol
OC(C)(C)C is tert-Butanol
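To make "nested or stacked" concrete, the following minimal sketch measures how deeply the parentheses of a SMILES string are nested. The function name `branch_depth` is an assumption for illustration, and the check looks at parentheses only, not at chemical validity.

```python
def branch_depth(smiles):
    """Return the deepest level of nested branches; raise on unmatched parentheses."""
    depth = 0
    max_depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' in " + smiles)
    if depth != 0:
        raise ValueError("unmatched '(' in " + smiles)
    return max_depth


print(branch_depth("CC(O)CC"))     # 1: a single branch
print(branch_depth("OC(C)(C)C"))   # 1: two stacked branches on the same atom
print(branch_depth("CC(C(C)C)C"))  # 2: a branch nested inside a branch
```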
-
SMILES Bonds
Ethene                    C=C
Chloroethene              ClC=C
1,1-Dichloroethene        ClC(Cl)=C
cis-1,2-Dichloroethene    ClC=CCl
Trichloroethene           ClC(Cl)=CCl
Perchloroethene           ClC(Cl)=C(Cl)Cl
-
SMILES Atoms
Use normal chemical symbols
Add punctuation symbols if necessary
No super- or subscripts
-
SMILES Symbols
String of alphanumeric characters and certain punctuation symbols
Terminates at the first space encountered when read left to right
The ORGANIC SUBSET:
B, C, N, O, P, S, F, Cl, Br, I
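As a rough sketch of how the organic subset can be read from a string, the snippet below counts atoms with a regular expression. The names `ORGANIC_ATOM` and `count_atoms` are hypothetical, and the sketch assumes the input contains only uppercase organic-subset atoms (no bracket atoms, no lowercase aromatic atoms).

```python
import re
from collections import Counter

# Two-letter symbols come first so "Cl" is not read as "C" plus a stray "l".
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")


def count_atoms(smiles):
    """Count organic-subset atoms in the part of the string before the first space."""
    notation = smiles.split(" ", 1)[0]   # the notation ends at the first space
    return Counter(ORGANIC_ATOM.findall(notation))


print(count_atoms("ClC(Cl)=C(Cl)Cl"))                         # Counter({'Cl': 4, 'C': 2})
print(count_atoms("CC(O)CC") == count_atoms("OC(C)(C)C"))     # True: both are C4 O1
```

Hydrogens do not appear in the counts because the notation is hydrogen-suppressed.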
-
Other SMILES Atoms
Aliphatic or nonaromatic carbon: C
Atom in aromatic ring: lowercase letter
Designate ring closure with pairs of matching digits, e.g.
c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas
C1CCCCC1 is Cyclohexane
-
SMILES Charges
Specify attached hydrogens and charges in square brackets
Number of attached hydrogens is the symbol H followed by optional digit
-
SMILES Charges
[H+]      proton
[OH-]     hydroxyl anion
[OH3+]    hydronium cation
[Fe++]    iron(II) cation
[NH4+]    ammonium cation
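The bracket notation in these examples can be decoded with a short regular expression. This is a minimal sketch under stated assumptions: `BRACKET_ATOM` and `parse_bracket_atom` are illustrative names, and only the element symbol, the attached-hydrogen count, and the charge are handled (no isotopes or chirality marks).

```python
import re

# [ symbol, optional H with optional digit, optional run of + or - with optional digit ]
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(?:(\++|-+)(\d*))?\]")


def parse_bracket_atom(token):
    """Return (symbol, attached_hydrogens, charge) for a token such as '[NH4+]'."""
    match = BRACKET_ATOM.fullmatch(token)
    if match is None:
        raise ValueError("not a simple bracket atom: " + token)
    symbol, h_digits, signs, magnitude = match.groups()
    if h_digits is None:
        hydrogens = 0                        # no H written inside the brackets
    else:
        hydrogens = int(h_digits) if h_digits else 1
    if signs is None:
        charge = 0
    else:
        size = int(magnitude) if magnitude else len(signs)
        charge = size if signs[0] == "+" else -size
    return symbol, hydrogens, charge


for token in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(token, parse_bracket_atom(token))
# [H+] ('H', 0, 1)
# [OH-] ('O', 1, -1)
# [OH3+] ('O', 3, 1)
# [Fe++] ('Fe', 0, 2)
# [NH4+] ('N', 4, 1)
```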
-
SMILES Cyclic Structures
Break one single or one aromatic bond in each ring
Number in any order
Designate ring-breaking atoms by the same digit following the atomic symbol
-
Cyclic Structures
Numbers indicate start and stop of ring
Same number indicates start and end of the ring, entered immediately following the start/end atoms
Only numbers 1 to 9 are used
A number should appear only twice
An atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
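The digit-pairing rule can be checked mechanically. The sketch below pairs each ring-closure digit with its partner and counts the rings; `count_rings` is a hypothetical helper name, and digits inside square brackets are assumed to be hydrogen counts or charges rather than ring closures.

```python
def count_rings(smiles):
    """Count ring closures by pairing matching digits 1-9 outside square brackets."""
    open_digits = set()
    rings = 0
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in "123456789":
            if ch in open_digits:
                open_digits.remove(ch)   # second occurrence closes the ring
                rings += 1
            else:
                open_digits.add(ch)      # first occurrence opens the ring
    if open_digits:
        raise ValueError("unclosed ring digit(s): " + "".join(sorted(open_digits)))
    return rings


print(count_rings("c1ccccc1"))        # 1 (benzene)
print(count_rings("c12ccccc1cccc2"))  # 2 (naphthalene: the first atom opens rings 1 and 2)
```

This sketch does not enforce the "only twice" guideline; a digit that has been closed could be opened again without an error.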
-
Naphthalene
c12ccccc1cccc2
-
SMILES Conventions
Avoid two consecutive left parentheses if possible
Strive for the fewest number of possible branches
Tautomeric bonds are not designated; enter the appropriate form
-
Further Restrictions
A branch cannot begin a SMILES notation
A branch cannot immediately follow a double- or triple-bond symbol
Example: C=(CC)C is invalid, but
C(=CC)C or C(CC)=C are valid SMILES
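These two restrictions are simple enough to test with string operations. The following is a minimal sketch, with `check_branch_rules` as an illustrative name; it checks only the two rules on this slide and nothing else about the notation.

```python
def check_branch_rules(smiles):
    """Return a list of violations of the two branch restrictions (empty if none)."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    for i in range(len(smiles) - 1):
        if smiles[i] in "=#" and smiles[i + 1] == "(":
            problems.append(
                "branch immediately follows '%s' at position %d" % (smiles[i], i)
            )
    return problems


print(check_branch_rules("C=(CC)C"))  # ["branch immediately follows '=' at position 1"]
print(check_branch_rules("C(=CC)C"))  # []
print(check_branch_rules("C(CC)=C"))  # []
```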
-
SMILES Fragments
Nitro              N(=O)(=O)
Nitrate            ON(=O)(=O)
Nitrite            ON(=O)
Sulfonic acid      S(=O)(=O)O
Cyanide/Nitrile    C#N
Azide              N=N#N
Azido              N+=N-
-
SMILES Metals
[Al] [As] [Au] [Be]
[Bi] [Cd] [Ca] [Fe]
[Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb]
[Sn] [Zn] [Zr]
-
Disconnected Structures
Indicated by a dot
Tetramethylammonium bromide
C[N+](C)(C)C.[Br-]
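Because the dot separates components, a salt can be split into its ions and the charges summed. The sketch below assumes charges are written only inside square brackets; `split_components` and `net_charge` are hypothetical helper names, and the tetramethylammonium string is written with all four methyl groups attached to nitrogen.

```python
import re

# A run of + or - (optionally followed by a digit) just before the closing bracket.
CHARGE = re.compile(r"\[[^\]]*?(\++|-+)(\d*)\]")


def split_components(smiles):
    """Split a SMILES string into its disconnected components at each dot."""
    return smiles.split(".")


def net_charge(component):
    """Sum the charges of all bracket atoms in one component."""
    total = 0
    for signs, digits in CHARGE.findall(component):
        size = int(digits) if digits else len(signs)
        total += size if signs[0] == "+" else -size
    return total


parts = split_components("C[N+](C)(C)C.[Br-]")
print(parts)                           # ['C[N+](C)(C)C', '[Br-]']
print([net_charge(p) for p in parts])  # [1, -1]
```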
-
Isomeric and Chiral SMILES
Isomeric configuration indicated by forward and backward slashes: / \
Examples:
trans-1,2-dibromoethene: Br/C=C/Br (direction of the slash continues)
cis-1,2-dibromoethene: Br/C=C\Br (direction of the slash reverses)
Chirality indicated by the @ symbol
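The "continues" versus "reverses" rule can be illustrated for the simplest case. This sketch handles only a double bond written in the linear form X/A=B/Y or X/A=B\Y, as in the two examples; `DOUBLE_BOND` and `geometry` are illustrative names, and branched or ring-containing notations are outside its scope.

```python
import re

# A slash, the first double-bond atom, '=', the second atom, and another slash.
DOUBLE_BOND = re.compile(r"([/\\])[A-Za-z]+=[A-Za-z]+([/\\])")


def geometry(smiles):
    """Classify a simple slash-marked double bond as 'trans', 'cis', or 'unspecified'."""
    match = DOUBLE_BOND.search(smiles)
    if match is None:
        return "unspecified"
    # Same slash on both sides: direction continues (trans); otherwise it reverses (cis).
    return "trans" if match.group(1) == match.group(2) else "cis"


print(geometry(r"Br/C=C/Br"))  # trans
print(geometry(r"Br/C=C\Br"))  # cis
print(geometry("ClC=CCl"))     # unspecified
```

Raw string literals (`r"..."`) keep the backslash from being treated as an escape character in Python.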
-
Some Applications
JMDraw/SMILESViewer (Christoph Steinbeck)
JME Molecular Editor (Peter Ertl)
STN Express (SMILES as output)
Tripos (dbtranslate: SMILES to MOL)
Marvin (Ferenc Csizmadia) http://chemaxon.com/marvin/
CACTVS http://www2.ccc.uni-erlangen.de/cactvs/
-
Another Application
SMILESCAS Database
http://www.syrres.com/esc/smilecas.htm
Over 103,000 SMILES notations
Input CAS Registry Number
Leads to SMILES and thence to a structure search