![Page 1: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/1.jpg)
Chemical Databases, Identifiers, Tool Kits and Web Services
October 16, 2003
Marc C. Nicklaus, CADD Group, Lab. of Medicinal Chemistry, CCR, NCI, NIH; [email protected]
Thanks also to the other members of the CADD Group:
Rajeshri G. Karki, Megan L. Peach, Karen M. Green, Guangyu Sun, Igor Filippov
![Page 2: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/2.jpg)
Acknowledgements
• Wolf-Dietrich Ihlenfeldt (formerly Computer Chemistry Center [CCC], Erlangen, Germany)
• Frank Oellien (formerly CCC)
• Bruno Bienfait (formerly CCC and LMC)
• Johannes Voigt (formerly LMC)
![Page 3: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/3.jpg)
Reasons to Deal with (Large) Chemical Databases (of Small Molecules)
• Inventory
• Source for drug design, (virtual) screening
• Repository for associated information (assay results, physicochemical data, environmentally important properties etc.)
• Chemoinformatics
• Link to other services, databases
• Source for individual structures (e.g. for comp. chem.)
![Page 4: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/4.jpg)
Questions
• What databases are out there?
• What databases can we share?
• What tools can we share?
• What can we offer the public?
• [How] Can we standardize?
• How can we determine: have we, has anyone, seen this structure before?
• What’s next (XML/CML…)?
![Page 5: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/5.jpg)
The NCI Database &
Enhanced NCI Database Browser
![Page 6: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/6.jpg)
The NCI Database
• Approximately half a million compounds
• Collected since 1955 by the National Cancer Institute, NIH
• Tested in anti-cancer screens; since 80’s also in AIDS screens
• Managed by NCI’s Developmental Therapeutics Program (DTP); see http://dtp.nci.nih.gov
• Publicly available: currently 260,071 cpds. (“Open NCI Database”)
• Cancer screening data (60 cell lines) available for ca. 43,000 compounds
• AIDS screening data available for ca. 44,000 compounds
• Samples available from DTP for ~60% of compounds
![Page 7: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/7.jpg)
Enhanced CACTVS Browser of the Open NCI Database
• Web-based interface for searching data from the Open NCI Database by numerous criteria, including 2D and 3D structural searches
• Augmented by many additional data:
  - derived: e.g. number of rotatable bonds
  - calculated/predicted: e.g. log P; biological activities
  - systematically determined: e.g. IUPAC names
  - cross-evaluated: e.g. commercial availability
• Boolean searches possible
• Requirements: Just a Web browser, several plug-ins are optional
• Based on chemical information toolkit CACTVS (Wolf-Dietrich Ihlenfeldt, see http://www2.chemie.uni-erlangen.de/software/cactvs/)
![Page 8: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/8.jpg)
How To Get There….
URLs: http://cactus.nci.nih.gov/ncidb2
(U.S. mirror)
http://www2.chemie.uni-erlangen.de/ncidb2
(European mirror)
![Page 9: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/9.jpg)
http://cactus.nci.nih.gov
![Page 10: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/10.jpg)
Enhanced NCI Database Browser: Query Form
![Page 11: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/11.jpg)
Combined Search: PASS Antiangiogenesis Prediction & Name (Fragment) Exclusion
![Page 12: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/12.jpg)
Search Result: Hitlist
![Page 13: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/13.jpg)
Search Result: Image Gallery
![Page 14: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/14.jpg)
Search Result: Detail View (top)
![Page 15: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/15.jpg)
Search Result: Detail View (continued)
![Page 17: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/17.jpg)
Update of Enhanced NCI Database Browser
• Most up-to-date DTP data sets
• Completion/curation of data where possible
• Many more additional calculated data
• New search capabilities
• New underlying database format
![Page 18: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/18.jpg)
Update of Enhanced NCI Database Browser
Most up-to-date DTP data sets
• 260,071 compounds (new and old releases of Open NCI Database merged, new entry prevails)
• 42,577: have cancer screen results >300 properties
• 43,905: have AIDS screening results 3
(EC50: 3,143; IC50: 39,352)
• 115,324: have animal model assay data (NEW) ~10 (string)
• 139,735: are plated 1
• 45,224: have name(s) (from DTP records) 0...>20
• 127,361: have CAS RN 1
• 3,576: have experimental log P 1
![Page 19: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/19.jpg)
Update of Enhanced NCI Database Browser
Completion/curation of data
• Add CAS numbers: cross-check with other DBs
• Correct structures and/or names: mostly manually, only occasionally possible, often dangerous
• Calculate averages and SD for cell-line assay data
• (No evaluation/curation planned for animal model assay data)
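Cross-checking CAS numbers between databases usually needs two small steps first: bringing the numbers into one spelling and rejecting malformed ones. The sketch below is not the procedure used for the NCI data; it only illustrates the publicly documented CAS check-digit rule, and the function names are hypothetical. The zero-padded spelling (`000140-95-4`) is the one that appears in the EPA entries of the hash-code index file later in this talk.

```python
import re

# First block 2-7 digits, second block 2 digits, then one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def normalize_cas(cas):
    """Strip whitespace and leading zeros, e.g. '000140-95-4' -> '140-95-4'."""
    head, sep, tail = cas.strip().partition("-")
    head = head.lstrip("0") or "0"
    return head + sep + tail


def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(normalize_cas(cas))
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... counting from the right,
    # then compare the sum modulo 10 with the check digit.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


if __name__ == "__main__":
    for candidate in ["140-95-4", "000140-95-4", "25155-29-7", "999-99-9"]:
        print(candidate, normalize_cas(candidate), is_valid_cas(candidate))
```

Note that a placeholder such as 999-99-9 fails the check-digit test, so it cannot collide with a real registry number.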
![Page 20: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/20.jpg)
Download Page
http://cactus.nci.nih.gov/ncidb2/download.html
260,071 2D or 3D structures with USMILES, IChI, IUPAC Name, CAS RN and additional “canonical properties” [in beta test stage: http://cactus.nci.nih.gov/ncidb2/download_NEW_09-03.html ]
250,251 2D structures in SDF format
250,251 structures in USMILES format
249,081 2D structures plus cancer and AIDS data (where present) in SDF format
32,557 2D structures with cancer test data as of August 1999, in SDF format
42,689 2D structures with AIDS test data as of October 1999, in SDF format
23,031 2D structures in SDF format for which both cancer and AIDS data are available
Updated structure and combined structure+data files under preparation
![Page 21: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/21.jpg)
Future Plans and Wishes…
• Apply/add IChI names
• Create Search&Display GUIs for more DBs
• Cross-complete databases (CAS RNs etc.)
• XML/CML-ize databases
• Move toward one virtual chemical database
• Link different database spaces together (small molecules, macromolecules, toxicity data, comp.chem. data, spectral data, patent data…)
![Page 22: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/22.jpg)
Other Databases
![Page 23: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/23.jpg)
Public U.S. Government Chemical Databases
NCI Open (02/03) 260,017
NIST WebBook (04/03) 31,167
NLM ChemIDplus (04/03) 160,590
EPA GCES Database (03/02) 66,347
Combined USGovtPubDB ~515,000
More will hopefully follow from: NIAID, EPA, FDA…
![Page 24: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/24.jpg)
Databases
U.S. Government Databases
NCI Open (09/03) 260,071 public
NIST MS Library (2002) 147,194
NIST WebBook (04/03) 31,167 public
NLM ChemIDplus (04/03) 160,590 public
EPA GCES Database (03/02) 66,347 public
Commercial/Other Databases
ACD (1.1999) 221,668
ACX (1999) 137,003
Ambinter (08/02) 487,397
Asinex Gold (09/03) 202,237
Asinex Platinum (09/03) 117,518
Beilstein Natural Products (02/02) 124,701
ChemBridge (02/02) 100,000
ChemDiverse IDC (09/03) 122,684
ChemDiverse CombiLab (09/03) 201,139
ChemNavigator (08/03) 6,954,906
ChemStar (05/03) 72,407
Petrenko (PN07) 115,554
Ryan (10/02) 294,437
Sigma-Aldrich Rare Chem. (08/02) 124,728
CSD (5.24, 11/02, organic only) 103,860
![Page 25: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/25.jpg)
Canonical Version
• Hydrogens added
• 3D coordinates calculated
• Some deficiencies fixed (nitro groups…)
• Canonical property fields added
• Original fields (mostly) left untouched
• Not yet done, but possible: completed data, e.g. CAS RN, name etc.
![Page 26: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/26.jpg)
SD Files: NOT Standardized!
Fields: ID, Chem. Name, Rec#, CAS RN, Formula, MDL Name (first line)
Database: Field Names:
NCI Open (02/03) NSC - - - - NSC (id)
NIST MS Library (2002) ID - - - - -
NIST WebBook (04/03) WEBBOOK.ID NAME (n) <...> CAS.NUMBER FORMULA (name) [, ID: (id)]
NLM ChemIDplus (04/03) ID molname - - -
EPA GCES Database (03/02) (**came as list of SMILES strings**)
ACD (1.1999) NAME (1st rec.) (n) <...> - - (name)
ACX (1999) CsNum - <...> (n) CAS Formula “MDL Molfile”
Ambinter : File 0 ID - IDNUMBER - - -
Ambinter : File 1 id - IDNUMBER - - -
Ambinter : File 2 idnumber - ID - - (id)
Ambinter : File 3 Code - ID - - -
Asinex Gold (06/03) IDNUMBER - - - - -
Asinex Platinum (06/03) IDNUMBER - - - - -
Beilstein Natural Products (02/02) BRN CN ID RN - -
ChemBridge (02/02) ID - - - - -
ChemDiverse IDC (04/03) IDNUMBER - - - - -
ChemDiverse CombiLab (04/03) IDNUMBER - - - - -
ChemNavigator (06/03) CNC_ID - - - - (id)
ChemStar (05/03) IDNUMBER - - - - -
Petrenko (PN07) id - - - - -
Ryan (10/02) code name ID - - -
Sigma-Aldrich Rare Chem. (08/02) CAT_NO - - - - -
World Drug Index (3.1999) [RNEXTREG] MOLNAME(1) - CAS MF (name)
(1) Possibly, but not consistently, repeated in fields <RNEXTREG> and/or <RN>
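Because every supplier names its ID field differently, a per-database lookup is needed before records can be compared. The sketch below illustrates one way to do that in plain Python; it is not the CACTVS processing used for this work. The field names come from the table above, while the function names, the sample record, and the simplified SD parsing (data items are read only after the `M  END` line) are assumptions made for illustration.

```python
# ID field names per database, taken from the table above (subset).
ID_FIELD_BY_DATABASE = {
    "NCI Open": "NSC",
    "NIST WebBook": "WEBBOOK.ID",
    "ACX": "CsNum",
    "Beilstein Natural Products": "BRN",
    "ChemNavigator": "CNC_ID",
    "Sigma-Aldrich Rare Chem.": "CAT_NO",
}


def parse_sd_data_items(record_text):
    """Collect the '> <FIELD>' data items of one SD record into a dict."""
    items = {}
    field = None
    in_data = False
    for line in record_text.splitlines():
        if not in_data:
            # Skip the molfile block; data items follow 'M  END'.
            in_data = line.startswith("M  END")
            continue
        if line.startswith("$$$$"):          # end-of-record delimiter
            break
        if line.startswith(">") and "<" in line:
            field = line[line.index("<") + 1:line.rindex(">")]
            items[field] = []
        elif line.strip() == "":             # blank line closes a data item
            field = None
        elif field is not None:
            items[field].append(line.strip())
    return {name: "\n".join(values) for name, values in items.items()}


def canonical_id(database, record_text):
    """Return the record's ID using the field name this database uses."""
    items = parse_sd_data_items(record_text)
    return items.get(ID_FIELD_BY_DATABASE[database])


# Hypothetical record for illustration only (empty molfile block).
SAMPLE_RECORD = """example
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NSC>
12345

$$$$
"""

if __name__ == "__main__":
    print(canonical_id("NCI Open", SAMPLE_RECORD))
```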
![Page 27: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/27.jpg)
Canonical Fields
Field/Property Name: Possible values (or explanation)
Properties dependent on structure:
E_UNIQUE_ID: to be concatenated from Origin + Version/Date + internal ID; example: NLM_04-03_08359076
E_CAS: CAS Reg. No. (if available; otherwise 999-99-9)
E_NAME: Chemical Name (if available)
E_STEREO_SPECIFIED: no_stereocenter | unknown | partial | full (for most DBs, only choices 1 & 4 relevant)
E_COMPOUND_TYPE: unspecified (i.e. typically “normal organic”) | metal_complex (if pre-processed as such) | .... (maybe others?)
E_THREED_SOURCE: is_2D | experimental | CORINA_x.y | 3D_unsuccessful
E_ICHI: IChI chemical identifier
E_SMILES: CACTVS-calculated Unique SMILES code (according to 1989 definition published by Daylight, Inc.)
E_FORMULA: Molecular formula calculated by CACTVS
E_HASHY: Non-stereospecific hash code
E_HASHSY: Stereospecific hash code
E_TAUTO_HASH: Non-stereospecific tautomer-invariant hash code
E_STEREO_TAUTO_HASH: Stereospecific tautomer-invariant hash code
E_MAXFRAG_HASHY: Non-stereospecific hash code of largest fragment
E_MAXFRAG_HASHSY: Stereospecific hash code of largest fragment
E_MAXFRAG_HASHTY: Non-stereospecific tautomer-invariant hash code of largest fragment
E_MAXFRAG_HASHSTY: Stereospecific tautomer-invariant hash code of largest fragment
E_MULTIFRAG: single_fragment | multi_fragment
Properties constant for whole database:
E_ORIGIN: Detailed origin of data file; e.g. “DTP February 2003”
E_DB_VERSION: Explicit DB version number, if available
E_DB_YEAR: Year data file was generated
E_DB_TYPE: government public (e.g. NCI Open DB) | government non-public (e.g. NIST WebBook) | commercial free (e.g. Sig.-Al. RCL) | commercial licensed (e.g. CSD, WDI, ACD...)
E_IS_PUBLIC: yes | no | unknown
E_SAMPLES_AVAIL: yes | on-demand | no | unknown (for whole DB!)
E_SOURCE: Source of database: Company, Agency etc.; e.g. “NCI”
E_CONTEXT: Nature, or context, of compound: e.g. environmentally relevant cpd., general chemical, drug…
E_SUPPLIER_TYPE: broker | manufacturer | repository | N/A
E_CACTVS_VERSION: x.yyy (version of the CACTVS toolkit used to process the database)
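As a small illustration of two of the rules above (E_UNIQUE_ID built from origin, version/date and internal ID; E_CAS falling back to 999-99-9), here is a minimal sketch. The function names are hypothetical and the underscore separator is inferred from the example NLM_04-03_08359076; the actual fields were generated with CACTVS.

```python
CAS_PLACEHOLDER = "999-99-9"  # value the slide gives for a missing CAS Reg. No.


def make_unique_id(origin, version, internal_id):
    """Concatenate origin, version/date and internal ID with underscores."""
    return f"{origin}_{version}_{internal_id}"


def make_record_fields(origin, version, internal_id, cas=None, name=None):
    """Build the per-record canonical fields that need no chemistry toolkit."""
    fields = {
        "E_UNIQUE_ID": make_unique_id(origin, version, internal_id),
        "E_CAS": cas if cas else CAS_PLACEHOLDER,
    }
    if name:  # E_NAME is only present when a name is available
        fields["E_NAME"] = name
    return fields


if __name__ == "__main__":
    print(make_record_fields("NLM", "04-03", "08359076"))
```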
![Page 28: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/28.jpg)
Can’t we all get along with each other?
![Page 29: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/29.jpg)
Identifiers
![Page 30: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/30.jpg)
CACTVS Hash Codes
![Page 31: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/31.jpg)
Overlap Analysis via Hash Codes
• CACTVS hash codes
• Tautomer-invariant hash codes (NEW!)
• Stereospecific vs. non-stereospecific, tautomer-sensitive vs. tautomer-invariant, entire ensemble vs. max. fragment only: all eight combinations calculated and added to SD file.
• Index file created for rapid comparison:
![Page 32: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/32.jpg)
Hash Codes
![Page 33: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/33.jpg)
Index File
009F2C8EFECD9E45 009F2C8EFECD9E45 000140-95-4 EPA_GCES_03-02_000140-95-4 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 025155-29-7 EPA_GCES_03-02_025155-29-7 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 140-95-4 NIST_WebBook_04-03_C140954 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 25155-29-7 NIST_WebBook_04-03_C25155297 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 ChemStar_05-03_CHS_0025004 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 NIST_MS-Lib_02_140954 C3H8N2O3

0106B6B3BC4B9B7C 0106B6B3BC4B9B7C AsinexGold_06-03_BAS_0605175 C15H12N2O2
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C ChemStar_05-03_CHS_0816990 C15H12N2O2
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C NIST_MS-Lib_02_304480737 C15H12N2O2
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C PN07_10-02_PN07_025090 C15H12N2O2

056832FA004D9ED0 056832FA004D9ED0 ?? 010118-90-8 EPA_GCES_03-02_010118-90-8 C23H27N3O7
056832FA004D9ED0 056832FA004D9ED0 ?? NIST_MS-Lib_02_10118908 C23H27N3O7
SMILES code appended to each line (not shown here)
Very rapid searches, overlap analyses, counts etc. with just Unix pipes.
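The talk does this with Unix pipes; the sketch below substitutes a few lines of Python to show the same idea, namely that compounds sharing a hash code are treated as the same structure, so overlap between two databases is a count of shared hash codes. The column layout is an assumption read off the excerpt above (first two tokens are hash codes, last token is the formula, the token before it is the unique ID); the real file also carries a SMILES string per line, which this sketch does not handle. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Unique IDs look like Origin_Version_InternalID, e.g. EPA_GCES_03-02_000140-95-4.
ID_PATTERN = re.compile(r"^(.*?)_(\d{2}(?:-\d{2})?)_(.+)$")

SAMPLE_INDEX = """\
009F2C8EFECD9E45 009F2C8EFECD9E45 000140-95-4 EPA_GCES_03-02_000140-95-4 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 140-95-4 NIST_WebBook_04-03_C140954 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 ChemStar_05-03_CHS_0025004 C3H8N2O3
009F2C8EFECD9E45 009F2C8EFECD9E45 NIST_MS-Lib_02_140954 C3H8N2O3
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C AsinexGold_06-03_BAS_0605175 C15H12N2O2
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C ChemStar_05-03_CHS_0816990 C15H12N2O2
0106B6B3BC4B9B7C 0106B6B3BC4B9B7C NIST_MS-Lib_02_304480737 C15H12N2O2
056832FA004D9ED0 056832FA004D9ED0 ?? 010118-90-8 EPA_GCES_03-02_010118-90-8 C23H27N3O7
056832FA004D9ED0 056832FA004D9ED0 ?? NIST_MS-Lib_02_10118908 C23H27N3O7
"""


def parse_index_line(line):
    """Return (first hash column, origin, unique ID) for one index line."""
    tokens = line.split()
    unique_id = tokens[-2]
    match = ID_PATTERN.match(unique_id)
    origin = match.group(1) if match else unique_id
    return tokens[0], origin, unique_id


def group_origins_by_hash(text):
    """Map each hash code to the set of databases that contain it."""
    groups = defaultdict(set)
    for line in text.splitlines():
        if line.strip():
            hash_code, origin, _ = parse_index_line(line)
            groups[hash_code].add(origin)
    return groups


def overlap_count(groups, db_a, db_b):
    """Count hash codes that occur in both databases."""
    return sum(1 for origins in groups.values()
               if db_a in origins and db_b in origins)


if __name__ == "__main__":
    groups = group_origins_by_hash(SAMPLE_INDEX)
    print(overlap_count(groups, "EPA_GCES", "NIST_MS-Lib"))
```

Grouping on a different hash column (for example a tautomer-invariant one) would only require changing which token is used as the key.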
![Page 34: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/34.jpg)
IChI
![Page 35: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/35.jpg)
The IUPAC Chemical Identifier
Steve Stein, Steve Heller, Dmitrii Tchekhovskoi
National Institute of Standards and Technology
Gaithersburg, MD, USA
U.S. Government Chemical Databases
Frederick, MD
July 22, 2003
with permission of Steve Stein…
![Page 36: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/36.jpg)
Too Many Identifiers
• Structure diagrams
– various conventions
– contain ‘too much’ information
• Connection Tables
– MolFiles, SMILES, ROSDAL, ..
• Pronounceable names
– IUPAC, CAS, trivial
• Index Numbers
– EINECS, FEMA, DOT, RTECS, CAS, Beilstein, USP, EEC, RCRA, NCI, UN, USAF
![Page 37: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/37.jpg)
What kind of Identifier is needed?
• Derived from structure by algorithm
• Accepts common drawing conventions
• Exactly one Identifier per structure
• Comprehensive
• Openly available
![Page 38: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/38.jpg)
Requirements
• Different compounds have different identifiers
– All distinguishing structural information is included
[Diagram: two different structures, labeled IChI - 1 and IChI - 2]
![Page 39: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/39.jpg)
Requirements
• One compound has only one identifier
– Include only necessary information
[Diagram: alternative drawings of a nitro group, all giving the same IChI]
![Page 40: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/40.jpg)
IChI First Version
• Discrete, bonded compounds
– Include ‘dot disconnected’ compounds
• Stereochemistry
– sp3 - tetrahedral
– Z/E - double bond
• Tautomers
![Page 41: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/41.jpg)
3 Steps to IChI
• ‘Normalize’ Input Structure
– Defined input structure required
– Remove conventions with chemical rules
– Divide into ‘layers’
• ‘Canonicalize’ (label the atoms)
– Equivalent atoms get the same label
• ‘Serialize’ the Labeled Structure
– A unique series of bytes
![Page 42: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/42.jpg)
Simplifications
• Ignore ‘Electron Density’
– Double/Triple/Coordination bonds
– Odd-electrons/Charges
• Free Rotation Around Single Bonds
• Separate structure information into ‘layers’
![Page 43: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/43.jpg)
[Structure diagrams: alternative nitro group drawings; hydrogen atom labels]
![Page 44: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/44.jpg)
Output Format
Example: Benzene
Represent atoms as sequence number in formula
C6H6 = C C C C C C H H H H H H
tags 1 2 3 4 5 6 7 8 9 10 11 12
Basic Layer:
<basic>C6H6 1-2-7 2-3-8 3-4-9 4-5-10 5-6-11 7-12</basic>
![Page 45: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/45.jpg)
[Structure diagram: monosodium glutamate with numbered atoms]
<IChI version="0.933Beta">
  <structure number="1" id.name="Name" id.value="Monosodium glutamate">
    <identifier version="0.933Beta" tautomeric="1">
      <formula>C5H8NO4.Na</formula>
      <connections>6H2-5H(4(9)10)2H2-1H2-3(7)8,(H-,7,8,9,10);</connections>
      <charge>-1;+1</charge>
      <stereo>
        <dbond>;</dbond>
        <sp3>5?;</sp3>
      </stereo>
    </identifier>
  </structure>
</IChI>
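Since the beta output is plain XML, its layers can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on the monosodium glutamate record above. Pairing formula components (split on ".") with charges (split on ";") in order is an inference from this example and the ferrocene one, not a documented rule of the 0.933Beta format; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ICHI_XML = """<IChI version="0.933Beta">
  <structure number="1" id.name="Name" id.value="Monosodium glutamate">
    <identifier version="0.933Beta" tautomeric="1">
      <formula>C5H8NO4.Na</formula>
      <connections>6H2-5H(4(9)10)2H2-1H2-3(7)8,(H-,7,8,9,10);</connections>
      <charge>-1;+1</charge>
      <stereo>
        <dbond>;</dbond>
        <sp3>5?;</sp3>
      </stereo>
    </identifier>
  </structure>
</IChI>"""


def summarize(xml_text):
    """Pull the name, tautomer flag and per-component charges from one record."""
    root = ET.fromstring(xml_text)
    structure = root.find("structure")
    identifier = structure.find("identifier")
    formulas = identifier.findtext("formula").split(".")
    charges = identifier.findtext("charge").split(";")
    return {
        "name": structure.get("id.value"),
        "tautomeric": identifier.get("tautomeric") == "1",
        # Assumed: the n-th charge belongs to the n-th formula component.
        "components": list(zip(formulas, charges)),
    }


if __name__ == "__main__":
    print(summarize(ICHI_XML))
```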
![Page 46: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/46.jpg)
[Structure diagrams: ferrocene with numbered atoms, shown as Disconnected and Connected representations]
<IChI version="0.933Beta">
  <structure number="1" id.name="Name" id.value="Ferrocene">
    <identifier version="0.933Beta" tautomeric="0" disconnected="1">
      <formula>C5H5.C5H5.Fe</formula>
      <connections>1H-2H-4H-5H-3H-1;1H-2H-4H-5H-3H-1;</connections>
      <charge>-1;-1;+2</charge>
      <reconnected>
        <formula>C10H10Fe</formula>
        <connections>1H-2H-4H-5H-3H(1)11(1,2,4,5)6H-7H(11)9H(11)10H(11)8H(6)11</connections>
        <charge></charge>
      </reconnected>
    </identifier>
  </structure>
</IChI>
![Page 47: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/47.jpg)
Future Extensions
• Other Stereo Forms
– Non-atom centered
– Conformations
– Hydrogen Bonding
• Polymers/Macromolecules
• Compound Classes
– Markush structures
![Page 48: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/48.jpg)
Tools
![Page 49: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/49.jpg)
Tools
• CACTVS
• Pipeline Pilot
• MDL Software
• Daylight Software
• Oracle
• Others…
![Page 50: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/50.jpg)
Free Tools
• CACTVS: Scripts; GUI-based tools; Web services; free for academic and public use. Unix, Linux.
• GUI Generator: Generate Web interface automatically from SD file and data. CACTVS-based. Free for non-profit use. Under development.
![Page 51: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/51.jpg)
http://www2.ccc.uni-erlangen.de/software/cactvs/tools.html
![Page 52: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/52.jpg)
CACTVS Documentation
The most complete documentation for CACTVS as of now resides on the LMC Intranet server. Please contact me ([email protected]) for info.
![Page 53: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/53.jpg)
Web Services
![Page 54: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/54.jpg)
Additional Links and Information
Other Public Chemical Data
A list of URLs that point to other public chemical (or chemistry-related) information. Currently limited to U.S. Government web sites. These sites may contain search capabilities and/or other public datasets:
http://cactus.nci.nih.gov/ncidb2/govt_dbs.html
Chemistry Search Services on the Web
A table of searchable small molecule databases available on the web, listing URLs and an (incomplete) survey of the services' features and capabilities. Contains (U.S.) Government, academic, and commercial web sites:
http://cactus.nci.nih.gov/ncidb2/chem_www.html
![Page 55: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/55.jpg)
Online SMILES Translator
http://cactus.nci.nih.gov/services/translate/
Generate Unique SMILES; Translate Between Formats: SD, PDB, MOL, SMILES
![Page 56: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/56.jpg)
“GIF” Creator
http://cactus.nci.nih.gov/services/gifcreator/
Create High-Quality 2D Drawings of Your Chemical Structures
Many input formats supported: SD, MOL, PDB, SMILES, Sybyl…
![Page 57: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/57.jpg)
NCI Screening Data 3D Miner
http://cactus.nci.nih.gov/services/3DMiner/
Visualize & Mine 60-Cell Line Cancer Data for 41,000 NCI Compounds
![Page 59: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/59.jpg)
GUI Generator
CACTVS-based GUI that takes a structure file (SD or other format) and associated data files (TXT, Excel, etc.) as input and generates a Web service from it. Its output is a CACTVS database file (.cbase file) and the CACTVS script (which generates JavaScript pages) that allows web-based searches of this database and display of the results.
Will be freely distributed for non-profit use. Because the script is itself the executable, it is also freely adaptable. Currently: alpha version.
![Page 67: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/67.jpg)
Pipeline Pilot
• SciTegic, Inc.
• GUI-based Icon Pipeline paradigm
• Predefined components (really: scripts); you can modify them, or write your own
• Drag&drop to form pipelines
• Ensemble of pipelines is the program (called ‘Protocol’), can be saved, shared…
• Now Win. 2000/XP only; fall ‘03: Linux
• Very fast: up to 20,000 cpds/sec
• Expensive
![Page 69: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/69.jpg)
3D Pharmacophore Searches
• Up to 25 conformers pre-calculated by the program Catalyst (MSI) are stored for each compound.
• Searches are possible by distance constraints and other query features. Most ISIS features are implemented, such as exclusion spheres, centroids, points on lines….
• There are two ways to define a query: either prepare the query file in an external program, such as Catalyst, ISIS/Draw etc., and submit it in .mol format; or use the JME Editor available within the service.
Example: 3-Point pharmacophore used in previous study on HIV-1 integrase inhibitor discovery. J.Med.Chem. 1997, 40(6), 920-929.
9.053±0.4Å
8.711±0.4Å
2.548±0.3Å
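The three distances and tolerances above define a simple geometric test. The sketch below shows that test in isolation, assuming the three feature points of a conformer are already known as 3D coordinates. It is not how the service or Catalyst performs the search. Because the slide does not say which feature pair each distance belongs to, the sketch accepts any assignment of distances to constraints; a real query would tie each distance to specific feature types. Function names and sample coordinates are hypothetical.

```python
import math
from itertools import permutations

# Distance constraints from the slide, as (target, tolerance) in Angstrom.
CONSTRAINTS = [(9.053, 0.4), (8.711, 0.4), (2.548, 0.3)]


def pair_distances(points):
    """Return the three pairwise distances of three 3D points."""
    a, b, c = points
    return [math.dist(a, b), math.dist(a, c), math.dist(b, c)]


def matches_pharmacophore(points, constraints=CONSTRAINTS):
    """True if some assignment of the distances satisfies every constraint."""
    distances = pair_distances(points)
    return any(
        all(abs(d - target) <= tol
            for d, (target, tol) in zip(ordering, constraints))
        for ordering in permutations(distances)
    )


def any_conformer_matches(conformers, constraints=CONSTRAINTS):
    """Check the stored conformers of one compound (up to 25 per the slide)."""
    return any(matches_pharmacophore(points, constraints)
               for points in conformers)


if __name__ == "__main__":
    # Hypothetical feature coordinates for two conformers of one compound.
    conformers = [
        [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (5.0, 5.0, 0.0)],
        [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (8.4, 2.4, 0.0)],
    ]
    print(any_conformer_matches(conformers))
```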
![Page 70: Chemical Databases, Identifiers, Tool Kits and Web Services October 16, 2003](https://reader036.vdocuments.us/reader036/viewer/2022062323/56815a00550346895dc74c40/html5/thumbnails/70.jpg)
3D Pharmacophore Search -- Result