InstantJChem: a flexible chemical database system
G. Marcou, D. Horvath
Laboratoire d’infochimie, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg
Introduction
The goal is to present InstantJChem for the storage and manipulation of chemical information.
1. General presentation
2. Database search
3. Creation of a database from scratch
What is a database?
• A database stores data in an ordered form on a precise subject.
• A relational database stores information in tables that reference one another.
• A relational database management system (RDBMS) is software that manages relational databases.
InstantJChem is not a database and is not an RDBMS.
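The idea of tables that reference one another can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and values below are hypothetical and chosen only for illustration; they are not the tables InstantJChem creates.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent table: one row per compound.
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Hypothetical child table: each row refers to a compound through compound_id.
conn.execute(
    """CREATE TABLE measurement (
           id INTEGER PRIMARY KEY,
           compound_id INTEGER NOT NULL REFERENCES compound(id),
           boiling_point REAL
       )"""
)

# Approximate boiling points in degrees Celsius, for illustration only.
conn.executemany("INSERT INTO compound VALUES (?, ?)", [(1, "methane"), (2, "ethane")])
conn.executemany(
    "INSERT INTO measurement (compound_id, boiling_point) VALUES (?, ?)",
    [(1, -161.5), (2, -88.6)],
)

# The inter-reference lets the RDBMS join the two tables.
rows = conn.execute(
    """SELECT c.name, m.boiling_point
       FROM compound AS c
       JOIN measurement AS m ON m.compound_id = c.id
       ORDER BY c.id"""
).fetchall()
print(rows)
```

Here sqlite3 plays the role of the RDBMS; a tool such as InstantJChem sits on top of a system of this kind rather than replacing it.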
What is InstantJChem?
InstantJChem is a user-friendly interface between an RDBMS, chemical information and the user.
User
RDBMS
Chemical Information
Key concepts of InstantJChem
• Projects
• Schema
• Databases and Tables
• Entities
• Data Trees
• Views
Exercise 1
Create a new project named IJCExercises…
Key concept: Project
Project
Contains resources and connections to one or more databases.
Exercise 1
… and import the file SC100.SDF into it…
Key concept: Schema
Schema/Database
Contains the connection to a database and special tables (JChemProperties).
Key concept: Database and Tables
Table
Database and tables are managed by the RDBMS.
They are what actually stores the information.
What can be stored

Type              Description

Standard table
  Integer         Long integer: 2^32 = 4294967296
  Text            The user can specify text field widths as large as needed.
  Real            Double-precision real number
  Date            Stores dates.
  Boolean         Value is True or False
  List (Standard) Stores a list of database items

JChem table
  Chemical terms  A list of functions evaluated on chemical structures: logD, pKa, tautomers, ...
  Structure       Chemical structure, created automatically with a JChem table
Key concept: Entities
Entity
An entity is a representation of data.
It is a unique interface to conceptually different types of tables (Standard, Chemical, SQL, Extractions, etc).
Key concept: Data Trees
Data Tree
A collection of entities and views.
Organize information using a hierarchy (parent-child relationship between entities).
Exercise 1
… Customize a browser for it.
Key concept: Views
Views
An interface to data.
For simple data, a spreadsheet view is suitable. For complex relational data, a form is required.
Exercise 2
In the SC100 database, search for fluorobenzene- and pyridine-containing molecules. Use a substructure or a similarity search.
Substructure search: 20 hits
Similarity search: 0 hits

Substructure search: 14 hits
Similarity search: 0 hits
Similarity search uses Chemical Hashed Fingerprints defined at database creation.
Chemical Hashed Fingerprints (CHF)
• Pattern Length: number of bonds of a pattern
• Fingerprint Length: total number of bits to store the fingerprint
• Bits per pattern: number of bits that each pattern sets to 1
Efficient annotation to accelerate structure search
www.chemaxon.com
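The three parameters above can be made concrete with a toy sketch. This is not ChemAxon's implementation: the path enumeration, the use of SHA-256 as hash function, and the helper names enumerate_paths, fingerprint, tanimoto and may_contain are all assumptions made for illustration, and bond orders are ignored.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Collect labels of all linear paths with at most max_bonds bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    patterns = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        # A path and its reverse are the same pattern: keep one spelling.
        patterns.add(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) - 1 < max_bonds:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def fingerprint(patterns, fp_length, bits_per_pattern):
    """Hash every pattern onto bits_per_pattern positions among fp_length bits."""
    bits = set()
    for pattern in patterns:
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{pattern}|{k}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % fp_length)
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def may_contain(query_bits, target_bits):
    """Screening test: every bit set by the query must be set in the target."""
    return query_bits <= target_bits


# Toy molecules written as atom labels plus bonds between atom indices.
target_atoms, target_bonds = ["C", "C", "F"], [(0, 1), (1, 2)]
query_atoms, query_bonds = ["C", "F"], [(0, 1)]

target_fp = fingerprint(enumerate_paths(target_atoms, target_bonds, 2), 512, 2)
query_fp = fingerprint(enumerate_paths(query_atoms, query_bonds, 2), 512, 2)

print(may_contain(query_fp, target_fp))
print(tanimoto(query_fp, target_fp))
```

In this sketch a target whose fingerprint lacks one of the query's bits cannot contain the query, so it can be discarded before any atom-by-atom matching; a target that passes the test still has to be checked, because different patterns may hash to the same bits.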
Exercise 3
Combine molecules 25 and 89 into a pseudo-molecule to perform a superstructure query.
Exercise 4
Use compound 46 as a Full and as a Full fragment query to search the database. Repeat after removing the bromide from the query.
Structure Searches
www.chemaxon.com
Exercise 5
Search for benzene-containing compounds whose name contains “pyrimidin” and which are annotated as “Good” for aqueous solubility.
Exercise 6
Search for compounds with at least one aromatic ring containing at least one nitrogen atom.
Exercise 7
Search for compounds with MolWeight > 200 that do not contain a benzene ring.
Exercise 8
Search for compounds with MolWeight > 200, then for compounds without a benzene ring, and search for the union of the two hit lists.
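The difference between combining conditions in one query and combining hit lists can be sketched with plain Python sets. The compound identifiers below are hypothetical and are not results from the SC100 database.

```python
# Hypothetical hit lists, given as sets of compound identifiers.
heavy = {3, 7, 12, 25, 46}   # compounds with MolWeight > 200
no_benzene = {7, 9, 25, 60}  # compounds without a benzene ring

# Both conditions at once (logical AND): intersection of the hit lists.
both = heavy & no_benzene

# Union of the two hit lists (logical OR).
either = heavy | no_benzene

print(sorted(both))
print(sorted(either))
```

The intersection keeps only compounds present in both lists, whereas the union keeps every compound present in at least one of them, so the union is never smaller than either list.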
Exercise 9
Search for compounds possessing more than 4 microspecies at pH = 4.0…
Exercise 9
… Export your hit list.
Exercise 10
Import the file ISICCRsm.RDF into your project…
Exercise 10
… Create a browser for this database.
Exercise 11
Search for reactions with an imidazole ring in their reactants, then in their products.
Exercise 12
Add to your schema a new data tree and structure entity named AlkanBoilingPoint…
Exercise 12
… and add a floating-point value field named BoilingPoint.
Exercise 13
Add the following data to the AlkanBoilingPoint entity.
Exercise 14
Add to the AlkanBoilingPoint entity a new date field named Date and fill it in.
Exercise 15
Add to the AlkanBoilingPoint entity a calculated LogP value using a Chemical Terms field.
Summary
• Create a project and schema
• Import data
• Search by substructure, superstructure, similarity, and exact match
• Search by keyword
• Combine queries and result lists
• Export query results
• Create a new database
Conclusion
• InstantJChem is a chemoinformatics layer above a standard DBMS.
• It provides many more chemoinformatics services (database overlap, QSPR modeling, plots, enumeration, scripting).
DBMS
InstantJChem