EBI is an Outstation of the European Molecular Biology Laboratory.
MSDchem and the chemistry of the wwPDB
EMBO, 22nd-26th September 2008
EMBL-EBI, Hinxton, UK
The PDB Chemical components
The PDB holds more than the 3-D folding of standard polymers
It gives insight into interesting special chemistry
Bound ligands
Modified amino acids
Non-standard chemical components are often the most interesting
The PDB ligand dictionary has served for many years as the reference dictionary for the chemical definition of 3-letter codes in the PDB data
The ligand dictionary has been maintained by the curators at all wwPDB sites
Problems accumulated
Duplicate entries
Impossible chemistry
The definition of what a 3-letter code represents was not clear and consistent
Stereochemistry was ignored
The MSDchem database
The database that supported the chemical component dictionary in the MSD.
The curation team had an explicit, clear definition of a ligand right from the start
A distinct stereoisomer:
connectivity,
bond orders,
absolute stereo-descriptors of atoms and bonds
This was reflected in the design and implementation of the MSDChem database
The ligand identity
Atoms, elements, bonds and bond orders
Atom and bond absolute stereo-descriptors (Cahn-Ingold-Prelog)
Equivalent to a canonical stereo SMILES or InChI string
MSDchem ligand definition
DCF: C4' R, C3' S, C1' R
DCM: C4' S, C3' R, C1' S
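The identity definition above can be sketched as a small data structure. This is only an illustration: the `LigandIdentity` class and `make_identity` helper are hypothetical names, the five-atom ring is a toy skeleton rather than the full DCF/DCM connectivity, and comparing by atom name stands in for the canonical SMILES or InChI comparison that real chemical software would do. Only the R/S descriptors are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandIdentity:
    """Hypothetical container for the identity fields listed on the slide."""
    atoms: frozenset   # (atom_name, element) pairs
    bonds: frozenset   # (atom_name_1, atom_name_2, bond_order) triples
    stereo: frozenset  # (atom_name, CIP descriptor) pairs


def make_identity(atoms, bonds, stereo):
    # Sort the two atom names of each bond so that A-B and B-A compare equal.
    norm_bonds = frozenset(tuple(sorted((a, b))) + (order,) for a, b, order in bonds)
    return LigandIdentity(frozenset(atoms), norm_bonds, frozenset(stereo.items()))


# Toy ring skeleton (an assumption, not the real component definition).
RING_ATOMS = [("C1'", "C"), ("C2'", "C"), ("C3'", "C"), ("C4'", "C"), ("O4'", "O")]
RING_BONDS = [("C1'", "C2'", 1), ("C2'", "C3'", 1), ("C3'", "C4'", 1),
              ("C4'", "O4'", 1), ("O4'", "C1'", 1)]

# Stereo-descriptors as given on the slide for the two codes.
dcf = make_identity(RING_ATOMS, RING_BONDS, {"C4'": "R", "C3'": "S", "C1'": "R"})
dcm = make_identity(RING_ATOMS, RING_BONDS, {"C4'": "S", "C3'": "R", "C1'": "S"})

same_graph = dcf.atoms == dcm.atoms and dcf.bonds == dcm.bonds
print("same connectivity:", same_graph)
print("same ligand:", dcf == dcm)
```

With stereo-descriptors included in the identity, two components that share connectivity and bond orders still count as different ligands, which is the point of the DCF/DCM pair.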
Other properties
Atom names, and atom/bond ordering
Representative coordinates
Derived properties
Aromatic bonds
SMILES – InChI strings
Systematic names
Idealised coordinates
Rings – planes
Atom energy types
For known ligands, coordinates are checked against the ligand definition (program DOHLC)
Atom labeling is checked
A new ligand may have to be defined
For a new ligand
Fundamental properties are checked
Derived properties are generated
Is it identical to an existing ligand with another code? (DOHLC)
Ligand curation
3TH
Not possible
New ligand
Actually it is 6CP
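The duplicate check in the curation steps ("is it identical to an existing ligand with another code?") can be sketched as a lookup on an identity key. This is not how DOHLC works internally, which the slides do not describe. The sketch assumes each ligand already has a canonical identity string, here an InChI, and the function names and the incoming code `XXX` are hypothetical.

```python
def find_duplicates(dictionary, new_key):
    """Return the codes already in the dictionary that share new_key."""
    return sorted(code for code, key in dictionary.items() if key == new_key)


def register(dictionary, code, key):
    """Add a ligand unless an identical one exists; return the code to use."""
    existing = find_duplicates(dictionary, key)
    if existing:
        # Same identity under another code: reuse it rather than add a duplicate.
        return existing[0]
    dictionary[code] = key
    return code


# A two-entry stand-in for the ligand dictionary, keyed by 3-letter code.
ligands = {
    "HOH": "InChI=1S/H2O/h1H2",
    "EOH": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}

# A depositor's ligand arrives under a new code but is chemically ethanol.
assigned = register(ligands, "XXX", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(assigned, sorted(ligands))
```

Because the key includes stereochemistry when the InChI does, this lookup treats distinct stereoisomers as distinct ligands.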
Improvement of the chemical dictionary
A core task of the wwPDB remediation project
Remaining issues and data errors were fixed
Duplicate identical ligands
No representative coordinates
Wrong valences
The definition of the ligand identity and the deviations were agreed among the wwPDB sites
The wwPDB invested significantly in this area with a new software toolkit (ChemComp)
It replaced most of the MSDChem backend
Ligands in the wwPDB
Additional investment in chemical software
Use of chemical software packages
CACTVS
OpenEye
CORINA
LexiChem
MSDChem is not a separate data resource
Just a loading of the wwPDB ligand dictionary into Oracle
IUPAC atom names, deoxy-bases, better chemical names
Molecules too big to be a single chemical component
Special chemistry (like metal complexes)
Limitations of chemical software
Legacy chemical components that are hard to deal with (like ions)
Components that have never been fully observed
Modified components
Difficult Issues
Public pages for the wwPDB ligand dictionary
Based on an Oracle database load
Various search options
Visualisation and navigation
Exporting in other formats
Has been running for almost 6 years
Is used and referred to by
Ligand Depot (RCSB equivalent)
ChEBI at EBI
PubChem at NCBI
HIC-Up and others
The MSDChem web application
Statistics
[Chart: number of ligands per year, 2000 to 2007; vertical axis 0 to 8000]
Daily average load of MSDChem
~ 400 queries
~ 100 distinct IP addresses
Hits per location
[Chart: categories edu, uk, ebi, other, eu, com, net]
Most common case: search for a 3-letter code seen in a PDB file
Search for a chemical name, or part of one, found in the literature
All known names are searched
Common, PDB
Systematic
A synonym
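As a rough illustration of the name search, where every known name of a component is matched against part of a name, the sketch below does a case-insensitive substring match. The in-memory table and the `search_names` function are hypothetical. The real application runs on an Oracle database whose schema the slides do not show.

```python
def search_names(components, text):
    """Return codes whose common, systematic or synonym names contain text."""
    needle = text.casefold()
    hits = []
    for code, names in components.items():
        if any(needle in name.casefold() for name in names):
            hits.append(code)
    return sorted(hits)


# Small stand-in table: 3-letter code -> all known names.
components = {
    "EOH": ["ETHANOL", "ethanol", "ethyl alcohol"],
    "HOH": ["WATER", "oxidane"],
    "GOL": ["GLYCEROL", "propane-1,2,3-triol", "glycerin"],
}

print(search_names(components, "ol"))    # ['EOH', 'GOL']
print(search_names(components, "oxid"))  # ['HOH']
```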
Search following references
Ligand details
For every kind of search there is a result list
Summary information
Preview icon of the molecule
Links to pages for every chemical component
With detailed images
Links for more information about atoms, bonds etc.
Various options for 3-D visualization
Download options for common chemical formats
Searching for chemical composition
Often aspects of composition are known but not the exact structure
Like particular elements (metals etc.)
Or particular chemical fragments
User friendly expression building pages based on formula or fragments
Visually browse through the results
Formula range
Expression can be built with web form
Example: O1-4 N3-100 F0
1 to 4 oxygens
More than 3 nitrogens
No fluorine
Anything else
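A formula-range expression like the one in the example can be read as per-element count constraints. The sketch below is one possible interpretation, not the MSDChem implementation. It assumes the bounds are inclusive, that a single number such as `F0` means exactly that count, and that unmentioned elements are unconstrained. The function names are hypothetical.

```python
import re

# One term: element symbol, lower bound, optional "-upper bound".
TERM = re.compile(r"([A-Z][a-z]?)(\d+)(?:-(\d+))?$")


def parse_formula_range(expression):
    """Turn 'O1-4 N3-100 F0' into {'O': (1, 4), 'N': (3, 100), 'F': (0, 0)}."""
    constraints = {}
    for term in expression.split():
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        element, low, high = match.groups()
        constraints[element] = (int(low), int(high if high is not None else low))
    return constraints


def matches(formula, constraints):
    """formula maps element -> atom count; elements not listed count as 0."""
    return all(low <= formula.get(element, 0) <= high
               for element, (low, high) in constraints.items())


constraints = parse_formula_range("O1-4 N3-100 F0")
print(matches({"C": 10, "H": 12, "N": 5, "O": 2}, constraints))  # True
print(matches({"C": 10, "N": 5, "O": 2, "F": 1}, constraints))   # False
```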
Fragment search
Web form
Significant fragments
Example:
More than 2 benzimidazoles
No piperazine
Anything else
Searching for parts of structure
An outline of the structure or of some characteristic part is known
Looking for variants of molecules
Load the known target and remove the unimportant parts
Perform a subgraph search
Looking for chemical components with similar fragments and localized chemistry
Load the known target and perform a fingerprint search
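The slides do not say how fingerprints are compared. A common choice is the Tanimoto coefficient over the sets of bits that are switched on, and the sketch below assumes that measure. The library codes, bit numbers and threshold are made up for illustration.

```python
def tanimoto(a, b):
    """Shared bits divided by all bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fingerprint_search(target, library, threshold=0.5):
    """Return (score, code) pairs at or above threshold, best first."""
    scored = [(tanimoto(target, bits), code) for code, bits in library.items()]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: (-item[0], item[1]))


# Hypothetical fingerprints: each set holds the indices of fragment bits present.
library = {
    "L01": {1, 2, 3, 4},
    "L02": {1, 2, 3, 9},
    "L03": {7, 8},
}
target = {1, 2, 3, 4}

results = fingerprint_search(target, library)
print(results)  # [(1.0, 'L01'), (0.6, 'L02')]
```

Unlike a subgraph search, this compares precomputed bit sets only, so it finds components with similar fragments without requiring an exact structural match.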
Substructure search
Applet to draw diagram
Load and modify existing ligand
May take a couple of minutes
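A substructure query amounts to finding the drawn fragment as a subgraph of each dictionary entry. The sketch below is a plain backtracking matcher, not the algorithm MSDChem uses. It assumes heavy atoms only, matches on element symbols, and ignores bond orders and stereochemistry. The worst-case cost of this kind of search grows quickly with molecule size, which is consistent with the note that a query may take a couple of minutes.

```python
def has_substructure(query, target):
    """query and target are (elements, bonds): a list of element symbols
    and a collection of (index, index) pairs between them."""
    q_elems, q_bonds = query
    t_elems, t_bonds = target
    t_adjacent = {frozenset(bond) for bond in t_bonds}
    q_neighbours = {i: set() for i in range(len(q_elems))}
    for a, b in q_bonds:
        q_neighbours[a].add(b)
        q_neighbours[b].add(a)

    def extend(mapping):
        # mapping[j] is the target atom chosen for query atom j.
        i = len(mapping)
        if i == len(q_elems):
            return True
        for t in range(len(t_elems)):
            if t in mapping or t_elems[t] != q_elems[i]:
                continue
            # Every already-placed neighbour of query atom i must be bonded to t.
            if all(frozenset((mapping[j], t)) in t_adjacent
                   for j in q_neighbours[i] if j < i):
                if extend(mapping + [t]):
                    return True
        return False

    return extend([])


c_c_o = (["C", "C", "O"], [(0, 1), (1, 2)])                    # C-C-O fragment
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])    # C-C-C-O
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])           # C-O-C

print(has_substructure(c_c_o, propanol))        # True
print(has_substructure(c_c_o, dimethyl_ether))  # False
```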
Links to the PDB
MSDchem strictly searches the reference dictionary
But provides links to the PDB entries that include a ligand or a set of ligands
From ligand details pages
And from any query results page
Links to the summary pages for the entries (MSD Atlas pages)
Or instances of the ligands in entries along with their environment and interactions (MSDmotif)
Link to Binding sites
Details – interactions of these ligands in entries
Statistics – search within results
Ligand index – download
Download of the complete archive
Compressed tar of
Molfiles (SDF)
CML (ChEBI style)
MSDChem XML
Relational database
Just listings
SMILES strings – name