Macromolecular Structure Database Project (E-MSD)
Infrastructure Services for Europe
To develop an autonomous structural database capability in Europe
http://www.ebi.ac.uk/msd
[Figure: E-MSD collaborations and funding. Partner projects: Temblor (EBI-MSD), Spine (Oxford), Autostruct (York), NMRQual (Utrecht), CCPN (Cambridge), EHTPX (Daresbury), CCP4, IIMS (EBI-MSD). Integration: Sanger Institute, SCOP, CATH, Pfam. Data exchange: BMRB, RCSB (USA). Activities: harvesting, e-science, advanced search, validation, structural genomics, electron microscopy. Funders: EU, EMBL, Wellcome Trust, BBSRC, MRC, CLRC (legend: grant & co-ordinator, grant funding, core funding).]
E-MSD provides:
clean biological data
integrated data
a single web access point
query interfaces for different users
interconnected views of the data relating structure, sequence, text & experimental details
[Figure: Web interface for biologists, chemists, structural biologists and teachers. Data sources (SwissProt, Medline, active sites, ligands, folds from SCOP/Dali, secondary structure, PDB) can be searched by ligand, active site, structure, sequence or keyword. The search query produces a sorted hit list, linked to an Atlas page with structure, sequence, active-site and experimental-data views, query results and an interactive viewer.]
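The search flow on this slide (query by ligand or keyword, then a sorted hit list) can be sketched as follows. This is a minimal illustration under stated assumptions, not the E-MSD implementation: the `Entry` record, the `search` function, the keyword-overlap score and the entry IDs are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, much-simplified record in a search database."""
    entry_id: str
    keywords: set[str] = field(default_factory=set)
    ligands: set[str] = field(default_factory=set)


def search(entries: list[Entry], keywords: Iterable[str] = (),
           ligand: str | None = None) -> list[tuple[int, str]]:
    """Return a sorted hit list of (score, entry_id), best match first."""
    query = frozenset(keywords)
    hits = []
    for entry in entries:
        # A ligand, if given, acts as a hard filter.
        if ligand is not None and ligand not in entry.ligands:
            continue
        # Assumed score: number of query keywords the entry carries.
        score = len(query & entry.keywords)
        if query and score == 0:
            continue
        hits.append((score, entry.entry_id))
    # Highest score first; ties broken alphabetically by ID.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits
```

For example, with entries `E1` (keywords kinase, atp; ligand ATP), `E2` (kinase) and `E3` (protease; ligand HEM), a query for kinase and atp ranks `E1` above `E2`. Each hit would then lead to its own Atlas page.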
[Figure: Web services: data APIs and methods (SSM, FastA) offered as web services over SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure and PDB. Web-based pages: search interfaces and interactive visualisation.]
DATA INTEGRATION
A database for all?
MSD SEARCH DATABASE
Data integration
We want to include all types of biological data:
structure, sequence and text; observed biochemistry (BRENDA); sequence annotation (PRINTS); DNA: ORFs, SNPs.
But we can't do everything! So can the Grid allow the integration of data from other sources?
[Figure: MSD data sources: SwissProt, Medline, active sites, ligands, folds (SCOP/Dali), secondary structure, PDB.]
Problem for Grid (1): provenance
We are a funded institute. We have to be seen to be useful or we do not get funded!
Industry needs to be seen too (shareholders).
Origin of distributed information: users and funding bodies need to see who provided the information. How do we retain and present these details?
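One way to retain provenance is to never merge a value without its source. The sketch below assumes each item of distributed information arrives as a small record tagged with its provider. `Fact`, `integrate` and `credits` are hypothetical names, not part of any Grid or E-MSD API.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One item of distributed information plus who provided it."""
    subject: str    # e.g. an entry or sequence identifier
    attribute: str  # e.g. "function"
    value: str
    source: str     # provider to credit, e.g. "SwissProt"


def integrate(*feeds: list[Fact]) -> dict[tuple[str, str], list[Fact]]:
    """Merge facts from several providers, keeping every attribution."""
    merged: dict[tuple[str, str], list[Fact]] = defaultdict(list)
    for feed in feeds:
        for fact in feed:
            merged[(fact.subject, fact.attribute)].append(fact)
    return dict(merged)


def credits(merged: dict[tuple[str, str], list[Fact]]) -> dict[str, int]:
    """Count contributions per provider, e.g. for users and funding bodies."""
    counts: dict[str, int] = defaultdict(int)
    for facts in merged.values():
        for fact in facts:
            counts[fact.source] += 1
    return dict(counts)
```

Because identical values from two providers are both kept, each provider remains visible to the user and to its funders, at the cost of some redundancy.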
Problem for Grid (2): best practice and conflicting information
We do not know “best practice” in much of biology:
Methods: structure alignment, secondary structure, …
Data: multiple coordinate sets, multiple sequence data, …
There will be conflicting information. Data and methods have associated validity information, and different data or methods may be inconsistent only in part. How is conflicting information going to be presented to, and filtered for, a user? Who is going to assign data validity?
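A minimal sketch of presenting conflicting results follows. It assumes each claim already carries a numeric validity score in 0..1, which is precisely the open question of who assigns it. The idea is to show all surviving values ranked by validity and flag disagreement, rather than silently pick one. `Claim` and `present` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One value for the same item, as reported by one source or method."""
    value: str
    source: str
    validity: float  # assumed 0..1 score; who assigns it is the open question


def present(claims: list[Claim], min_validity: float = 0.0) -> dict:
    """Keep every value above the filter and rank it, instead of picking one."""
    kept = [c for c in claims if c.validity >= min_validity]
    values = sorted({c.value for c in kept})
    return {
        "consistent": len(values) <= 1,  # do the surviving claims agree?
        "values": values,
        "ranked": sorted(kept, key=lambda c: c.validity, reverse=True),
    }
```

For instance, three secondary-structure methods may call a residue "helix" (0.9), "strand" (0.4) and "helix" (0.7). Unfiltered, the result is flagged inconsistent; with `min_validity=0.5` the remaining claims agree. The user-chosen threshold is the "filter".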
Problem for Grid (3): data access control
Bioinformatics is fashionable at the moment. There is a “problem” when something is perceived to be useful, e.g. there are about 60,000 patents in the US for the ~30,000 human genes. Not a problem yet, but…
This is more than data security:
Will the Grid employ some good lawyers?
Will the Grid hide information on request? Compare the PDB “hold” status.
Will the Grid “modify” information on request? Compare Google search result order having been “updated”.
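Hiding information on request, as with a PDB "hold", goes beyond authentication: the same query must return different results for different users and dates. The sketch below assumes a simplified rule (a held entry is visible only to its depositor until a release date). This rule is an illustration, not the PDB's actual release policy, and `Record` and `visible_records` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    entry_id: str
    depositor: str
    hold_until: date | None = None  # None = already public (assumed convention)


def visible_records(records: list[Record], user: str, today: date) -> list[str]:
    """IDs a user may see: released entries plus the user's own held entries."""
    visible = []
    for record in records:
        released = record.hold_until is None or record.hold_until <= today
        if released or record.depositor == user:
            visible.append(record.entry_id)
    return visible
```

In a distributed setting every node serving a copy would have to apply the same rule, which is why this is a Grid problem and not only a local one.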
Summary
We want to be able to provide a scientific service: web pages and web services.
We would like to be able to expand the results to include information from other data resources.
These 3 issues are only a few of many, but they represent fundamental problems.
CLEAN DATA: Quaternary structure
[Figure: levels of description from the X-ray experiment through atoms, residues and chains to sub-assemblies, assemblies and the biology.]
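The levels on this slide map naturally onto a nested data structure. The sketch below is an assumed model, not the MSD schema: atoms in residues, residues in chains, and assemblies that may contain sub-assemblies. An `Entry` separates the deposited asymmetric unit from the derived biological assemblies. All class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    xyz: tuple[float, float, float]


@dataclass
class Residue:
    name: str
    atoms: list[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list[Residue] = field(default_factory=list)


@dataclass
class Assembly:
    """A (sub-)assembly: its own chains plus nested sub-assemblies."""
    name: str
    chains: list[Chain] = field(default_factory=list)
    sub_assemblies: list[Assembly] = field(default_factory=list)

    def all_chains(self) -> list[Chain]:
        """Collect chains from this level and every nested sub-assembly."""
        found = list(self.chains)
        for sub in self.sub_assemblies:
            found.extend(sub.all_chains())
        return found


@dataclass
class Entry:
    """What was deposited versus what has to be derived at submission."""
    experiment: str               # e.g. "X-ray"
    asymmetric_unit: list[Chain]  # chains as deposited
    assemblies: list[Assembly] = field(default_factory=list)  # derived biological units
```

As an illustration, a dodecamer modelled as two six-chain sub-assemblies reports 12 chains from `all_chains()`.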
CLEAN DATA: Example of experimental result
Authors would know the quaternary structure; we have to derive it at submission.
M. Bochtler et al., Nature 403, 800 (2000)
Asymmetric unit: contains 3 separate molecules, 2 copies of a dodecamer and 1 hexamer.
[Figure: hexamer and dodecamer assemblies (see http://pqs.ebi.ac.uk).]
RESOLUTION
SLIDING SCALE FOR RULES
Electron density at different resolutions: phenylalanine
The ring is correctly placed into the 1.2 Å data.
This can still be done with confidence in the 2 Å case.
But at 3 Å we already observe a deviation of the ring centroid from the correct model.
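The centroid deviation mentioned here can be measured directly. This sketch takes the six phenylalanine ring atoms by their standard PDB names, averages their coordinates, and reports the distance between model and reference centroids. The input format (a dict of atom name to coordinates) is an assumption. The per-resolution tolerance that would make up the "sliding scale" is not given on the slide and is not modelled.

```python
from __future__ import annotations

import math

# Phenylalanine ring atoms, using standard PDB atom names.
RING_ATOMS = ("CG", "CD1", "CD2", "CE1", "CE2", "CZ")


def ring_centroid(residue: dict[str, tuple[float, float, float]]) -> tuple[float, float, float]:
    """Mean position of the six ring atoms."""
    points = [residue[name] for name in RING_ATOMS]
    return tuple(sum(p[axis] for p in points) / len(points) for axis in range(3))


def centroid_deviation(model: dict[str, tuple[float, float, float]],
                       reference: dict[str, tuple[float, float, float]]) -> float:
    """Distance in Å between model and reference ring centroids."""
    return math.dist(ring_centroid(model), ring_centroid(reference))
```

A rule on a sliding scale would then compare this distance with a tolerance that grows as resolution worsens.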
Clean data
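Z-score: $Z = (\mathrm{Fit} - \langle \mathrm{Fit} \rangle)/\sigma$, where $\langle \mathrm{Fit} \rangle$ and $\sigma$ are the mean and standard deviation of the fit for that residue type in structures of similar resolution.
A large positive spike indicates a residue that is worse than the average for that residue type in structures of similar resolution.
[Figure: phenylalanine Z-score examples for PDB entries 1qi3, 1rmg and 1f83, ranging from good to terrible.]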
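The Z-score on this slide is straightforward to compute once reference fits are available. In the sketch below, `reference_fits` stands for the fit values of the same residue type in structures of similar resolution (how those are selected, and what the fit measure is, are not specified here). The flagging threshold of 3.0 is an assumption. As on the slide, a larger fit means a worse residue.

```python
from __future__ import annotations

import statistics


def z_score(fit: float, reference_fits: list[float]) -> float:
    """Z = (Fit - <Fit>) / sigma against same-type residues at similar resolution."""
    mean = statistics.fmean(reference_fits)
    sigma = statistics.stdev(reference_fits)  # sample standard deviation
    return (fit - mean) / sigma


def flag_spikes(fits: dict[str, float], reference_fits: list[float],
                threshold: float = 3.0) -> list[str]:
    """Residues whose Z exceeds `threshold`: a large positive Z means worse than average."""
    return [res for res, fit in fits.items() if z_score(fit, reference_fits) > threshold]
```

With reference fits 1.0, 2.0 and 3.0 (mean 2.0, σ = 1.0), a residue with fit 6.0 scores Z = 4.0 and is flagged, while one with fit 2.5 (Z = 0.5) is not.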
Geometric outliers
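[Figure: loader, ligand DB and site environment DB.]
Site environment interactions: covalent bonds, coordinate bonds, hydrogen bonds, planes, non-bonding contacts, electrostatics, disulphide bonds.
[Figure: example site environment showing PHE, ASP and VAL residues and O, N and S atoms.]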
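As an illustration of populating one interaction type in a site environment, the sketch below screens for possible hydrogen bonds by distance alone. It treats every N or O atom as both a potential donor and acceptor and uses a 3.5 Å cutoff. Both are common simplifying assumptions, not the MSD rules, and a real classifier would also check angles. `possible_hbonds` and the tuple atom format are hypothetical.

```python
from __future__ import annotations

import math
from itertools import product

# Simplification: every N or O is treated as both a potential donor and acceptor.
POLAR = {"N", "O"}

Atom = tuple[str, str, tuple[float, float, float]]  # (label, element, xyz)


def possible_hbonds(site: list[Atom], ligand: list[Atom],
                    cutoff: float = 3.5) -> list[tuple[str, str, float]]:
    """Distance-only screen: N/O pairs within `cutoff` Å (no angle check)."""
    pairs = []
    for (s_label, s_elem, s_xyz), (l_label, l_elem, l_xyz) in product(site, ligand):
        if s_elem in POLAR and l_elem in POLAR:
            distance = math.dist(s_xyz, l_xyz)
            if distance <= cutoff:
                pairs.append((s_label, l_label, round(distance, 2)))
    return pairs
```

For example, an ASP oxygen 2.9 Å from a ligand nitrogen is reported. A VAL carbon at a similar distance is ignored, because carbon is not in the polar set.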