BioMagResBank (BMRB) Data Deposition and Entry Annotation Requirements
Eldon L. Ulrich
Presentation topics
New ADIT-NMR deposition system – Steve Mading and Dimitri Maziuk in collaboration with the Rutgers RCSB team (Monica Sundd, Monica Sekharan, Zukang Feng, John Westbrook, Helen Berman, and Jasmine Young)
Restraints processing – Jurgen Doreleijers and Jundong Lin in collaboration with the EBI/CCPN (Wim Vranken), Utrecht University (Aart Nederveen, Alexandre Bonvin, and Robert Kaptein), and Radboud University (Gert Vriend and Chris Spronk)
Assigned chemical shift validation systems – David Tolmie, Kent Wenger, and Dimitri Maziuk in collaboration with NESG (Hunter Moseley) and NMRFAM (Gabriel Cornilescu, Hamid Eghbalnia, and Liya Wang)
Future issues
BMRB mission and goals
Mission: Gather and distribute in the public domain as much biological NMR data as possible to further research and education
Goals:
1) Create efficient data deposition systems
• Requires minimal user effort
• Complete – follows IUPAC recommendations
• Free of obvious errors
• Promotes uniformity
2) Through annotation, improve the usefulness of the data
• Identify anomalous data and communicate issues with depositors
• Migrate data into standard formats
• Carry out value-added processes
3) Develop data query systems
• Entry retrieval
• Longitudinal database searches
BMRB NMR-STAR Content
Entry information
Contact persons
Molecular system
Molecules
Natural source
Experimental source
Spectrometer description
Samples
NMR experiments
Sample conditions
Software
Applied experiments
Citations
Chemical components
Experimentally derived data
Molecular description
Experimental details
NMR spectral data
• Chemical shift assignments
• Chemical shift reference
• Theoretical chemical shifts
• Chemical shift isotope effects
• Chemical shift anisotropy
• Coupling constants
• Residual dipolar couplings
• T1 relaxation
• T1rho relaxation
• T2 relaxation
• Heteronuclear NOE
• Homonuclear NOE/ROE
• Dipole-dipole relaxation
• Cross correlation
• Spectral density values
• Spectral peak lists
• Time-domain data sets
Experimentally derived data
Three-dimensional structures
• NMR constraints
• Constraint statistics
• Coordinates for structure models
• Representative model coordinates
• Structure quality parameters
Kinetic parameters
• H-exchange rates
• H-exchange protection factors
• Order parameters (isotropic and anisotropic)
Thermodynamic parameters
• pKa values
• D/H-fractionation factors
Secondary structure features
• Helix/sheet/turn/loop
• Deduced H-bonds
• Author interpretation
One-stop RCSB BMRB/PDB ADIT-NMR deposition system (URL: deposit.bmrb.wisc.edu/bmrb-adit/)
• BMRB and RCSB-PDB depositions can be generated from a joint interface
• BMRB interface has been streamlined
• RCSB-PDB interface has been extended with optional fields for conformer and constraint statistics
• Files in PDB, mmCIF, and NMR-STAR formats can be uploaded to pre-populate a deposition
• Many fields (e.g., experiment name, software name, software author) have pull-down lists to choose from for convenience and to improve uniformity (controlled vocabulary)
• Fields common to multiple forms are linked to eliminate the need to retype information (e.g., uploaded data file names, author names, molecule names, and others)
• Help and examples have been improved
• You can start with either BMRB or PDB and switch between the two as you go along
ADIT-NMR architecture
[Diagram: ADIT-NMR data flow. Labels on the slide: MAXIT (PDB, PDBx, mmCIF); s2nmr (NMR-STAR v2.1, NMR-STAR v3); nmrstr2nmrif and nmrif2nmrstar (NMR-STAR v3, NMR-IF); nmrif2pdbx and pdbx2nmrif (NMR-IF, PDBx); ADIT-NMR; PDB deposition and BMRB deposition (v3, v2.1); coordinates + restraints + experimental data.]
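The converter names on the architecture slide suggest a chain of format translations with NMR-IF as the hub. The following is a minimal sketch of that idea, not the ADIT-NMR implementation: it treats each tool name as a one-way edge between two formats and searches for the shortest chain. The edge directions are assumptions inferred from the tool names (for example, that s2nmr goes from NMR-STAR v2.1 to v3), and the function `conversion_path` is hypothetical.

```python
from collections import deque

# Assumed edges: (source format, target format) -> tool name from the slide.
# Directions are inferred from the tool names and are not confirmed by the slide.
CONVERTERS = {
    ("NMR-STAR v3", "NMR-IF"): "nmrstr2nmrif",
    ("NMR-IF", "NMR-STAR v3"): "nmrif2nmrstar",
    ("NMR-IF", "PDBx"): "nmrif2pdbx",
    ("PDBx", "NMR-IF"): "pdbx2nmrif",
    ("NMR-STAR v2.1", "NMR-STAR v3"): "s2nmr",
    ("PDB", "PDBx"): "MAXIT",
    ("PDBx", "PDB"): "MAXIT",
}


def conversion_path(source, target):
    """Breadth-first search for the shortest chain of converters.

    Returns the list of tool names to apply in order, an empty list when
    source and target are the same format, or None when no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for (src, dst), tool in CONVERTERS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [tool]))
    return None
```

Under these assumed edges, a PDB-format upload would reach NMR-STAR v3 through PDBx and NMR-IF, which is one way a single uploaded file could pre-populate both depositions.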
Precheck and validation of coordinate files
Precheck/Validate: Performs the same checks as are available via PDB's ADIT tool. Uses the coordinate file given above.
ADIT-NMR validation report
ADIT-NMR constraint statistics
[Figure: BMRB depositions by year (~260 PDB depositions received). Bar chart; x-axis: Year (1996 to 2007); y-axis: Depositions (0 to 800).]
ADIT-NMR dual session latency
For sessions that are deposited to both PDB and BMRB, how close together do the two depositions occur in time?
After depositing to either of the two databanks, much of the data for depositing to the other databank is also complete, hence the short times seen here. In fact, the 2 hour bar can be broken down further, and it turns out that most are less than 15 minutes.
[Figures: histograms of the number of depositions per latency bin, with series "BMRB first, PDB second" and "PDB first, BMRB second". Bins: [0..20min), [20min..1hour), [1..2) hours, [2..4) hours, [4..8) hours, [8..16) hours, [16..32) hours, [32..64) hours, >= 64 hours.]
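The latency histogram uses half-open bins whose width doubles from one hour upward. As a minimal sketch of how such a histogram could be computed from pairs of deposition timestamps, the code below maps each time gap to one of the bin labels from the slide. The function names and the use of `datetime` timestamps are assumptions made for illustration; the slide does not describe how the statistics were produced.

```python
from bisect import bisect_right
from datetime import timedelta

# Upper edges of the half-open bins, in minutes, matching the slide's labels.
BIN_EDGES_MIN = [20, 60, 120, 240, 480, 960, 1920, 3840]
BIN_LABELS = [
    "[0..20min)", "[20min..1hour)", "[1..2) hours", "[2..4) hours",
    "[4..8) hours", "[8..16) hours", "[16..32) hours", "[32..64) hours",
    ">= 64 hours",
]


def latency_bin(first, second):
    """Return the bin label for the gap between two deposition timestamps."""
    minutes = abs(second - first) / timedelta(minutes=1)
    # bisect_right places a value equal to an edge in the next bin,
    # which is what a half-open interval [low..high) requires.
    return BIN_LABELS[bisect_right(BIN_EDGES_MIN, minutes)]


def histogram(pairs):
    """Count (first, second) timestamp pairs per latency bin."""
    counts = {label: 0 for label in BIN_LABELS}
    for first, second in pairs:
        counts[latency_bin(first, second)] += 1
    return counts
```

Taking the absolute difference means the same function serves both orderings (BMRB first or PDB first); the caller decides which series a session belongs to.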
ADIT-NMR development
Phase I – completed/retired
• BMRB-only depositions
Phase II – completed/being refined
• BMRB-PDB combined depositions
Phase III – being designed
• Access to NMR atomic coordinate and restraint validation tools
• Access to assigned chemical shift validation tools
• Improved data import functions (PDB Extract, CCPN data harvesting tools, others)
NMR Restraints processing
[Flowchart: NMR restraints processing. Labels on the slide:]
Databases: FRED, DOCR, NMR Restraints Grid, PDBE-MSD, RECOORD
• Parse restraints (Wattos)
• Link parsed restraints and coordinates
• NMR-STAR data (FormatConverter, E-MSD)
• Synchronize models, correct atom nomenclature, add missing hydrogen atoms (Wattos)
• Refine molecular system (FormatConverter)
• Nomenclature correction (WHAT IF)
• Remove surplus; calculate violations, completeness, and information content (Wattos/Queen)
• Analyze coordinates and all NMR constraint types
• Convert to structure calculation programs (FormatConverter)
• Structure recalculation (Amber/CNS/CYANA/Gromacs/Yasara etc.)
Development site (legend): BMRB, EBI, Other
‘Surplus’ distance restraints categories
Exceptional – atoms not present in the PDB entry
Double – duplicate restraints
Impossible – do not match topology provided
Fixed – atoms have fixed distances
Redundant – upper bounds greater than what is allowed by topology
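The categories above can be illustrated with a toy classifier. This is a sketch under stated assumptions, not the Wattos algorithm: the `Restraint` record, the `classify` function, and the lookup tables passed to it are all hypothetical, duplicates are detected by atom pair alone, and the "Impossible" category is left out because the slide does not say how topology mismatches are detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    atom_a: str
    atom_b: str
    upper: float  # upper distance bound in angstroms


def classify(restraints, atoms, fixed_pairs, max_dist):
    """Label each restraint with a surplus category, or None if it is kept.

    atoms       -- set of atom names present in the coordinate entry
    fixed_pairs -- set of frozenset pairs whose distance is fixed by geometry
    max_dist    -- dict: frozenset pair -> largest distance the topology allows
    """
    seen = set()
    labels = []
    for r in restraints:
        pair = frozenset((r.atom_a, r.atom_b))
        if r.atom_a not in atoms or r.atom_b not in atoms:
            label = "exceptional"  # refers to an atom missing from the entry
        elif pair in seen:
            label = "double"       # same atom pair already restrained
        elif pair in fixed_pairs:
            label = "fixed"        # distance cannot vary, so no information
        elif pair in max_dist and r.upper > max_dist[pair]:
            label = "redundant"    # bound is looser than the topology limit
        else:
            label = None
        seen.add(pair)
        labels.append(label)
    return labels
```

The checks run in a fixed order, so a restraint that qualifies for more than one category receives only the first matching label; a real tool might report all of them.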
PDB Entry counts by criteria
Conclusions
• The fraction of good converted MR files increased substantially, from ~500 to 2,271/3,057, and the remaining issues are documented
• Authors have been contacted since April 2006 for entries failing any of four criteria; this is about 10% of the entries, mostly with larger violations that the authors consider acceptable
• Processing the entries has been further automated; only entries failing the criteria get checked manually at this stage
Results from RECOORD database analysis
• Water refinement improved structure quality
• Low correlations were found between various quality indicators
• Surprisingly, quality indicators did not correlate well with the number of distance restraints
Assigned chemical shift checks
• TALOS/NMRPipe/MolMol implementation
  Cornilescu, G., Delaglio, F., and Bax, A., J. Biomol. NMR 13, 289 (1999)
  Delaglio, F., unpublished
  Koradi, R., Billeter, M., and Wüthrich, K., J. Mol. Graph. 14, 51 (1996)
• AVS (Assignment validation software)
  Moseley, H.N.B., Sahota, G., and Montelione, G.T., J. Biomol. NMR 28, 341 (2004)
• LACS (Linear analysis of carbon-13 chemical shift)
  Wang, L., Eghbalnia, H.R., Bahrami, A., and Markley, J.L., J. Biomol. NMR 32, 13-22 (2005)
• Shifts (planned)
  Xu, X.P. and Case, D.A., J. Biomol. NMR 21, 321 (2001)
• SHIFTX/SHIFTCOR (planned)
  Neal, S., Nip, A.M., Zhang, H., and Wishart, D.S., J. Biomol. NMR 26, 215 (2003)
AVS chemical shift validation report
Anomalous Chemical Shift Assignments:
The assigned chemical shifts in the following table have been reported as anomalous, suspicious, or duplicate (A, S or D respectively, in the Error Msg. column) by the software employed by BMRB to check for chemical shift outliers. Please verify these assignments by replacing the question marks in the 'Code' column of the table with the appropriate code. The codes to use are: V = verified, D = delete, and R = replace. Where R is indicated, please supply the revised chemical shift value in the Replace C.S. column of the table. If there are a large number of revised chemical shifts, it may be more convenient to edit the full NMR-STAR file. Please inform the annotator in charge of the entry of your modifications.

                                                          Author Verification
Mol  Res.  Res.  Atom  Obs.      Error  Expected  Std.    Code  Replace
ID   #     Type        Delta     Msg.   Delta     Dev.          C.S.
------------------------------------------------------------------------------
1    17    GLU   N     108.138   A      120.68    3.68    ?     ?
1    21    LEU   HD2    -0.446   S        0.76    0.28    ?     ?
1    30    LYS   H      10.556   S        8.22    0.64    ?     ?
1    34    ILE   HD1    -0.297   S        0.7     0.3     ?     ?
1    53    THR   CB     62.879   S       69.64    1.7     ?     ?
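One way to read the report table is as a standard-score check: each observed shift is compared with the expected value in units of the standard deviation. The sketch below applies that reading to the five rows shown. It is an illustration only; the slide does not state how AVS decides between "anomalous" and "suspicious", so the cutoff of three standard deviations and the function names are assumptions.

```python
# Rows copied from the report table: (residue number, residue type, atom,
# observed shift, expected shift, standard deviation).
ROWS = [
    (17, "GLU", "N", 108.138, 120.68, 3.68),
    (21, "LEU", "HD2", -0.446, 0.76, 0.28),
    (30, "LYS", "H", 10.556, 8.22, 0.64),
    (34, "ILE", "HD1", -0.297, 0.7, 0.3),
    (53, "THR", "CB", 62.879, 69.64, 1.7),
]


def z_score(observed, expected, std_dev):
    """Deviation of the observed shift from the expected one, in std. devs."""
    return (observed - expected) / std_dev


def outliers(rows, threshold=3.0):
    """Return rows whose |z| exceeds the threshold.

    The default threshold of 3.0 is an assumption, not a documented AVS value.
    """
    flagged = []
    for res_num, res_type, atom, obs, exp, sd in rows:
        z = z_score(obs, exp, sd)
        if abs(z) > threshold:
            flagged.append((res_num, res_type, atom, round(z, 2)))
    return flagged
```

With the assumed cutoff, all five rows in the table lie more than three standard deviations from the expected value, which is consistent with their appearing in the report.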
Protein delta chemical shift values (observed – calculated)
Histogram of delta chemical shifts (observed – calculated) for protein helix and sheet residues
LACS visualization
International structural genomics task force committees
• Structural genomics informatics task force
• Task force on numerical criteria for evaluating and assuring structure quality
• Task force on tracking and registration of targets
• Task force on deposition, archiving, and curation of the primary information
• Task force on mechanisms for publication and recording of methods
• Task force on intellectual property rights
Structural genomics NMR-STAR dictionary development collaborations
Cheryl Arrowsmith
Michael Kennedy
John Markley
Guy Montelione
James Prestegard
David Wemmer
NMR dictionary general discussion topics
• Alignment with mmCIF - Yes
  Use identical tags and definitions whenever possible
• Human readable export data format - Yes
• Reproduction of the experiment and data derivation - No
  Input data, derived data, description of the tools, protocol files, and parameter files used in the derivation
  Explicit links between the input data items and individual derived data items, including the possibility of capturing intermediate results in the derivation process
• Application specific data items - No
  Capture in the protocol and parameter files
• Software must be in place to meet higher deposition requirements - Yes
BMRB Time-domain data summary (www.bmrb.wisc.edu/data_library/timedomain/)
Entries: released 66
NMR experiments:
• Total >600
• Unique 75-100
• Reduced dimensionality 5 (entries 5596, 5844, 5859, 7170, 7191)
Pulse sequences and processing parameters are often provided
Other information:
• Peak lists 15 (entries)
• Structure calculation (with all intermediate results) 3 (entries 6128, 6176, 6318)
Future issues
Probabilistic approaches to structure determination
Structure determination from chemical shift data
Use of multiple kinds of NMR data in structure calculations
Structure refinement with data from non-NMR techniques
Acknowledgements
BMRB Madison: John L. Markley, Jurgen F. Doreleijers, Jundong Lin, Steve Mading, Dimitri Maziuk, David Tolmie, Hongyang Yao, Christopher Schulte
Computer sciences collaborators: Yannis Ioannidis, Miron Livny, Zachary Miller, R. Kent Wenger
RCSB: Helen Berman, Zukang Feng, Monica Sekharan, Monica Sundd, John Westbrook, Jasmine Young
Rutgers University: Mike Baran, Dehua Hang, Guy Montelione, Hunter Moseley
BMRB Advisory Board
CCPN/EBI: Wayne Boucher, Rasmus Fogh, John Ionides, Ernest Laue, Tim Stevens, Wim Vranken
NMRFAM: Arash Bahrami, Gabriel Cornilescu, Hamid Eghbalnia, Liya Wang, William M. Westler
Utrecht University: Alexandre Bonvin, Aart Nederveen, Robert Kaptein
Members of the NMR Community
Structural genomics groups
Catherine Bougault, Bruce Johnson, David Wishart, Zsolt Zolnai
Funding
BMRB Osaka: Hideo Akutsu, Eiichi Nakatani, Yoko Harano
BMRB Florence: Antonio Rosato