bchm 313 lecture 7 sy student - peterldaviespldserver1.biochem.queensu.ca/~rlc/steve/313/bchm... ·...

Lecture 7 Structure Verification, Terminology, Protein Data Bank Dr. Susan Yates Tuesday, February 15, 2011




This Week’s Plan
• Today
  • Finish up formal part of the lectures
• Wednesday (tomorrow)
  • *Change*
  • Review for Midterm exam
  • Notes for Review Class will be posted AFTER class
• Friday
  • New developments in the field and other cool stuff


Midterm Exam
• Reminder
  • Friday, March 4th (in class)
  • Covers NMR and Crystallography
• Always available by email
  • Will respond to emails until 9 PM on March 3rd to answer your questions
• Formal drop-in office hours
  • Which afternoon works best for you?
  • Monday, Feb. 28th OR
  • Tuesday, Mar. 1st OR
  • Thursday, Mar. 3rd?


Assessing Overall Quality of Structures

• Quality criteria
  • Resolution
  • R-factor & R-free
  • Geometry
  • B-factors
  • Other experimental data
• Does the model agree with biochemical and other data (mutagenesis, kinetics, spectroscopy etc.)?
• Who checks the crystallographer?
  • The reviewers
  • The Protein Data Bank (PDB)
  • Competing groups working on similar structures


Understanding and Evaluating Crystallographic Data


Crystallographic Resolution
• Can we resolve planes of atoms that are “X” Å apart?
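Bragg's law, λ = 2d sin θ, makes the resolution question quantitative. It links the spacing d of a set of lattice planes to the angle at which those planes diffract. The following is a minimal Python sketch that assumes first-order diffraction. The default wavelength (Cu Kα, 1.5418 Å) is an illustrative choice and is not taken from the lecture.

```python
import math


def bragg_spacing(two_theta_deg: float, wavelength: float = 1.5418) -> float:
    """Plane spacing d (Å) for a reflection observed at scattering angle 2θ (degrees).

    First-order Bragg's law: wavelength = 2 * d * sin(theta),
    so d = wavelength / (2 * sin(theta)).
    """
    if not 0.0 < two_theta_deg < 180.0:
        raise ValueError("2-theta must lie strictly between 0 and 180 degrees")
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


# Reflections at larger angles come from more closely spaced planes, so the
# highest angle with usable data sets the resolution limit.
for two_theta in (10.0, 30.0, 60.0):
    print(f"2θ = {two_theta:5.1f}°  ->  d = {bragg_spacing(two_theta):.2f} Å")
```

For example, a "2 Å structure" is one whose measured reflections extend out to planes about 2 Å apart.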


Resolution: Real Space


Resolution in a Real Protein Map


Resolution


Data Quality Determines Structural Detail and Accuracy


Resolution and Reciprocal Space


Resolution


The Crystallographic R-factor
• Refinement is a mathematical procedure that iteratively improves the fit between the experimental diffraction data (Fobs) and the theoretical diffraction data calculated from the structural model (Fcalc) at any given stage

$$R = 100 \times \frac{\sum_{hkl} \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_{hkl} |F_{obs}|}$$

• The constant feedback between the experimental data (Fobs) and the model (Fcalc) is one of the most important strengths of crystallography
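The R-factor formula translates almost directly into code. The following is a minimal Python sketch. It assumes that Fcalc has already been put on the same scale as Fobs; in practice, refinement programs also fit that scale factor. The amplitudes in the example are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor, in percent.

    R = 100 * sum_hkl | |Fobs| - |Fcalc| | / sum_hkl |Fobs|

    f_obs and f_calc hold amplitudes for the same reflections (hkl), in the
    same order. Assumes Fcalc is already scaled onto Fobs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need the same, non-empty set of reflections")
    residual = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    total = sum(abs(fo) for fo in f_obs)
    return 100.0 * residual / total


# Hypothetical amplitudes for three reflections.
print(r_factor([100.0, 50.0, 80.0], [90.0, 55.0, 80.0]))
```

The outer absolute value matters. Without it, over- and under-estimated amplitudes would cancel, and a poor model could score close to zero.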


The Crystallographic R-factor

$$R = 100 \times \frac{\sum_{hkl} \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_{hkl} |F_{obs}|}$$

• The process of ‘bootstrapping’ to obtain phase information requires the assignment of the atom positions in the electron density map
  • With large phase error, this process can be difficult and error-prone

• As a reliability check, the model structure is used to calculate structure factors (Fcalc), which are then compared to the observed structure factors (Fobs)

• R-factor and R-free are the best indicators that a model reflects actual experimental data


The R-factor
• One of the key statistics for judging a structure’s quality
  • Does the model reflect the actual experimental data?
• The residual (or fraction) of the data that the model does not explain
• For low-resolution structures, can be as high as 30%
• For exceptional sub-atomic resolution structures, as low as 10%
• Usually around 20-25%


The R-factor
• For a typical small molecule
  • R-factor in the range of 3-7%
• For a typical protein
  • R-factor in the range of 20-25%
• The fundamental reasons for the difference are crystal quality (purity of the sample and conformational flexibility of the molecule) and accuracy of the model (phasing quality and resolution)


R-factor


R-free
• Calculated the same way as the R-factor, but using only a fraction of the data that has never been used to refine the structure
  • 5-10% of reflections are removed randomly from the data set prior to refinement
  • The remaining reflections, which are used in refinement, are called the working (work) set
  • The removed reflections are called the test (free) set
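The work/test split described above can be sketched in Python as follows. This is a minimal illustration: the 5% default, the fixed random seed, and the made-up list of Miller indices are assumptions for the example and do not come from any particular refinement program.

```python
import random


def split_work_test(reflections, test_fraction=0.05, seed=0):
    """Randomly set aside a fraction of the reflections as the test (free) set.

    Returns (work_set, test_set). Only the work set is used in refinement;
    R-free is the same R calculation applied to the test set alone.
    test_fraction and seed are illustrative defaults.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(reflections)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical list of 1000 Miller indices (h, k, l).
hkl = [(h, k, l) for h in range(10) for k in range(10) for l in range(10)]
work, test = split_work_test(hkl, test_fraction=0.05)
print(len(work), len(test))
```

The split is made once, before refinement starts, and is then kept fixed. Drawing a new test set later would let the model "see" reflections that were meant to stay hidden.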


R-free
• Unbiased measure of the success of structural refinement
  • The refined model has never seen the omitted data, so the comparison gives an unbiased evaluation of the accuracy of the model
• Indicates incorrect modelling when R-free >> R-factor
  • For good models, usually no more than 5 percentage points higher than the R-factor
• Gives a more objective measure of the quality of the model
  • Not biased towards these reflections
  • Avoids model bias and overfitting of the data
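The rule of thumb comparing R-free with the R-factor can be written as a simple check. The following is a hypothetical helper, not a standard tool. The 5-point threshold is the lecture's rough guide and should not be treated as a hard cutoff.

```python
def r_free_gap_check(r_work: float, r_free: float, max_gap: float = 5.0):
    """Compare R-free with R-work (both in percent).

    Returns (gap, verdict). max_gap is a rule-of-thumb threshold.
    """
    gap = r_free - r_work
    if gap <= max_gap:
        verdict = "consistent"
    else:
        verdict = "check for overfitting or modelling errors"
    return gap, verdict


# Hypothetical refinement statistics.
print(r_free_gap_check(21.0, 25.5))
print(r_free_gap_check(19.0, 31.0))
```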


Geometry
• The model must have reasonable bond lengths, bond angles and overall geometry compared with other well-defined structures

• Deviations from ideal values: bond lengths <0.009 Å, bond angles <2°
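Geometry deviations are usually summarised as a root-mean-square deviation (RMSD) over all bonds, or all angles, of a given kind. The following is a minimal Python sketch. The model and "ideal" values below are illustrative numbers and are not a reference restraint table.

```python
import math


def rms_deviation(model_values, ideal_values):
    """Root-mean-square deviation of model geometry from ideal target values.

    Works the same way for bond lengths (Å) or bond angles (degrees).
    """
    if len(model_values) != len(ideal_values) or not model_values:
        raise ValueError("need equal-length, non-empty lists")
    squared = [(m - i) ** 2 for m, i in zip(model_values, ideal_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical N-CA, CA-C and C=O bond lengths (Å) for one residue,
# compared against approximate ideal values chosen for illustration.
model = [1.462, 1.521, 1.236]
ideal = [1.458, 1.525, 1.231]
print(f"rms bond-length deviation = {rms_deviation(model, ideal):.4f} Å")
```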


Ramachandran Plot
• Defines whether or not the main-chain dihedral angles (φ and ψ) fall into sterically allowed conformational regions

[Figure: Ramachandran plot of φ (phi) vs ψ (psi)]
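Each point on a Ramachandran plot is a pair of backbone dihedral angles, and each angle is computed from four consecutive atom positions. The following is a minimal, self-contained Python sketch of the standard torsion-angle calculation. Its sign convention is intended to follow IUPAC, where the angle is positive when the front bond must turn clockwise to eclipse the back bond as you look down the central bond.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Dihedral (torsion) angle in degrees, in the range -180 to 180.

    Backbone angles for residue i use these atoms:
      phi: C(i-1), N(i), CA(i), C(i)
      psi: N(i), CA(i), C(i), N(i+1)
    """
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]

    def dot(a, b):
        return sum(a[k] * b[k] for k in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, a residue with (φ, ψ) of roughly (-60°, -45°) falls in the right-handed α-helical region of the plot.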


Ramachandran Plot


Temperature Factors or B-Factor
• If we could hold an atom rigidly fixed in one place, we could observe its distribution of electrons in an ideal situation
  • The image would be dense towards the centre, with the density falling off further from the nucleus

• But that’s not real life
  • Electrons usually have a wider distribution
  • Due to vibration of the atoms and/or differences between the many molecules in the crystal lattice

• The observed electron density is an average over these small motions
  • A slightly smeared image of the molecule


Temperature Factors or B-Factor
• Describes the degree to which the electron density is spread out for each atom
  • The amount of ‘smearing’ is proportional to the magnitude of the B-factor

• An indicator of thermal vibration of atoms

• Indicates the true static or dynamic mobility of an atom, and also errors in model building
  • How the electron density of an atom is broadened by disorder in the crystal
  • Local static disorder: atom positions change from one unit cell to another
  • Local dynamic disorder: atom positions change over time during the measurement
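The "smearing" can be expressed as a distance. For an isotropic B-factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along a single direction. The following is a minimal Python sketch that converts a B-factor into that root-mean-square displacement. It assumes isotropic motion, which is the model behind a single B value per atom.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement (Å) along one direction implied by an isotropic B-factor (Å²).

    Uses B = 8 * pi**2 * <u**2>, where <u**2> is the mean-square displacement
    of the atom from its average position along a single direction.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


for b in (15.0, 32.0, 74.0):
    print(f"B = {b:5.1f} Å²  ->  rms displacement ≈ {rms_displacement(b):.2f} Å")
```

Take the two histidines in the B-factor example: B ≈ 15 Å² corresponds to about 0.44 Å, and B ≈ 74 Å² to about 0.97 Å.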


B-factors and Disorder
• B-factors are introduced to account for disorder in the atomic model

• Confidence measure for the location of each atom

• On a scale from 1 to 100 Å²

• If an atom on the surface of a protein has a high temperature factor
  • The atom is probably moving a lot, and you are only observing one possible snapshot of its location


B-factors
• Values <10 Å² create a model of the atom that is very sharp
  • The atom is not moving much and is in the same position in all the molecules of the crystal

• Values >50 Å² indicate that the atom is moving so much that it can barely be seen

[Figure: electron density around two His residues]
• This His coordinates the iron atom, so it is held firmly in place: B-factor = 15-20
• This His is exposed on the surface: B-factor = 32-74
  • Its electron density is weaker, enclosing a smaller space


B-factors
• Atoms coloured by temperature factor
  • High values (lots of motion) in red and yellow
  • Low values in blue

• The protein interior (core) has low B-factors, but the surface residues have higher values


Disorder
• Disorder makes interpretation more difficult

• The intensity of the diffraction data decreases with increasing resolution (i.e., at higher diffraction angles)
  • The higher the average B-factor, the faster the drop-off
  • This makes it more difficult (or impossible) to collect high-resolution data
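The drop-off can be estimated with the Debye-Waller (temperature) factor. For a single overall B, the measured intensity is multiplied by exp(-2B sin²θ/λ²). Bragg's law gives sin θ/λ = 1/(2d), so this factor equals exp(-B/(2d²)). The following is a minimal Python sketch. It treats the whole crystal as having one overall B, which is a simplifying assumption.

```python
import math


def intensity_falloff(b_factor: float, d_spacing: float) -> float:
    """Fraction of diffracted intensity surviving at resolution d (Å) for a given B (Å²).

    Debye-Waller factor on intensity: exp(-2B sin²θ / λ²).
    With sinθ/λ = 1/(2d) from Bragg's law, this equals exp(-B / (2 d²)).
    """
    return math.exp(-b_factor / (2.0 * d_spacing ** 2))


for b in (20.0, 60.0):
    row = ", ".join(f"{d} Å: {intensity_falloff(b, d):.3g}" for d in (4.0, 2.5, 1.5))
    print(f"B = {b} Å²  ->  {row}")
```

Comparing the two rows shows why a high average B-factor cuts off the high-resolution data first.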


Protein Structure from X-ray Diffraction


What is the PDB? http://www.pdb.org

http://www.rcsb.org/pdb/

“An Information Portal to Biological Macromolecular Structures”

• The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies

• Structural biologists determine the location of each atom relative to the others in the molecule, then deposit this information, which is annotated and publicly released into the archive


Constantly Growing…
• A reflection of research happening around the world
• Structures available for many of the proteins and nucleic acids involved in the central processes of life
• You can find structures for ribosomes, oncogenes, drug targets, whole viruses etc.

• Exciting, yet sometimes challenging to find the information you want
  • Just SO MANY STRUCTURES
  • 71,138 structures in the database as of Feb 8th, 2011
  • You may find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form


The PDB
• Each entry in the PDB is given a unique identification code
  • e.g., 1ATP, 1TOX, 3LCB

• PDB files
  • Header: summary of the protein, citation information, details of structure solution, sequence
  • List of the atoms in each protein (and solvent, water, ligands) and their 3D locations in space (coordinates)
  • Typically contain the coordinates of just one asymmetric unit, which may or may not be the same as the biological assembly

• The PDB offers tools for browsing, searching and analyzing structural data
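Entry codes like 1ATP follow a fixed pattern, and knowing the pattern makes it easy to script access to files. The following is a hypothetical Python helper. It assumes the classic four-character ID format (a digit 1-9 followed by three letters or digits) and the RCSB download URL pattern shown in the code. Check the RCSB documentation for current options; for example, some very large entries are distributed only in mmCIF format.

```python
import re

# Classic PDB identifiers: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_file_url(pdb_id: str) -> str:
    """Return an RCSB download URL for the PDB-format file of one entry.

    The https://files.rcsb.org/download/<ID>.pdb pattern is an assumption here.
    """
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


for code in ("1ATP", "1tox", "3LCB"):
    print(pdb_file_url(code))
```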


Reading Coordinate Files

Column designation
1. ATOM or HETATM
2. Atom #
3. Atom name
4. Type of amino acid
5. Chain ID
6. Residue #
7. Coordinates (x, y, z)
8. Occupancy
9. Temperature factor (Å²)
10. Atom type

[Figure: example coordinate records with columns 1-10 labelled; column 7 holds x, y and z]
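The ten columns listed above correspond to fixed character positions in each ATOM/HETATM line. The following is a minimal Python sketch that reads them by position. The column ranges are taken from the wwPDB coordinate-file format as I understand it, so verify them against the official format description. The example record is made up.

```python
def parse_atom_record(line: str) -> dict:
    """Parse one ATOM/HETATM record from a PDB-format coordinate file.

    Fields are read by fixed column position (1-based, inclusive ranges):
    record 1-6, serial 7-11, atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, x 31-38, y 39-46, z 47-54, occupancy 55-60,
    temperature factor 61-66, element 77-78.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return {
        "record": record,                  # column 1: ATOM or HETATM
        "serial": int(line[6:11]),         # column 2: atom number
        "name": line[12:16].strip(),       # column 3: atom name
        "res_name": line[17:20].strip(),   # column 4: residue (amino acid) type
        "chain": line[21],                 # column 5: chain ID
        "res_seq": int(line[22:26]),       # column 6: residue number
        "xyz": (float(line[30:38]),        # column 7: x, y, z in Å
                float(line[38:46]),
                float(line[46:54])),
        "occupancy": float(line[54:60]),   # column 8
        "b_factor": float(line[60:66]),    # column 9: temperature factor (Å²)
        "element": line[76:78].strip(),    # column 10: atom type
    }


# A made-up example record, for illustration only.
EXAMPLE = "ATOM      1  N   MET A   1      38.198  19.582  28.365  1.00 52.69           N"
print(parse_atom_record(EXAMPLE))
```

Reading by position rather than splitting on whitespace is deliberate. Adjacent fields can run together when numbers are large, and splitting on whitespace would then fail.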


Application of Coordinates
• Structure represented by orthogonal coordinates (in Å)

• Generate 3-D diagrams (ribbon diagram, all-atom close view of a small section, surface plot etc.)

• Geometric calculations such as bond distance and angle

• Theoretical studies such as docking and modelling

For two atoms at (x1, y1, z1) and (x2, y2, z2):

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
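The distance formula, together with the bond-angle calculation mentioned alongside it, can be sketched in a few lines of Python. Both functions assume orthogonal coordinates in Å, as they appear in a coordinate file.

```python
import math


def distance(a, b):
    """Distance (Å) between two atoms given orthogonal (x, y, z) coordinates in Å."""
    return math.sqrt(sum((b[k] - a[k]) ** 2 for k in range(3)))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, where b is the central atom."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    cos_angle = sum(u[k] * v[k] for k in range(3)) / (
        math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```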


Asymmetric Unit
• Smallest portion of a crystal structure to which symmetry operations can be applied in order to generate the complete unit cell
  • Common symmetry operations are rotations, translations and screw axes (combinations of rotation and translation)

[Figure: two-fold symmetry axis]
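The following sketch shows how symmetry operators rebuild the unit cell from the single asymmetric unit stored in a coordinate file. It uses space group P2₁, which is chosen purely as an example because the notes do not name a space group. In P2₁ the operators are the identity and a 2₁ screw axis along b (a two-fold rotation plus half a cell translation). Converting the fractional results to Cartesian Å coordinates would also need the cell dimensions, which this sketch leaves out.

```python
def p21_copies(frac_xyz):
    """Symmetry-related copies of one atom in space group P2₁ (unique axis b).

    Operators: identity (x, y, z) and the 2₁ screw axis (-x, y + 1/2, -z).
    Coordinates are fractional; results are wrapped back into the unit cell [0, 1).
    """
    x, y, z = frac_xyz
    operators = [(x, y, z), (-x, y + 0.5, -z)]
    return [tuple(c % 1.0 for c in xyz) for xyz in operators]


# Hypothetical atom in the asymmetric unit (fractional coordinates).
for copy in p21_copies((0.10, 0.20, 0.30)):
    print(tuple(round(c, 3) for c in copy))
```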


X-ray Structures - Limitations• Need lots of highly pure protein (~10 mg), and sometimes special labels (e.g. selenomethionine), so may be limited to using recombinant proteins

• Sometimes it is challenging to find a condition where the protein crystallizes

• Proteins with floppy loops or moving domains can be problematic
  • Might not be able to crystallize these

• X-ray structures are static – no information about dynamics

• Hydrogen atoms scatter poorly and are only visible at very high resolution


X-ray Structures - Advantages• Protein crystals are typically half water so the protein’s environment is actually pretty physiological

• The structure also shows more than just the protein (water molecules, metals, ions, ligands etc.)

• At atomic resolution (<1.0 Å) bond lengths can be measured directly instead of assumed, and deviations from canonical geometry can be seen

• No lower limit on protein size as long as it is well folded

• No upper limit on molecule size
  • Structures of intact viruses and the ribosome have been solved (5 MDa)

• Many steps can be automated
  • High-resolution structures can be solved within a few days of collecting data


Next time…
• Review!