bchm 313 lecture 7 sy student - peterldaviespldserver1.biochem.queensu.ca/~rlc/steve/313/bchm... ·...

Post on 25-Sep-2020


Lecture 7

Structure Verification, Terminology, Protein Data Bank

Dr. Susan Yates
Tuesday, February 15, 2011

This Week’s Plan
• Today
  • Finish up the formal part of the lectures
• Wednesday (tomorrow)
  • *Change*
  • Review for Midterm exam
  • Notes for Review Class will be posted AFTER class
• Friday
  • New developments in the field and other cool stuff

Midterm Exam
• Reminder
  • Friday, March 4th (in class)
  • Covers NMR and Crystallography
• Always available by email (yates@queensu.ca)
  • Will respond to emails up until March 3rd at 9PM to answer your questions
• Formal drop-in office hours
  • Which afternoon works best for you?
  • Monday, Feb. 28th OR
  • Tuesday, Mar. 1st OR
  • Thursday, Mar. 3rd?

Assessing Overall Quality of Structures

• Quality criteria
  • Resolution
  • R-factor & R-free
  • Geometry
  • B-factors
  • Other experimental data
• Does the model agree with biochemical and other data (mutagenesis, kinetics, spectroscopy etc.)?
• Who checks the crystallographer?
  • The reviewers
  • The Protein Data Bank (PDB)
  • Competing groups working on similar structures

Understanding and Evaluating Crystallographic Data

Crystallographic Resolution
• Can we resolve planes of atoms that are “X” Å apart?

Resolution: Real Space

Resolution in a Real Protein Map

Resolution

Data Quality Determines Structural Detail and Accuracy

Resolution and Reciprocal Space

Resolution

The Crystallographic R-factor
• Refinement is a mathematical procedure which iteratively improves the fit between the experimental diffraction data (Fobs) and the theoretical diffraction data (Fcalc), which can be calculated from the structural model at any given stage

$$R = 100 \times \frac{\sum_{hkl} \bigl|\,|F_{obs}| - |F_{calc}|\,\bigr|}{\sum_{hkl} |F_{obs}|}$$

• The constant feedback between the experimental data (Fobs) and the model (Fcalc) is one of the most important strengths of crystallography
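As a rough illustration of the R-factor calculation, here is a minimal Python sketch. It takes the absolute value of each difference, so that over- and under-estimates do not cancel, and it assumes Fcalc has already been placed on the same scale as Fobs (real refinement programs apply a scale factor first). The function name and the four amplitudes are made up for the example.

```python
def r_factor(f_obs, f_calc):
    """Return the R-factor in percent for paired structure factor amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    # Sum of absolute differences between observed and calculated amplitudes
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    # Sum of observed amplitudes
    denominator = sum(abs(fo) for fo in f_obs)
    return 100.0 * numerator / denominator


# Toy amplitudes (invented, one value per reflection hkl)
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
print(r_factor(f_obs, f_calc))
```

A model that reproduced every observed amplitude exactly would give R = 0%.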

The Crystallographic R-factor

$$R = 100 \times \frac{\sum_{hkl} \bigl|\,|F_{obs}| - |F_{calc}|\,\bigr|}{\sum_{hkl} |F_{obs}|}$$

• The process of ‘bootstrapping’ to obtain phase information requires the assignment of the atom positions in the electron density map
• With large phase error, this process can be difficult and error prone

• As a reliability check, the model structure is used to calculate structure factors (Fhkl), which are then compared to the observed structure factors

• R-factor and R-free are the best indicators that a model reflects actual experimental data

The R-factor
• One of the key statistics for judging a structure’s quality
• Does the model reflect the actual experimental data?
• The residual (or fraction) of the data that the model does not explain
• For low resolution structures it can be as high as 30%
• For exceptional sub-atomic resolution structures it can be as low as 10%
• R-factor is usually around 20-25%

The R-factor
• For a typical small molecule
  • R-factor in the range of 3%-7%
• For a typical protein
  • R-factor in the range of 20%-25%
• The fundamental reasons for the difference are the crystal quality (purity of the sample and conformational flexibility of the molecule) and the accuracy of the model (phasing quality and resolution)

R-factor

R-free
• Calculated the same way as the R-factor, but only from a fraction of the data that has never been used to refine the structure
• 5-10% of reflections are removed randomly from the data set prior to refinement
• The remaining reflections, which are used in refinement, are called the work (or used) set
• The removed reflections are called the test (or free) set

R-free
• Unbiased measure of the success of structural refinement
• The refined model has never seen the omitted data, so the comparison reports an unbiased evaluation of the accuracy of the model
• Indicator of incorrect modelling when much greater than (>>) the R-factor
• For good models it is usually no more than 5% higher than the R-factor
• Gives a more objective measure of the quality of the model
• The model is not biased towards the test reflections
• Avoids model bias and overfitting of the data
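The bookkeeping behind R-free can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 5% default and the fixed random seed are choices made for the example, and a real refinement program would refine the model against the work set only before comparing the two values.

```python
import random


def r_factor(f_obs, f_calc):
    """R-factor in percent for paired structure factor amplitudes."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return 100.0 * numerator / sum(abs(fo) for fo in f_obs)


def split_work_test(n_reflections, test_fraction=0.05, seed=0):
    """Randomly set aside a test (free) set; the rest is the work set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = max(1, round(test_fraction * n_reflections))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def r_work_and_r_free(f_obs, f_calc, test_fraction=0.05, seed=0):
    """Compute the R-factor separately on the work and test reflections."""
    work, test = split_work_test(len(f_obs), test_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free
```

The important point is that the test indices are chosen once, before refinement, and never change afterwards.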

Geometry
• Model must have reasonable bond lengths, bond angles and overall geometric agreement compared to other well-defined structures

• Deviations for bond length <0.009 Å with angle deviations <2° compared to ideal values

Ramachandran Plot
• Defines whether or not the main chain dihedral angles fall into sterically allowed conformational regions

[Figure: Ramachandran plot, with phi (φ) on one axis and psi (ψ) on the other]
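The main chain dihedral angles plotted on a Ramachandran plot can be computed from four atom positions each: phi from C(i-1), N, CA, C and psi from N, CA, C, N(i+1). The Python sketch below uses the standard atan2 form of the dihedral angle; the helper names are invented for this example.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane containing p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane containing p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    b2_unit = tuple(c / b2_len for c in b2)
    y = _dot(b2_unit, _cross(n1, n2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral(c_prev, n, ca, c)` would give phi for a residue, where the four arguments are (x, y, z) tuples taken from a coordinate file.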

Ramachandran Plot

Temperature Factors or B-Factor
• If we could hold an atom rigidly fixed in one place, we could observe its distribution of electrons in an ideal situation
• The image would be dense towards the centre, with the density falling off further from the nucleus

But that’s not real life
• Electrons usually have a wider distribution
• This is due to vibration of the atoms and/or differences between the many molecules in the crystal lattice
• The observed electron density includes an average of these small motions
• The result is a slightly smeared image of the molecule

Temperature Factors or B-Factor
• Describes the degree to which the electron density is spread out for each atom
• The amount of ‘smearing’ is proportional to the magnitude of the B-factor
• An indicator of thermal vibration of atoms
• Indicates the true static or dynamic mobility of an atom, and also errors in model building
• Describes how the electron density of an atom is broadened by disorder in the crystal
  • Local static disorder: atom positions change from one unit cell to another
  • Local dynamic disorder: atom positions change over time during the measurement
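To get a feel for what a B-factor means in terms of distance, the isotropic B-factor is related to the mean-square displacement of the atom by B = 8π²⟨u²⟩. The small Python sketch below converts a B-factor in Å² into a root-mean-square displacement in Å; it assumes a simple isotropic model, and the function name is invented for the example.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement (Å) for an isotropic B-factor (Å^2), from B = 8*pi^2*<u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    mean_square = b_factor / (8.0 * math.pi ** 2)
    return math.sqrt(mean_square)


for b in (10.0, 20.0, 50.0):
    print(f"B = {b:5.1f} Å^2 -> rms displacement = {rms_displacement(b):.2f} Å")
```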

B-factors and Disorder
• B-factors are introduced to account for disorder in the atomic model
• Confidence measure for the location of each atom
• On a scale from 1-100 Å²
• If an atom on the surface of a protein has a high temperature factor
  • The atom is probably moving a lot and you are only observing one possible snapshot of its location

B-factors
• Values <10 create a model of the atom that is very sharp
  • The atom is not moving much and is in the same position in all the molecules of the crystal
• Values >50 indicate that the atom is moving so much that it can barely be seen

This His coordinates with the iron atom, so it is held firmly in place (B-factor = 15-20)

This His is exposed on the surface (B-factor = 32-74)

Electron density is weaker, enclosing a smaller space

B-factors
• Atoms coloured by temperature factors
  • High values (lots of motion) in red and yellow
  • Low values in blue
• The protein interior (core) has low B-factors but the surface residues have higher values

Disorder
• Disorder makes interpretation more difficult
• The intensity of the diffraction data decreases with resolution
• The higher the average B-factor, the faster the drop-off
• This makes it more difficult (or even impossible) to collect high resolution data
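The drop-off can be made concrete with the standard isotropic temperature factor term, which scales each structure factor amplitude by exp(-B sin²θ/λ²). Since sinθ/λ = 1/(2d), this equals exp(-B/(4d²)) for a reflection at resolution d, and the intensity falls as the square of that. The Python sketch below is only an illustration with invented function names and example values.

```python
import math


def amplitude_attenuation(b_factor, d_spacing):
    """Factor applied to |F| at resolution d (Å) for an isotropic B (Å^2)."""
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


def intensity_attenuation(b_factor, d_spacing):
    """Intensity scales as |F|^2, so the attenuation factor is squared."""
    return amplitude_attenuation(b_factor, d_spacing) ** 2


# Compare a well-ordered and a poorly ordered crystal at several resolutions
for d in (4.0, 2.0, 1.0):
    low = intensity_attenuation(15.0, d)
    high = intensity_attenuation(60.0, d)
    print(f"d = {d:.1f} Å: B=15 -> {low:.3g}, B=60 -> {high:.3g}")
```

Running it shows the high-B intensities shrinking much faster as d gets smaller, which is why disordered crystals rarely give high resolution data.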

Protein Structure from X-ray Diffraction

What is the PDB?

http://www.pdb.org

http://www.rcsb.org/pdb/

“An Information Portal to Biological Macromolecular Structures”

• The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies

• Structural biologists determine the location of each atom relative to each other in the molecule then deposit this information, which is then annotated and publicly released into the archive

Constantly Growing…
• Reflection of research happening around the world
• Structures are available for many of the proteins and nucleic acids involved in the central processes of life
• You can find structures for ribosomes, oncogenes, drug targets, whole viruses etc.
• Exciting, yet sometimes challenging, to find the information you want
  • Just SO MANY STRUCTURES
  • 71,138 structures in the database as of Feb 8th, 2011
• You may find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form

The PDB
• Each entry in the PDB is given a unique identification code
  • 1ATP, 1TOX, 3LCB
• PDB files
  • Header, summary of the protein, citation information, details of structure solution, sequence
  • List the atoms in each protein (and solvent, water, ligands), and their 3D location in space (coordinates)
  • Typically contain coordinates of just one asymmetric unit, which may or may not be the same as the biological assembly
• The PDB offers tools for browsing, searching and analyzing structural data

Reading Coordinate Files

Column designation

1. ATOM or HETATM
2. Atom #
3. Atom name
4. Type of amino acid
5. Chain ID
6. Residue #
7. Coordinates (x, y, z)
8. Occupancy
9. Temperature factor (Å²)
10. Atom type
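The ten fields above sit at fixed character positions in each ATOM/HETATM line of a legacy PDB-format text file, so a line can be read by slicing rather than by splitting on spaces. The Python sketch below shows one way to do that; the function name, dictionary keys and the sample atom values are invented for this example.

```python
def parse_atom_record(line):
    """Read the main fields from one ATOM/HETATM line of a PDB-format file."""
    line = line.ljust(80)  # pad short lines so the slices below always work
    return {
        "record": line[0:6].strip(),         # 1. ATOM or HETATM
        "serial": int(line[6:11]),           # 2. atom number
        "atom_name": line[12:16].strip(),    # 3. atom name
        "residue_name": line[17:20].strip(), # 4. type of amino acid
        "chain_id": line[21].strip(),        # 5. chain ID
        "residue_number": int(line[22:26]),  # 6. residue number
        "x": float(line[30:38]),             # 7. coordinates (Å)
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),     # 8. occupancy
        "b_factor": float(line[60:66]),      # 9. temperature factor (Å^2)
        "element": line[76:78].strip(),      # 10. atom type
    }


# Build a made-up sample line with every field at its fixed width
sample = "{:<6}{:>5} {:<4} {:>3} {}{:>4}{:4}{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}{:10}{:>2}".format(
    "ATOM", 1, " N", "MET", "A", 1, "", 27.340, 24.430, 2.614, 1.00, 9.67, "", "N")
print(parse_atom_record(sample))
```

Slicing matters because neighbouring fields can run together with no space between them (for example, large negative coordinates), which would break a simple `split()`.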

Application of Coordinates
• Structure represented by orthogonal coordinates (in Å)
• Generate 3-D diagrams (ribbon diagram, all-atom close view of a small section, surface plot etc.)
• Geometric calculations such as bond distance and angle
• Theoretical studies such as docking and modelling

For two atoms at (x1, y1, z1) and (x2, y2, z2):

$$\text{Distance} = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2 + (z_2-z_1)^2}$$
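The distance formula translates directly into code. This short Python sketch takes two (x, y, z) tuples in Å, such as those read from a coordinate file; the function name and the two example points are made up.

```python
import math


def distance(atom1, atom2):
    """Distance in Å between two atoms given as (x, y, z) tuples."""
    x1, y1, z1 = atom1
    x2, y2, z2 = atom2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)


print(distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))
```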

Asymmetric Unit
• Smallest portion of a crystal structure to which symmetry operations can be applied in order to generate the complete unit cell
• Common symmetry operations are rotations, translations and screw axes (combinations of rotation and translation)

Two-fold symmetry axis
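As a sketch of how a symmetry operation generates copies of the asymmetric unit, the Python below applies a two-fold rotation axis and a two-fold screw axis (2₁), both taken along z, to a position in fractional coordinates. Choosing the z axis, the function names and the example position are assumptions for illustration; the operations that actually apply depend on the space group of the crystal.

```python
def two_fold_z(frac):
    """Two-fold rotation about z: (x, y, z) -> (-x, -y, z), wrapped into the cell."""
    x, y, z = frac
    return ((-x) % 1.0, (-y) % 1.0, z % 1.0)


def screw_21_z(frac):
    """2_1 screw along z: two-fold rotation plus a translation of half a cell."""
    x, y, z = frac
    return ((-x) % 1.0, (-y) % 1.0, (z + 0.5) % 1.0)


site = (0.1, 0.2, 0.3)  # made-up fractional coordinates of one atom
print(two_fold_z(site))
print(screw_21_z(site))
```

Applying the two-fold rotation twice returns the original position, while applying the screw axis twice moves the atom by one full cell along z, which is the same site in the next unit cell.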

X-ray Structures - Limitations
• Need lots of highly pure protein (~10 mg), and sometimes special labels (e.g. selenomethionine), so may be limited to using recombinant proteins
• Sometimes it is challenging to find a condition where the protein crystallizes
• Proteins with floppy loops or moving domains can be problematic
  • Might not be able to crystallize these
• X-ray structures are static: no information about dynamics
• Hydrogen atoms scatter poorly and are only visible at very high resolution

X-ray Structures - Advantages
• Protein crystals are typically half water, so the protein’s environment is actually pretty physiological
• Structure also shows more than just protein (H2Os, metals, ions, ligands etc.)
• At atomic resolution (<1.0 Å) bond lengths can be measured directly instead of assumed, and deviations from canonical geometry can be seen
• No lower limit on protein size as long as it is well folded
• No upper limit on molecule size
  • Intact viruses and ribosomes have been solved (5 MDa)
• Many steps can be automated
  • High resolution structures can be solved within a few days of collecting data

Next time…
• Review!
