Protein Crystallography Part III
Tim Grüne
Dept. of Structural Chemistry
Prof. G. Sheldrick
University of Göttingen
http://shelx.uni-ac.gwdg.de
Overview
• The PDB file
• Model Building
• Refinement
• Restraints and Constraints
• Model Refinement
Molecular Biology 1 Protein Crystallography III
From Map to Model
An initial electron density map (and even a final one) looks quite messy and is difficult to interpret. The final coordinate model contains more useful information; it is the target of model building and refinement.
Storing Structural Data — the PDB–file
The protein models stored, e.g., in the Protein Data Bank (PDB, http://www.pdb.org) do not represent the raw experimental data. From the experiment we get diffraction intensities and, after some work, the electron density ρ within the unit cell. The model is the best match (from the author's point of view) that explains the experimental data.
A typical PDB file contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group and unit cell dimensions, and a list of atoms. An atom entry contains the atom number and name, the residue it belongs to (with chain identifier and residue number), the coordinates, occupancy, B-factor, and element symbol.

    HEADER    LIGASE                                  28-APR-99   1CLI
    TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
    TITLE    2 SYNTHETASE (PURM), FROM THE E. COLI PURINE BIOSYNTHETIC
    TITLE    3 PATHWAY, AT 2.5 A RESOLUTION
    AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
    ...
    CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21    16
    ...
    ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
    ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
    ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
    ...

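The ATOM records above use fixed column positions. A minimal parsing sketch, assuming the column layout of the wwPDB format description (record name in columns 1–6, serial 7–11, atom name 13–16, alternate location 17, residue name 18–20, chain 22, residue number 23–26, x/y/z 31–54, occupancy 55–60, B-factor 61–66, element 77–78). The function name `parse_atom` is hypothetical, and real files (ANISOU records, alternate locations, insertion codes) need more care:

```python
import math

def parse_atom(line):
    """Parse one ATOM/HETATM record of a PDB file by fixed column positions."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

# The three ATOM records of the excerpt above
SAMPLE = [
    "ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N",
    "ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C",
    "ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C",
]

atoms = [parse_atom(line) for line in SAMPLE]
for a, b in zip(atoms, atoms[1:]):
    d = math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))
    print(f'{a["name"]}-{b["name"]} ({a["res_name"]} {a["res_seq"]}): {d:.3f} A')
```

Both distances come out close to typical backbone bond lengths (about 1.5 Å).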
Data Visualisation
Figure panels: Cα trace (smooth); ball-and-stick; CPK (space filling); Cα trace coloured by B-factor; ball-and-stick coloured by B-factor; ribbons.
Occupancy and B-factor of an Atom
A typical crystal consists of a large number (> 10¹³) of unit cells, and the resulting model is therefore only an average over all these cells. Some atoms, especially those of large side chains (arginine, phenylalanine, ...), can be partially disordered; others can adopt several distinct but well-defined orientations. An occupancy lower than 1 indicates that an atom occupies this position in only a fraction of all unit cells.
Even though data are most often collected at 100 K, atoms are not immobile but vibrate (thermal motion). The temperature factor or B-factor describes this vibration as a sphere within which the atom oscillates. At high resolution, the single B-factor can be replaced by a symmetric 3×3 matrix (six independent parameters) that describes anisotropic thermal motion in three dimensions.
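The B-factor is related to the mean-square displacement ⟨u²⟩ of an atom along a given direction by B = 8π²⟨u²⟩, and it damps the atom's scattering by exp(−B sin²θ/λ²) = exp(−B/(4d²)). A small sketch of these standard relations; the function names are made up for illustration, and `b_equivalent` assumes the anisotropic U matrix is given in an orthonormal (Cartesian) frame, where U_eq is one third of its trace:

```python
import math

def b_from_u(u_iso):
    """B = 8 pi^2 <u^2>; u_iso is the mean-square displacement in A^2."""
    return 8.0 * math.pi ** 2 * u_iso

def u_from_b(b):
    """Inverse of b_from_u."""
    return b / (8.0 * math.pi ** 2)

def b_equivalent(u):
    """Equivalent isotropic B of a symmetric 3x3 U matrix (Cartesian frame).

    Only six entries are independent: U11, U22, U33, U12, U13, U23.
    """
    u_eq = (u[0][0] + u[1][1] + u[2][2]) / 3.0
    return b_from_u(u_eq)

def attenuation(b, d):
    """Thermal damping exp(-B sin^2(theta)/lambda^2) = exp(-B / (4 d^2))."""
    return math.exp(-b / (4.0 * d * d))

b = 20.99  # B-factor of atom N in the PDB excerpt above
rms = math.sqrt(u_from_b(b))
print(f"B = {b} A^2 -> rms displacement {rms:.2f} A")
for d in (10.0, 2.5, 1.5, 1.0):
    print(f"d = {d:4.1f} A: scattering damped to {attenuation(b, d):.2f}")
```

The damping grows quickly towards high resolution, which is why atoms with large B-factors contribute little to the high-resolution reflections.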
Multiple Conformations
Model Building and Refinement
Creating a model from X-ray data is an iterative process consisting of model building and refinement.
Refinement means global improvement of the model with respect to the experimental data. The coordinates of all atoms, together with their temperature factors (and sometimes, at very high resolution, even the occupancies), are adjusted to minimise the difference between the measured intensities and those calculated from the model.
Model building means local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved to ensure that
1. the model makes sense biochemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)
2. the model fits the calculated electron density (e.g. check for multiple conformations)
Data to Parameter Ratio
No measurement is exact; every measured value is only an approximation to the true value. It is therefore important to have enough data to support the deduced model.
In protein crystallography we want to determine at least the coordinates of every atom in the structure. If more data are available, we add an isotropic B-value, and at best we can even determine anisotropic B-values. The amount of data is determined by the resolution, the solvent content, and the unit cell dimensions.
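
| Res. [Å] | Parameters per atom | Data/parameters |
|---|---|---|
| 3.0 | x, y, z | 0.9:1 |
| 2.3 | x, y, z; B | 1.5:1 |
| 1.8 | x, y, z; B | 3.1:1 |
| 1.5 | x, y, z; B | 5.4:1 |
| 1.5 | x, y, z; U11, U22, U33, U12, U13, U23 | 2.4:1 |
| 1.1 | x, y, z; U11, U22, U33, U12, U13, U23 | 6.1:1 |
| 0.8 | x, y, z; U11, U22, U33, U12, U13, U23 | 16:1 |

(Table: G. Sheldrick)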
At about 1.8 Å resolution and worse, these ratios would be much too low to build a proper model from the diffraction data alone. The effective number of data is increased by incorporating additional (bio)chemical information.
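The ratios in the table are consistent with simple counting: at fixed cell and solvent content, the number of unique reflections inside the resolution sphere grows as 1/d³, while the number of parameters grows with the parameters refined per atom (3 for x, y, z; 4 with B; 9 with the six anisotropic U values). A sketch under that assumption, normalised to the table's 0.9:1 at 3.0 Å. `unique_reflections` uses the usual approximation (4π/3)·V/(2·n·d³), with n the number of equivalent general positions, and ignores systematic absences; all names are illustrative:

```python
import math

# (resolution [A], parameters per atom, data:parameter ratio quoted in the table)
TABLE = [
    (3.0, 3, 0.9), (2.3, 4, 1.5), (1.8, 4, 3.1), (1.5, 4, 5.4),
    (1.5, 9, 2.4), (1.1, 9, 6.1), (0.8, 9, 16.0),
]

def unique_reflections(cell_volume, d_min, n_positions):
    """Approximate number of unique reflections to resolution d_min.

    Reciprocal lattice points inside a sphere of radius 1/d_min, halved for
    Friedel pairs and divided by the number of equivalent positions.
    """
    return 4.0 * math.pi * cell_volume / (3.0 * d_min ** 3) / (2 * n_positions)

def data_per_parameter(d, params_per_atom, ref=(3.0, 3, 0.9)):
    """Scale a reference ratio (d_ref, p_ref, ratio_ref) to resolution d."""
    d_ref, p_ref, ratio_ref = ref
    return ratio_ref * (p_ref / params_per_atom) * (d_ref / d) ** 3

for d, p, quoted in TABLE:
    print(f"{d:3.1f} A, {p} par./atom: scaled {data_per_parameter(d, p):5.2f}:1, table {quoted}:1")

# Cell of 1CLI from the CRYST1 record above (P 21 21 21: 4 equivalent positions)
volume = 71.170 * 211.680 * 94.450
print(f"~{unique_reflections(volume, 2.5, 4):.0f} unique reflections to 2.5 A")
```

Halving d multiplies the number of reflections by eight, which is why a modest gain in resolution changes the refinement strategy so much.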
Fitting of Data
Parameters used to be, and on some occasions still are, fitted to the data by a least-squares fit.
A line (whose parameters are slope and y-intercept) is to be fitted to the (data) points. The least-squares fit yields the line with the smallest sum of squared (vertical) distances to the data points.
More data do not necessarily give a different line, but they reduce the error of the line, i.e. they increase the confidence with which we can trust our result.
That is why the data-to-parameter ratio is an important figure indicating the quality of a model. Refinement and building strategies differ depending on that ratio.
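The straight-line example can be made concrete. A sketch of an ordinary least-squares line fit (standard library only) that also reports the standard errors of slope and intercept. The synthetic data, a hypothetical line y = 2x + 1 with Gaussian noise and a fixed random seed, show the error shrinking as the number of points grows while the fitted line stays about the same:

```python
import math
import random

def least_squares_line(xs, ys):
    """Fit y = m*x + c by ordinary least squares.

    Returns (m, c, se_m, se_c): slope, intercept and their standard errors.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the errors")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = y_mean - m * x_mean
    # residual variance with n - 2 degrees of freedom (two fitted parameters)
    rss = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_m = math.sqrt(s2 / sxx)
    se_c = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return m, c, se_m, se_c

def noisy_line(n, slope=2.0, intercept=1.0, sigma=0.5, seed=0):
    """n points evenly spaced in [0, 10] with Gaussian noise of width sigma."""
    rng = random.Random(seed)
    xs = [10.0 * i / (n - 1) for i in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

for n in (10, 100, 1000):
    m, c, se_m, se_c = least_squares_line(*noisy_line(n))
    print(f"n={n:5d}  slope={m:.3f} +/- {se_m:.3f}  intercept={c:.3f} +/- {se_c:.3f}")
```

The n − 2 in the residual variance is the same idea as the data-to-parameter ratio: every fitted parameter uses up one observation.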
Local Minima and Traps
Refinement can only find the nearest minimum of its target function.
Depending on the starting point (red crosses in the figure), this may result in a good or a bad model.
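Why the starting point matters can be shown with a one-dimensional toy target function with two minima. This sketch uses plain gradient descent as a simple stand-in for the downhill steps of a refinement program (it is not the algorithm of any particular program); the function and step size are invented for illustration:

```python
def target(x):
    """Hypothetical target: a double well, tilted so the left minimum is lower."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(x, step=0.01, n_steps=10_000):
    """Plain gradient descent: every step goes downhill, never over a barrier."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

for start in (-2.0, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> x = {x_min:+.3f}, target = {target(x_min):+.3f}")
```

Starting from -2 the descent reaches the lower (global) minimum; starting from +2 it stops in the higher, local one, because it cannot climb the barrier in between.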
Refinement — the R–value
Refinement programs aim to minimise the R-value, which describes the agreement between the measured amplitudes |Fobs(hkl)| and those calculated from the model, |Fcalc(hkl)|:

$$R = \frac{\sum_{hkl} \big|\, |F_{obs}(hkl)| - |F_{calc}(hkl)| \,\big|}{\sum_{hkl} |F_{obs}(hkl)|}$$

The |Fobs| are given by the reflection data (observations); the |Fcalc| are calculated from the coordinates (x, y, z) and B-values of the atoms of the model.
For small molecules, R-values between 2% and 5% are normal; for macromolecules, the range is approximately 20%–30%.
As a rule of thumb, one can expect an R-value (in %) of about ten times the resolution (in Å): a 2.5 Å structure should have an R-value of about 25%.
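The R-value formula translates directly into code. A minimal sketch with made-up amplitudes. The optional scale factor k, taken as the least-squares scale that minimises Σ(|Fobs| − k|Fcalc|)², is an assumption added here because in practice Fcalc has to be put on the scale of Fobs before R is computed:

```python
def ls_scale(f_obs, f_calc):
    """Scale k minimising sum (|Fobs| - k |Fcalc|)^2."""
    num = sum(abs(fo) * abs(fc) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fc) ** 2 for fc in f_calc)
    return num / den

def r_value(f_obs, f_calc, k=1.0):
    """R = sum | |Fobs| - k |Fcalc| | / sum |Fobs|.

    abs() also accepts complex structure factors, so f_calc may be complex.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

# Hypothetical amplitudes for four reflections
F_OBS = [100.0, 50.0, 25.0, 10.0]
F_CALC = [90.0, 55.0, 20.0, 12.0]

print(f"R (k = 1)        = {r_value(F_OBS, F_CALC):.3f}")
k = ls_scale(F_OBS, F_CALC)
print(f"R (k = {k:.3f}) = {r_value(F_OBS, F_CALC, k):.3f}")
```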
Refinement and Overfitting
Because the amplitudes lack part of the information (their phases) and are not ideal (for protein structures the errors are fairly large), the R-value can be reduced almost arbitrarily by adding more and more atoms that are not really present in the crystal structure, or by allowing positions that make little chemical sense (stereochemical clashes). This is called overfitting of the data. It is therefore important to impose restraints and constraints.
One measure to detect overfitting is the Rfree-value. About 5%–10% of the reflections are excluded from the minimisation of the R-value. They play no part in refinement and act as an "independent judge": after refinement, the Rfree-value is calculated like the R-value, but using only the excluded reflections. The two values must not differ too much.
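A sketch of how a free set can be flagged and both values computed. The random 5% selection, the seed, the dictionary layout and the synthetic amplitudes are assumptions for illustration; real programs store a free-R flag with each reflection and keep those reflections out of the refinement target:

```python
import random

def r_value(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs|."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)

def split_free_set(hkls, fraction=0.05, seed=42):
    """Flag a random fraction of the reflections as the free (test) set."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * len(hkls)))
    free = set(rng.sample(hkls, n_free))
    work = [h for h in hkls if h not in free]
    return work, sorted(free)

def r_work_and_free(f_obs, f_calc, work, free):
    """f_obs, f_calc: dicts mapping (h, k, l) -> amplitude."""
    r_work = r_value([f_obs[h] for h in work], [f_calc[h] for h in work])
    r_free = r_value([f_obs[h] for h in free], [f_calc[h] for h in free])
    return r_work, r_free

# Synthetic data: 100 hypothetical reflections
hkls = [(h, k, l) for h in range(5) for k in range(5) for l in range(1, 5)]
f_obs = {hkl: 10.0 + sum(hkl) for hkl in hkls}
f_calc = {hkl: f_obs[hkl] * (1.0 + 0.1 * (i % 3 - 1)) for i, hkl in enumerate(hkls)}

work, free = split_free_set(hkls)
r_work, r_free = r_work_and_free(f_obs, f_calc, work, free)
print(f"{len(work)} work / {len(free)} free reflections")
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because these synthetic Fcalc were never refined against the work set, both values come out similar here. After real refinement R_work typically drops below R_free, and a large gap between them is the warning sign of overfitting.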
Constraints and Restraints
The reflection data alone would not be sufficient to create a trustworthy model: there are too many parameters. Therefore it is necessary to incorporate additional information. This is done by using restraints and constraints.
Constraints are fixed conditions that cannot change during refinement (e.g. the occupancy of atoms, usually fixed at 1).
Restraints allow variation within certain limits around ideal values.
These ideal values are derived from high-resolution structures, which showed that certain geometric properties of macromolecules do not vary a lot. Examples are
• bond lengths (e.g. C–C single bond = 1.54 Å)
• planarity of aromatic rings (Phe, Tyr, ...)
• anti-bumping (non-bonded atoms cannot get too close)
Most models of macromolecules can only be built because of this extra information. It improves the effective data-to-parameter ratio.
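How a restraint competes with the data, and how a constraint differs, can be seen in a deliberately one-dimensional toy: an atom bonded to a fixed neighbour at x = 0, a position suggested by the density, and an ideal C–C length of 1.54 Å. The quadratic penalty weighted by 1/σ² is the usual form of a geometric restraint in least-squares refinement, but the numbers and function names here are hypothetical:

```python
def restrained_position(x_obs, sigma_obs, x_anchor, d_ideal, sigma_restraint):
    """Minimise (x - x_obs)^2 / sigma_obs^2 + (x - x_anchor - d_ideal)^2 / sigma_restraint^2.

    Both terms are quadratic in x, so the minimum is the inverse-variance
    weighted mean of the 'data' position and the 'ideal geometry' position.
    """
    w_obs = 1.0 / sigma_obs ** 2
    w_res = 1.0 / sigma_restraint ** 2
    x_ideal = x_anchor + d_ideal
    return (w_obs * x_obs + w_res * x_ideal) / (w_obs + w_res)

def constrained_position(x_anchor, d_ideal):
    """A constraint leaves no freedom: the position follows from the anchor."""
    return x_anchor + d_ideal

# Hypothetical case: the density suggests 1.70 A (+/- 0.10 A), ideal C-C is 1.54 A
for sigma_r in (0.02, 0.2, 1.0):
    x = restrained_position(1.70, 0.10, 0.0, 1.54, sigma_r)
    print(f"sigma_restraint = {sigma_r:4.2f} A: bond length {x:.3f} A")
print(f"constrained: {constrained_position(0.0, 1.54):.3f} A")
```

A tight restraint pulls the bond close to its ideal value, a loose one lets the data dominate, and a constraint removes the parameter altogether.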
Maximum Likelihood
A more modern approach than least squares is the maximum likelihood method. It applies statistical assumptions and allows the inclusion of more data and information, e.g. experimental phases. For macromolecules, maximum likelihood is more stable and leads to better results overall, often with reduced model bias.
Maximum likelihood takes the errors of the data into account and prevents a model from being built with higher accuracy than the data permit.
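For the simplest case of independent Gaussian errors, maximising the likelihood is equivalent to least squares with each observation weighted by 1/σ², so uncertain measurements count less. A sketch estimating a single value from three hypothetical measurements with different errors; it only illustrates the principle, and the likelihood targets used in macromolecular refinement are considerably more elaborate:

```python
import math

def neg_log_likelihood(mu, values, sigmas):
    """-ln L for independent Gaussian measurements with known errors."""
    return sum((v - mu) ** 2 / (2.0 * s * s) + math.log(s * math.sqrt(2.0 * math.pi))
               for v, s in zip(values, sigmas))

def ml_estimate(values, sigmas):
    """Maximum-likelihood value: setting d(-ln L)/d(mu) = 0 gives the
    inverse-variance weighted mean."""
    weights = [1.0 / (s * s) for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

values = [10.0, 12.0, 30.0]   # hypothetical measurements
sigmas = [1.0, 1.0, 10.0]     # their estimated errors
mu = ml_estimate(values, sigmas)
print(f"unweighted mean {sum(values) / len(values):.2f}, ML estimate {mu:.2f}")
```

The imprecise third measurement barely shifts the maximum-likelihood estimate, whereas it drags the unweighted mean far away.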
Getting Started
The first steps in building the model consist of finding larger groups of residues with special features.
In proteins this is the (Cα) main chain, in nucleic acids the phosphate backbone. α-helices are particularly easy to locate, even at medium to low resolution (2.5–4 Å).
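Part of what makes α-helices easy to recognise is their regular geometry. A sketch that generates an ideal Cα trace from textbook values (3.6 residues per turn, i.e. 100° per residue, 1.5 Å rise per residue, and a Cα radius of about 2.3 Å; treat these as assumptions) and checks the roughly 3.8 Å spacing between consecutive Cα atoms and the 5.4 Å pitch:

```python
import math

RISE = 1.5        # A per residue along the helix axis (assumed ideal value)
TWIST = 100.0     # degrees per residue, i.e. 3.6 residues per turn
RADIUS = 2.3      # A, distance of the C-alpha atoms from the axis (approximate)

def ideal_helix_ca(n_residues):
    """C-alpha coordinates of an ideal alpha-helix along the z axis."""
    coords = []
    for i in range(n_residues):
        phi = math.radians(TWIST * i)
        coords.append((RADIUS * math.cos(phi), RADIUS * math.sin(phi), RISE * i))
    return coords

ca = ideal_helix_ca(8)
steps = [math.dist(a, b) for a, b in zip(ca, ca[1:])]
print(f"consecutive C-alpha distance: {steps[0]:.2f} A")
print(f"pitch: {360.0 / TWIST * RISE:.1f} A per turn")
```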
Directionality
From the main chain (Cα chain) alone one cannot determine its direction, nor which part of the sequence it covers. One gets help from the so-called Christmas tree: the side chains of an α-helix point towards the N-terminal end of the protein chain.
Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence onto the map. Disulphide bridges or metals bound to an active centre can also be helpful.
β–strands
β-strands are also striking but more difficult to build. In particular, the direction of the peptide chain can be difficult to find.
Manual Building I
At high resolution (d < 2 Å), building is greatly facilitated by programs like Arp/Warp (A. Perrakis, V. Lamzin) or Resolve (T. Terwilliger), which automatically build large parts of the structure. These programs can even overcome local minima.
Refinement programs (either least-squares or maximum likelihood) cannot cross such a barrier: they would get stuck in the local minimum and could not move the phenylalanine shown in the figure into the right position.
Manual Building II
Computer programs know nothing about biology, let alone about a specific molecule or structure. Things a program cannot judge on its own include:
• presence of ligands and/or metal ions (from crystallisation or protein preparation)
• special interactions in complexes
• exceptions to the standard values used in refinement
• correct placement of solvent (water) molecules
This sort of information also increases the effective data-to-parameter ratio and hence improves the quality of the model. This becomes especially important at medium or low resolution (2.5 Å and worse).
What about Hydrogen Atoms?
X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron, so they cannot be detected by X-ray diffraction unless very high-resolution data (around 1 Å) are available. This is different for neutron diffraction, which makes that technique very valuable for studies of enzymes and their active centres.
During refinement, hydrogens are treated as riding atoms, that is, they are kept in a fixed position relative to the group they belong to (e.g. the hydrogens on the carbons of a phenylalanine ring).
Compared with ignoring hydrogens completely, this method improves the quality of the model and also helps to keep correct distances to neighbouring groups. Because their positions are fixed, riding atoms do not increase the number of parameters.
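A sketch of a riding hydrogen on an aromatic carbon: the H position is recomputed from the positions of the carbon and its two ring neighbours, on the outward bisector of the ring angle, so it adds no free parameters. The C–H distance of 0.93 Å is an assumed value of the kind used in X-ray refinement (shorter than the internuclear distance because X-rays see the electron, which is pulled towards the carbon); the function name is hypothetical:

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def riding_aromatic_h(c_prev, c, c_next, d_ch=0.93):
    """H on the outward bisector of the angle c_prev-c-c_next, at distance d_ch.

    The result depends only on the carbon positions: no parameters of its own.
    """
    direction = unit(add(unit(sub(c, c_prev)), unit(sub(c, c_next))))
    return add(c, scale(direction, d_ch))

# Ideal planar six-membered ring, C-C = 1.39 A (hexagon side = circumradius)
ring = [(1.39 * math.cos(math.radians(60 * i)), 1.39 * math.sin(math.radians(60 * i)), 0.0)
        for i in range(6)]
hydrogens = [riding_aromatic_h(ring[i - 1], ring[i], ring[(i + 1) % 6]) for i in range(6)]
for c, h in zip(ring, hydrogens):
    print(f"C {c[0]:6.3f} {c[1]:6.3f}  ->  H {h[0]:6.3f} {h[1]:6.3f}  (C-H {math.dist(c, h):.2f} A)")
```

If a refinement step moves the ring, the hydrogens are simply recomputed and "ride" along with it.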
Empty Space? — The solvent region
The Solvent Model
Protein crystals are not very tightly packed. The space between the molecules is filled with solvent, on average 50–70% of the total volume. Because this solvent is disordered, it contributes mostly to the low-resolution reflections (d > 6 Å).
Possible ways to treat the solvent are:
1. ignore the solvent: results in a high R-value
2. ignore data with d > 6 Å: better R-value but worse maps
3. treat the solvent region as a flat lake of electron density
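One common way to realise option 3 is a flat bulk-solvent model: the Fourier transform of a solvent mask, F_mask, is added to the protein contribution with an overall solvent scale k_sol and a smearing factor B_sol, F_total = F_protein + k_sol·exp(−B_sol·s²/4)·F_mask with s = 1/d. A sketch; k_sol = 0.35 e/Å³ and B_sol = 50 Å² are assumed values of roughly the size usually obtained, and the structure factors are hypothetical complex numbers:

```python
import math

def solvent_scale(d, k_sol=0.35, b_sol=50.0):
    """k_sol * exp(-B_sol * s^2 / 4) with s = 1/d in 1/A."""
    s = 1.0 / d
    return k_sol * math.exp(-b_sol * s * s / 4.0)

def f_total(f_protein, f_mask, d, k_sol=0.35, b_sol=50.0):
    """Protein structure factor plus the scaled flat-solvent contribution."""
    return f_protein + solvent_scale(d, k_sol, b_sol) * f_mask

for d in (20.0, 6.0, 3.0, 2.0):
    print(f"d = {d:4.1f} A: solvent scale {solvent_scale(d):.3f}")

# Hypothetical complex structure factors for one low-resolution reflection
print(abs(f_total(complex(120.0, 40.0), complex(-150.0, -30.0), d=15.0)))
```

The Gaussian term makes the solvent contribution fade at high resolution, in line with the statement that the disordered solvent mainly affects the low-resolution reflections.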
Example Refinement
This table illustrates the steps of model building and their impact on the model quality (here measured only by the R-value and the Rfree-value; meaning explained tomorrow):
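
| No. | Action taken | N(param.) | R [%] | Rfree [%] |
|---|---|---|---|---|
| 1 | MR: pdb 7rxn | 1 | 22.9 | 23.4 |
| 2 | + 60 waters | 1822 | 15.7 | 18.7 |
| 3 | Fe, S anisotropic | 1857 | 14.8 | 17.7 |
| 4 | all H-atoms | 1857 | 14.0 | 16.8 |
| 5 | all C, N, O anisotropic | 4097 | 8.8 | 11.3 |
| 6 | + 28 waters, occ. | 4556 | 7.5 | 10.3 |
| 7 | 6 disord. side chains | 4698 | 6.9 | 9.7 |
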