Protein structure determination and our software tools
Mark Berjanskii
Edmonton, February 2015
Outline
1) X-ray crystallography
2) Cryo-electron microscopy (Cryo-EM)
3) NMR spectroscopy
4) Mass spectrometry
5) MS23D
Why do we need to know protein structures?
1) Prediction of protein function from 3D structure (e.g. fold, motifs, active-site prediction)
2) Mechanism of protein function (e.g. enzyme catalysis, structural effects of known mutations)
3) Rational drug design
4) Design of novel proteins with novel functions; sequence-to-function and sequence-to-structure prediction
1) Ubiquitin
- degradation by the proteasome
2) Ubiquitin-like modifiers
- regulation of function by post-translational modification
X-ray crystallography
Quality metrics:
1) Experimental data:
- number of reflections
- signal-to-noise ratio
2) Model-to-experiment agreement:
- R-factor
- R-free
3) Coordinate uncertainty:
- B-factor
4) Stereochemical normality:
- backbone torsion angles (Ramachandran plot)
- bond lengths and angles
- side-chain torsion angles
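The R-factor and R-free listed above both measure how well structure-factor amplitudes calculated from the model (|Fcalc|) reproduce the observed amplitudes (|Fobs|): R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|. R-work is summed over the reflections used in refinement, R-free over a held-out test set. A minimal Python sketch, assuming the amplitudes are already on a common scale and the test-set flags are given; the function names and toy numbers are illustrative only:

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_obs and f_calc are structure-factor amplitudes on a common scale.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))


def r_work_and_r_free(f_obs, f_calc, free_flags):
    """R-work on unflagged reflections, R-free on the flagged (test-set) reflections."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free = np.asarray(free_flags, dtype=bool)
    return r_factor(f_obs[~free], f_calc[~free]), r_factor(f_obs[free], f_calc[free])


# Toy amplitudes (not real data); the last reflection belongs to the test set.
r_work, r_free = r_work_and_r_free([100.0, 80.0, 60.0, 40.0],
                                   [90.0, 85.0, 60.0, 50.0],
                                   [False, False, False, True])
print(f"R-work = {r_work:.4f}, R-free = {r_free:.4f}")
```

Because the test-set reflections never enter refinement, a large gap between R-work and R-free is a warning sign of over-fitting.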
X-ray resolution
The minimum spacing (d) of crystal lattice planes that still gives measurable X-ray diffraction; equivalently, the minimum distance between structural features that can be distinguished in the electron-density map.
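Lattice spacing and scattering angle are linked by Bragg's law, nλ = 2d sin θ, where 2θ is the scattering angle. A short sketch for first-order reflections (n = 1), assuming a Cu Kα wavelength of 1.5418 Å for the example: the larger the angle at which reflections are still measurable, the smaller d_min and the higher the resolution.

```python
import math


def bragg_spacing(wavelength_A, two_theta_deg):
    """First-order Bragg spacing d = lambda / (2 sin theta), with theta = (2theta) / 2."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength_A / (2.0 * math.sin(theta))


CU_K_ALPHA_A = 1.5418  # assumed X-ray wavelength for this example

for two_theta in (20.0, 60.0, 90.0):
    d = bragg_spacing(CU_K_ALPHA_A, two_theta)
    print(f"2theta = {two_theta:5.1f} deg -> d = {d:.2f} A")
```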
[Figure: high-resolution data (many reflections, e.g. 200,000) vs. low-resolution data (few reflections, e.g. 500)]
Resolution and protein quality
X-ray resolution by proxy
ResProx would be able to detect most of the withdrawn X-ray structures from the Murthy lab
ResProx vs. X-ray resolution
Number of protein structures per year
Cryo-electron microscopy (Cryo-EM)
Image formation in the electron microscope:
(a) Electrons, emitted by a source housed under high vacuum, are accelerated down the microscope column. After passing through the specimen, the scattered electrons are focused by the electromagnetic lenses of the microscope.
(b) Schematic illustrating the principle of data collection for electron tomography: as the specimen is tilted relative to the electron beam, a series of images of the same field of view is recorded.
(c) Rendering of selected projection views generated during cryo-electron tomography.
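For a tilt series like the one in panel (b), the Crowther criterion gives a rough estimate of the attainable resolution, d = πD/N, for N projections spread evenly over 180° of an object of diameter D. The sketch below assumes a hypothetical ±60° tilt range with 2° steps and a 2000 Å-thick specimen. It takes N as the number of projections a full 180° range would need at that step and ignores the missing wedge caused by the limited tilt range, so the estimate is optimistic.

```python
import math


def tilt_angles(max_tilt_deg=60.0, step_deg=2.0):
    """Evenly spaced tilt angles from -max_tilt_deg to +max_tilt_deg (inclusive)."""
    n = int(round(2.0 * max_tilt_deg / step_deg)) + 1
    return [-max_tilt_deg + i * step_deg for i in range(n)]


def crowther_resolution(diameter_A, n_projections):
    """Crowther criterion d = pi * D / N for N projections spread evenly over 180 degrees."""
    return math.pi * diameter_A / n_projections


angles = tilt_angles(60.0, 2.0)           # hypothetical +/-60 deg series, 2 deg steps
n_equiv = round(180.0 / 2.0)              # projections a full 180 deg range needs at this step
d = crowther_resolution(2000.0, n_equiv)  # hypothetical 2000 A-thick specimen
print(len(angles), angles[0], angles[-1], f"{d:.1f} A")
```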
3D image from Cryo-EM
Examples of Cryo-EM images
(a, b) Spiral architecture of the nucleoid in Bdellovibrio bacteriovorus:
(a) a 210 Å-thick tomographic slice through the 3D volume of a cell;
(b) a 3D surface rendering of the same cell, with the spiral nucleoid highlighted.
(c) Higher-magnification view of a tomographic slice through the cell, showing well-separated nucleoid spirals and ribosomes (dark dots) distributed at the edge of the nucleoid.
(d) Expanded views of 210 Å-thick tomographic slices, showing top views of polar chemoreceptor arrays.
Cryo-EM revolution in structural biology
Cryo-EM can now achieve the resolution necessary for de novo structure determination.
Cryo-EM structures at better than 5 Å resolution
Examples of “high-resolution” de novo structures from Cryo-EM:
A) transient receptor potential cation channel subfamily V member 1 (TRPV1) ion channel
B) F420-reducing [NiFe] hydrogenase
C) large subunit of the yeast mitochondrial ribosome
D) γ-secretase
NMR spectroscopy
Protein NMR spectroscopy workflow: experiment → spectra processing → spectral assignment → NOE assignment → distance restraints → model generation
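In the NOE step of this workflow, cross-peak intensities are converted into distance restraints using the approximate r⁻⁶ dependence of the NOE, r = r_ref (I_ref / I)^(1/6), and the distances are then binned into bounds. A minimal sketch, assuming a hypothetical reference peak of intensity 1.0 at 2.5 Å and one common binning convention (strong ≤ 2.7 Å, medium ≤ 3.3 Å, weak ≤ 5.0 Å, all with a 1.8 Å lower bound); real protocols usually calibrate separately for different proton classes.

```python
def noe_distance(intensity, ref_intensity=1.0, ref_distance_A=2.5):
    """Estimate a distance from NOE intensity assuming I ~ r^-6, calibrated on a reference peak."""
    return ref_distance_A * (ref_intensity / intensity) ** (1.0 / 6.0)


def noe_bounds(distance_A, lower_A=1.8, bins=(2.7, 3.3, 5.0)):
    """Map an estimated distance to (lower, upper) bounds using strong/medium/weak bins."""
    for upper in bins:
        if distance_A <= upper:
            return (lower_A, upper)
    return None  # too far to be a reliable NOE under this convention


for peak in (2.0, 1.0, 0.125):  # hypothetical peak intensities
    r = noe_distance(peak)
    print(f"I = {peak:5.3f} -> r = {r:.2f} A, bounds = {noe_bounds(r)}")
```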
Resolution of NMR structures
Macromolecular NMR spectroscopy for the non-spectroscopist. Kwan AH, Mobli M, Gooley PR, King GF, Mackay JP. FEBS J. 2011 Mar;278(5):687-703
Protein NMR structures from the Wishart group:
- 2B0F: human rhinovirus 3C protease
- 1DE1: oxidized bacteriophage T4 glutaredoxin
- 1DE2: reduced bacteriophage T4 glutaredoxin
- 1NHO: thioredoxin-like protein (Mt0807)
- 1Z9V: MTH0776
Secondary structure from NMR chemical shifts: PANAV, CSI
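The chemical shift index (CSI) assigns secondary structure from deviations of Hα shifts from random-coil values: +1 above +0.1 ppm, -1 below -0.1 ppm, 0 otherwise; groups of at least four -1 values mark a helix and groups of at least three +1 values mark a strand. A simplified sketch, assuming the secondary shifts (observed minus random coil, in ppm) are already computed; it only considers uninterrupted runs, which is a simplification of the published rules, and the input values are hypothetical.

```python
def csi_indices(secondary_shifts, threshold=0.1):
    """Hα chemical shift index: +1, -1 or 0 per residue."""
    return [1 if d > threshold else -1 if d < -threshold else 0 for d in secondary_shifts]


def csi_secondary_structure(indices, min_helix=4, min_strand=3):
    """H for runs of >= min_helix (-1)s, E for runs of >= min_strand (+1)s, C otherwise."""
    ss = ["C"] * len(indices)
    i = 0
    while i < len(indices):
        j = i
        while j < len(indices) and indices[j] == indices[i]:
            j += 1  # extend the run of identical index values
        run = j - i
        if indices[i] == -1 and run >= min_helix:
            ss[i:j] = ["H"] * run
        elif indices[i] == 1 and run >= min_strand:
            ss[i:j] = ["E"] * run
        i = j
    return "".join(ss)


shifts = [0.05, -0.3, -0.25, -0.4, -0.2, 0.0, 0.3, 0.25, 0.35, -0.05]  # hypothetical
print(csi_secondary_structure(csi_indices(shifts)))
```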
Torsion angles from NMR chemical shifts
Accessible surface area from NMR chemical shifts
Prediction of NMR chemical shifts from structure
Protein model building from NMR data
Validation of NMR protein models
Mass spectrometry
Distance restraints from MS cross-linking experiments
Distance restraints → model generation
Residue accessibility by MS limited proteolysis
Secondary structure localization by MS hydrogen/deuterium (H/D) exchange
Problem: many structural solutions may be compatible with a few restraints and limited solvent-exposure information.
[Figure panels: 1 restraint, 2 restraints, 3 restraints]
Trypsin-inhibitor complex (1TAB): cross-links Lys222E-Lys16I, Lys224E-Lys16I and Lys60E-Lys31I (21 Å links)
Probing native protein structures by chemical cross-linking, mass spectrometry and bioinformatics. Leitner A, Walzthoeni T, Kahraman A, Herzog F, Rinner O, Beck M, Aebersold R. Mol Cell Proteomics. 2010 Mar 31.
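Checking whether a model satisfies a set of cross-links comes down to comparing inter-residue distances with the cross-linker's maximum span. A minimal sketch, assuming Cα-Cα distances and taking the 21 Å quoted above as the upper bound; which atoms that bound refers to is an assumption, and the coordinates are made up rather than taken from 1TAB.

```python
import math


def check_crosslinks(ca_coords, links, max_dist_A=21.0):
    """Return (res1, res2, distance, satisfied) for each cross-linked residue pair."""
    results = []
    for res1, res2 in links:
        d = math.dist(ca_coords[res1], ca_coords[res2])
        results.append((res1, res2, d, d <= max_dist_A))
    return results


# Made-up C-alpha coordinates (A) for illustration only; NOT taken from 1TAB.
ca = {
    "Lys222E": (0.0, 0.0, 0.0),
    "Lys224E": (3.0, 4.0, 0.0),
    "Lys16I": (12.0, 9.0, 0.0),
    "Lys60E": (30.0, 0.0, 0.0),
    "Lys31I": (5.0, 0.0, 0.0),
}
links = [("Lys222E", "Lys16I"), ("Lys224E", "Lys16I"), ("Lys60E", "Lys31I")]
for res1, res2, d, ok in check_crosslinks(ca, links):
    print(f"{res1}-{res2}: {d:.1f} A {'satisfied' if ok else 'VIOLATED'}")
```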
How many distance restraints do we need?
Distance restraint requirements for different levels of structure determination:
1) Atomic resolution: 10-20 restraints per residue (NMR)
2) Residue resolution: 3 restraints per residue
3) Fold: 1 restraint per 10 residues (i.e. protein length / 10)
4) Complex of rigid bodies: 3 restraints per complex
5) Experimentally biased comparative / ab initio model: 1 restraint
For levels 2, 3 and 4: High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Young MM, Tang N, Hempel JC, Oshiro CM, Taylor EW, Kuntz ID, Gibson BW, Dollinger G. Proc Natl Acad Sci U S A. 2000;97:5802-5806.
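The per-residue rates in this list turn into a quick estimate for a given protein. This sketch applies them to a 76-residue protein such as ubiquitin; rounding the fold-level count up to a whole restraint is an assumption made here, and the function name is illustrative.

```python
import math


def restraints_needed(n_residues):
    """(min, max) restraint counts for each level of structure determination listed above."""
    fold = math.ceil(n_residues / 10)  # rounding up is an assumption of this sketch
    return {
        "atomic (NMR)": (10 * n_residues, 20 * n_residues),
        "residue": (3 * n_residues, 3 * n_residues),
        "fold": (fold, fold),
        "rigid-body complex": (3, 3),
        "biased model": (1, 1),
    }


for level, (low, high) in restraints_needed(76).items():  # 76 residues, e.g. ubiquitin
    print(f"{level:20s} {low}" + (f"-{high}" if high != low else ""))
```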
One contact per 12 residues is enough to model protein topology
MS-based structure determination requires knowledge-based information
[Diagram: cross-links and residue exposure, combined with an advanced force field (solvation term, full electrostatics, knowledge-based potentials), fragment information and homology information]
BPTI from disulfide distance restraints
Disulfide restraints can bias the conformational search for BPTI towards the native state.
[Figure: native structure compared with a model built with no restraints (RMSD = 8.2 Å) and a model built with three disulfide restraints (RMSD = 2.14 Å)]
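RMSD values like the 8.2 Å and 2.14 Å quoted here are coordinate root-mean-square deviations from the native structure after optimal superposition. A minimal NumPy sketch of that calculation with the Kabsch algorithm, assuming two matched (N, 3) arrays of corresponding atoms (e.g. Cα); the toy coordinates are made up.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Toy check: a rotated and translated copy of a structure should give RMSD ~ 0.
model = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                  [0.0, 1.5, 1.0], [0.7, 0.2, 2.0]])
a = np.radians(30.0)
rot_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
native = model @ rot_z.T + np.array([5.0, -2.0, 3.0])
print(f"RMSD = {kabsch_rmsd(model, native):.6f} A")
```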
PrP 90–232 modeled using the inter-lysine cross-link distance constraints.
Use of proteinase K nonspecific digestion for selective and comprehensive identification of interpeptide cross-links: application to prion proteins. Petrotchenko EV, Serpa JJ, Hardie DB, Berjanskii M, Suriyamongkol BP, Wishart DS, Borchers CH. Mol Cell Proteomics. 2012 Jul;11(7).
The N-terminus of PrP 68-228 has a propensity to interact with the end of helix B, which is the first PrP region to unfold at low pH.
[Figure: PrP 68-228 models at pH 5.2 and pH 3.2, with helices HA, HB and HC labelled; the Cα atom of the N-terminal residue Gly68 is shown as blue spheres]
MS-GAMDy
Rigid-body docking in Cartesian space by XPLOR
[Diagram: starting model and distances for monomer A → monomer A backbone optimization; starting model and distances for monomer B → monomer B backbone optimization; docking with inter-monomer distances for the dimer → dimer backbone optimization]
XPLOR rigid-body docking with initial alignment by distance restraints
Result: 1 min; 64 of 64 structures had no restraint violations
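Rigid-body docking against distance restraints keeps each monomer's internal geometry fixed and searches only over a rotation and translation of one partner, scoring each pose by its restraint violations. A minimal NumPy sketch of that scoring step, assuming a flat-bottom harmonic penalty (zero inside [lower, upper], quadratic outside); this is a generic illustration rather than the XPLOR implementation, and the helper names and toy coordinates are hypothetical.

```python
import numpy as np


def apply_rigid_body(coords, rotation, translation):
    """Rotate, then translate, an (N, 3) coordinate array."""
    return np.asarray(coords, float) @ np.asarray(rotation, float).T + np.asarray(translation, float)


def restraint_energy(coords_a, coords_b, restraints, k=1.0):
    """Flat-bottom harmonic energy and violation count for inter-monomer restraints.

    Each restraint is (atom index in A, atom index in B, lower bound, upper bound).
    """
    energy, violations = 0.0, 0
    for i, j, lower, upper in restraints:
        d = float(np.linalg.norm(coords_a[i] - coords_b[j]))
        excess = max(lower - d, 0.0, d - upper)  # 0 when the distance is within bounds
        if excess > 0.0:
            violations += 1
        energy += k * excess ** 2
    return energy, violations


# Hypothetical two-atom monomers; monomer B is placed by an identity rotation and a 5 A shift.
mono_a = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
mono_b = apply_rigid_body([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]], np.eye(3), [0.0, 5.0, 0.0])
restraints = [(0, 0, 0.0, 6.0), (1, 1, 0.0, 4.0)]
print(restraint_energy(mono_a, mono_b, restraints))
```

A docking search would minimize this energy over rotations and translations; a pose with zero violations corresponds to a structure "with no restraint violations" in the sense used above.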
Beta-strand pairing: 5 beta-strands. Unknowns:
1) parallel or antiparallel
2) interacting residues
3) internal or external
[Figure labels: Shift, Partial coverage]
The 24 possible arrangements of the beta-strands into beta-sheets are encoded with XPLOR ambiguous restraints. Example:
assign ( ( resid 1 OR resid 2 OR resid 3 OR resid 4 OR resid 7 OR resid 6 OR resid 5 ) and name N ) ( ( resid 12 OR resid 13 OR resid 14 OR resid 17 OR resid 16 OR resid 15 ) and name O ) 2.8 0.8 0.2
assign ( ( resid 1 OR resid 2 OR resid 3 OR resid 4 OR resid 7 OR resid 6 OR resid 5 ) and name O ) ( ( resid 12 OR resid 13 OR resid 14 OR resid 17 OR resid 16 OR resid 15 ) and name N ) 2.8 0.8 0.2
assign ( ( resid 1 OR resid 2 OR resid 3 OR resid 4 OR resid 7 OR resid 6 OR resid 5 ) and name N ) ( ( resid 66 OR resid 67 OR resid 68 OR resid 71 OR resid 70 OR resid 69 ) and name O ) 2.8 0.8 0.2
assign ( ( resid 1 OR resid 2 OR resid 3 OR resid 4 OR resid 7 OR resid 6 OR resid 5 ) and name O ) ( ( resid 66 OR resid 67 OR resid 68 OR resid 71 OR resid 70 OR resid 69 ) and name N ) 2.8 0.8 0.2
assign ( ( resid 41 OR resid 42 OR resid 43 OR resid 45 OR resid 44 ) and name N ) ( ( resid 66 OR resid 67 OR resid 68 OR resid 71 OR resid 70 OR resid 69 ) and name O ) 2.8 0.8 0.2
assign ( ( resid 41 OR resid 42 OR resid 43 OR resid 45 OR resid 44 ) and name O ) ( ( resid 66 OR resid 67 OR resid 68 OR resid 71 OR resid 70 OR resid 69 ) and name N ) 2.8 0.8 0.2
assign ( ( resid 41 OR resid 42 OR resid 43 OR resid 45 OR resid 44 ) and name N ) ( ( resid 48 OR resid 49 ) and name O ) 2.8 0.8 0.2
assign ( ( resid 41 OR resid 42 OR resid 43 OR resid 45 OR resid 44 ) and name O ) ( ( resid 48 OR resid 49 ) and name N ) 2.8 0.8 0.2
Result: RMSD = 0.6 Å; convergence ~20%
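In XPLOR NOE syntax the three numbers after the selections are d, d-minus and d-plus, so 2.8 0.8 0.2 defines an allowed N-O distance range of 2.0-3.0 Å. Writing such statements by hand for every candidate strand pairing is error-prone; a minimal Python sketch that generates them in the same text format as the assign statements on this slide, with hypothetical helper names:

```python
def selection(residues, atom):
    """XPLOR-style selection: ( ( resid a OR resid b ... ) and name ATOM )."""
    ors = " OR ".join(f"resid {r}" for r in residues)
    return f"( ( {ors} ) and name {atom} )"


def ambiguous_strand_pair(strand1, strand2, d=2.8, dminus=0.8, dplus=0.2):
    """Two ambiguous backbone restraints (N->O and O->N) pairing two candidate strands.

    strand1/strand2 must be re-iterable (list or range), since each is used twice.
    """
    return [
        f"assign {selection(strand1, a1)} {selection(strand2, a2)} {d} {dminus} {dplus}"
        for a1, a2 in (("N", "O"), ("O", "N"))
    ]


for line in ambiguous_strand_pair(range(41, 46), [48, 49]):
    print(line)
```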
Fragment-based modelling
[Figure: ubiquitin sequence fragments, MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQR]
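Fragment-based methods cut the target sequence into short overlapping windows and, for each window, look up backbone conformations from known structures; the sketch below covers only the query side of that step. It assumes 9- and 3-residue windows, the sizes popularized by Rosetta-style fragment assembly; the fragment lengths actually used by the tools on this slide are not stated here.

```python
UBIQUITIN_1_42 = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQR"


def fragment_windows(sequence, length):
    """All overlapping (start, fragment) windows of a given length, with 1-based start positions."""
    return [(i + 1, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


nine_mers = fragment_windows(UBIQUITIN_1_42, 9)
three_mers = fragment_windows(UBIQUITIN_1_42, 3)
print(len(nine_mers), len(three_mers), nine_mers[0])
```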
[Diagram: torsion angle restraints and distance restraints; CS23D, SFAssembler and Homodeller → starting model pool]
Ubiquitin from 2FAZA (4.86 Å) with template-derived distance restraints (82 N-O restraints)
[Figure: 1UBQ and GAMDy model, 0.6 Å; second panel: ubiquitin from 2FAZA (4.86 Å) with 82 N-O restraints, GAMDy models at 4.8 Å and 2 Å]
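Template-derived N-O restraints can be harvested directly from a template's backbone: any backbone N and O atoms closer than a hydrogen-bonding cutoff become restraints for the model. A minimal sketch, assuming a 3.5 Å cutoff, a minimum sequence separation of 3 and the same 2.8 0.8 0.2 XPLOR bounds used in the beta-strand example; these parameters, the function name and the coordinates are assumptions, not necessarily what produced the 82 restraints above.

```python
import math


def template_no_restraints(n_coords, o_coords, cutoff_A=3.5, min_separation=3):
    """XPLOR-style N-O restraints for backbone N/O pairs that are close in a template.

    n_coords and o_coords map residue number -> (x, y, z) of that residue's backbone N or O atom.
    """
    restraints = []
    for i, n_xyz in n_coords.items():
        for j, o_xyz in o_coords.items():
            if abs(i - j) < min_separation:
                continue  # skip trivial local contacts
            if math.dist(n_xyz, o_xyz) <= cutoff_A:
                restraints.append(
                    f"assign ( resid {i} and name N ) ( resid {j} and name O ) 2.8 0.8 0.2")
    return restraints


# Hypothetical template coordinates (A), for illustration only.
n_atoms = {1: (0.0, 0.0, 0.0), 6: (20.0, 0.0, 0.0)}
o_atoms = {1: (1.2, 0.0, 0.0), 6: (0.0, 2.9, 0.0)}
for line in template_no_restraints(n_atoms, o_atoms):
    print(line)
```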