Post on 19-Dec-2015
Direct Methods and Many Site Se-Met MAD Problems using BnP
W. Furey
Classical Direct Methods
Main method for “small molecule” structure determination
Highly automated (almost totally “black box”)
Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit
Direct Methods Assumptions and Requirements
Non-negativity of electron density
Atoms are “resolved”, i.e. “atomic resolution” data are available
Unit cell, symmetry and contents are known
Important Concepts - 1
Normalized structure factors $E_H$ are given by $E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$, with averaging in resolution shells
The phase $\varphi_H$ of $E_H$ is the same as for $F_H$
$\langle |E_H|^2 \rangle = 1$, hence “normalized”
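The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular program: the function name `normalize`, the choice of shells of equal width in 1/d², and the input layout are all assumptions, and symmetry-related statistical weights are ignored.

```python
import math
from collections import defaultdict

def normalize(reflections, n_shells=10):
    """Return |E| for each (d_spacing, |F|) pair, in input order.

    Shells of equal width in 1/d^2 are an assumption made for illustration.
    """
    s2 = [1.0 / (d * d) for d, _ in reflections]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_shells or 1.0  # guard: all data at one resolution

    def shell(s):
        # Index of the resolution shell; the top edge falls in the last shell.
        return min(int((s - lo) / width), n_shells - 1)

    total = defaultdict(float)
    count = defaultdict(int)
    for s, (_, f) in zip(s2, reflections):
        total[shell(s)] += f * f
        count[shell(s)] += 1

    # E = F / sqrt(<|F|^2>), with the average taken over the reflection's shell.
    return [f / math.sqrt(total[shell(s)] / count[shell(s)])
            for s, (_, f) in zip(s2, reflections)]
```

By construction, the mean of |E|² within each shell comes out as 1, which is the property the slide calls “normalized”.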
Important Concepts - 2
Structure Invariant - structural quantity independent of choice of unit cell origin
Probabilistic estimates can be made for the values of structure invariants given the associated E magnitudes and cell contents
Fundamental formulas involving individual triplets
$P(\Phi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK} \cos \Phi_{HK})$, where $P(\Phi_{HK})$ is the probability of the structure invariant having the value $\Phi_{HK}$
$A_{HK} = 2 |E_H E_K E_{-H-K}| / N^{1/2}$, where N is the number of atoms in the cell and the E’s are normalized structure factors
Note that the probability $P(\Phi_{HK})$ increases as $A_{HK}$ increases, and that $A_{HK}$ is proportional to the product of the E’s and inversely proportional to $N^{1/2}$
The expected value of $\cos \Phi_{HK}$ is given by
$\langle \cos \Phi_{HK} \rangle = I_1(A_{HK}) / I_0(A_{HK})$
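A small numerical sketch may help make these formulas concrete. Here I0 and I1 are the modified Bessel functions of order 0 and 1, evaluated from their power series so that only the standard library is needed. The function names are illustrative choices, not part of any package mentioned in the slides.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), summed from its power series."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def triplet_a(e_h, e_k, e_hk, n_atoms):
    """A_HK = 2 |E_H E_K E_-H-K| / sqrt(N)."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)

def cochran_pdf(phi, a):
    """P(Phi) = exp(A cos Phi) / (2 pi I0(A)), with Phi in radians."""
    return math.exp(a * math.cos(phi)) / (2.0 * math.pi * bessel_i(0, a))

def expected_cos(a):
    """<cos Phi> = I1(A) / I0(A)."""
    return bessel_i(1, a) / bessel_i(0, a)
```

For fixed E magnitudes, `triplet_a` shrinks as the atom count grows, so `expected_cos` moves toward zero. This is one way to see why individual triplets become less informative for large structures.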
[Figure: Cochran distribution for various values of K, with an accompanying plot vs. K; here $\Phi_3 = \Phi_{HK}$ and $K = A_{HK}$]
Classical Direct Methods Applications for Proteins
Used for phase extension to very high resolution
Used with moderate success to locate heavy atom sites in isomorphous derivatives
E values used in molecular replacement calculations
Current Direct Methods Applications for Proteins
Shake n Bake (based on the minimum function) has been used to solve complete protein structures with over 1,000 atoms (rubredoxin, lysozyme, calmodulin, etc.), provided data to 1.1 Å or better are available
Used to locate anomalous scatterer sites from MAD or SAS data
General Shake n Bake Concept
Use a multi-solution method starting with random phases (or randomly positioned atoms) in each trial.
For each trial phase set, use a “dual space” procedure iterating between real and reciprocal space optimization/constraints.
Reciprocal space optimization based on shifting phases to reduce the “minimum function” R(φ)
Real space optimization and constraints based on computing new phases only from the largest peaks in the map based on the previous cycle’s phases
Each trial phase set ranked by value of R(φ)
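The slides do not spell out the minimum function itself. The sketch below assumes the form commonly given in the Shake-and-Bake literature, R(φ) = Σ A_HK [cos Φ_HK − I1(A_HK)/I0(A_HK)]² / Σ A_HK, with Φ_HK = φ_H + φ_K + φ_−H−K. Treat the formula, the function names, and the data layout as assumptions made for illustration.

```python
import math

def bessel_ratio(a, terms=60):
    """I1(a) / I0(a) from the power series of the modified Bessel functions."""
    i0 = sum((a / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))
    i1 = sum((a / 2.0) ** (2 * k + 1) / (math.factorial(k) * math.factorial(k + 1))
             for k in range(terms))
    return i1 / i0

def minimal_function(phases, triplets):
    """Assumed form of R(phi).

    phases:   dict mapping a reflection index (h, k, l) to its phase in radians
    triplets: list of (H, K, A_HK); the third reflection is taken as -H-K
    """
    num = den = 0.0
    for h, k, a in triplets:
        third = tuple(-(x + y) for x, y in zip(h, k))
        invariant = phases[h] + phases[k] + phases[third]
        num += a * (math.cos(invariant) - bessel_ratio(a)) ** 2
        den += a
    return num / den
```

Under this assumed form, a trial phase set whose triplet cosines match their expected values gives R(φ) near zero, which is why lower values rank higher.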
SnB inner loop for trial structure
1. Generate random trial structure
2. Compute phases from structure
3. Shift phases to reduce R(φ)
4. Compute map from new phases
5. Select “structure” from largest peaks, then return to step 2
Stop after N iterations
Choice of data for Se determination
Use $||F_H|^{+} - |F_H|^{-}|$ (anomalous) difference at a single λ
Use $||F_H|_{\lambda_i} - |F_H|_{\lambda_j}|$ (dispersive) difference between two λ’s
Use $F_A$ values (derived from data at all λ’s)
Use $F_{HLE}$ values based on max anomalous and max dispersive differences
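The first two options are simple magnitude differences and can be sketched directly. The dictionary layout and function names below are illustrative assumptions. The F_A and F_HLE options need a fuller treatment and are not attempted here.

```python
def anomalous_differences(bijvoet):
    """| |F+| - |F-| | at one wavelength.

    bijvoet: dict mapping hkl to a (|F+|, |F-|) tuple.
    """
    return {hkl: abs(f_plus - f_minus)
            for hkl, (f_plus, f_minus) in bijvoet.items()}

def dispersive_differences(means_i, means_j):
    """| |F|_i - |F|_j | between two wavelengths.

    means_i, means_j: dicts mapping hkl to the Bijvoet-averaged |F|.
    Only reflections measured at both wavelengths are returned.
    """
    common = means_i.keys() & means_j.keys()
    return {hkl: abs(means_i[hkl] - means_j[hkl]) for hkl in common}
```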
MAD Phasing
For data collected at λ1, λ2, etc., choose a wavelength λn as “native” data, and “reduce” that data set by averaging Bijvoet pairs.
For other “derivative” wavelengths λd, reduce both by averaging Bijvoet pairs to form “isomorphous” data sets, and without averaging to form “anomalous” data sets.
For “isomorphous” and “derivative anomalous” data sets, scale “derivative” to “native” and use scattering factors of
$f_0 = 0$, $f' = f'(\lambda_d) - f'(\lambda_n)$, $f'' = f''(\lambda_d)$
For “native anomalous” data use original native Bijvoet pairs and scattering factors of
$f_0 = 0$, $f' = 0$, $f'' = f''(\lambda_n)$
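The bookkeeping above, treating MAD as a pseudo-isomorphous problem, can be written as a short helper. The function name and data-set labels are hypothetical, and the f′ and f″ numbers in the usage note are placeholders chosen only to show the arithmetic, not measured values.

```python
def mad_scattering_factors(f_prime, f_dprime, native, derivatives):
    """Return (f0, f', f'') for each pseudo-derivative data set.

    f_prime, f_dprime: dicts mapping a wavelength label to f' and f''
    native:            label of the wavelength treated as "native"
    derivatives:       labels of the wavelengths treated as "derivatives"
    """
    sets = {f"{native} native anomalous": (0.0, 0.0, f_dprime[native])}
    for d in derivatives:
        factors = (0.0, f_prime[d] - f_prime[native], f_dprime[d])
        # The same factors apply to the isomorphous and the anomalous set.
        sets[f"{d} isomorphous"] = factors
        sets[f"{d} anomalous"] = factors
    return sets
```

With placeholder values f′ = −10.0 at an edge wavelength and f′ = −2.5 at a remote wavelength chosen as native, the edge “derivative” would carry f′ = −7.5.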
Phase Refinement Minimizing
$\sum_h W_h \sum_{\varphi_P} P_{\varphi_P} \left[ |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right]^2$
where
$|F_{PH}^{calc}(\varphi_P)|_h^2 = |F_P^{obs}|_h^2 + |F_H^{calc}|_h^2 + 2 |F_P^{obs}|_h |F_H^{calc}|_h \cos(\varphi_P - \varphi_H)_h$
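As a rough illustration of the quantity being minimized, the sketch below evaluates the residual for a list of reflections. The dictionary keys and function names are assumptions made for this example. A real refinement would also vary the heavy-atom parameters that determine |FHcalc| and φH, which is not shown.

```python
import math

def fph_calc(fp_obs, fh_calc, phi_p, phi_h):
    """|FPHcalc| from the law of cosines, for a given protein phase phi_p."""
    squared = (fp_obs ** 2 + fh_calc ** 2
               + 2.0 * fp_obs * fh_calc * math.cos(phi_p - phi_h))
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors

def refinement_residual(reflections):
    """Sum over h of W_h * sum over phi_P of P(phi_P) * (|FPHobs| - |FPHcalc|)^2.

    Each reflection is a dict with keys fph_obs, fp_obs, fh_calc, phi_h,
    weight, and phase_probs (a list of (phi_p, probability) pairs).
    """
    total = 0.0
    for r in reflections:
        for phi_p, prob in r["phase_probs"]:
            diff = r["fph_obs"] - fph_calc(r["fp_obs"], r["fh_calc"], phi_p, r["phi_h"])
            total += r["weight"] * prob * diff * diff
    return total
```

A single (φP, 1.0) entry in `phase_probs` corresponds to using one phase per reflection with unit probability. Several entries correspond to stepping over allowed phases with their probabilities.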
Phase Refinement Options
“Classical” - $\varphi_P$ = centroid, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, $P_{\varphi_P} = 1$, use reflections with FOM > 0.4-0.6
“Maximum Likelihood” - $\varphi_P$ stepped over allowed phases, $P_{\varphi_P}$ = corresponding probability, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, use reflections with FOM > 0.2
$\varphi_P$, $P_{\varphi_P}$ can also come from an external source, i.e. solvent flattened or NC-symmetry averaged maps.
$\sum_h W_h \sum_{\varphi_P} P_{\varphi_P} \left[ |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right]^2$
[Figure: Projection of peaks down NC twofold]
[Flowchart of data files and programs]
Data: MAD λ1, λ2, λ3 data (Scalepack files); all “native” (λ3) data
Programs: CMBISO, CMBANO, PHASIT, MISSNG, FSFOUR, BNDRY, MAPINV, EXTRMP, MAPAVG, BLDCEL
Files: “iso” and “ano” scaled files, “extension” file, “phase” file, “submap” file, “averaging” mask file, final map
MAD Phasing/Averaging Statistics
Wavelength    Type   dmin (Å)   No. refl   Rano    Riso    dmin (Å) (phasing)   Rc     Phasing Power   <FOM>
λ1, edge      ano    2.3        72,632     0.063   -       2.6                  -      3.47            0.380
λ2, peak      ano    2.3        72,996     0.060   -       2.6                  -      3.45            0.447
λ3, remote    ano    2.3        72,650     0.048   -       2.6                  -      2.09            0.389
λ1-λ3         iso    2.3        74,407     -       0.039   2.6                  0.55   1.89            0.393
λ2-λ3         iso    2.3        74,774     -       0.035   2.6                  0.61   1.59            0.357
Mean FOM (combined) = 0.759 for 48,632 reflections (2.6 Å)
Correlation coefficient between monomer density prior to NCS averaging = 0.764
Correlation coefficient between monomer density after NCS averaging/phase combination = 0.906
Peak anomalous (λ2) difference Patterson
With SnB it’s possible to automatically locate the anomalous scatterer substructure with data from any one of the dispersive combinations or anomalous pair sets
As expected, sets with the maximum dispersive or anomalous signal typically yield a greater frequency of success
Automated Applications of BnP: Methodology
W. Furey¹, L. Pasupulati¹, S. Potter², H. Xu², R. Miller³ & C. Weeks²
¹University of Pittsburgh School of Medicine and VA Medical Center
²Hauptman-Woodward Medical Research Institute
³Center for Computational Research, SUNY at Buffalo
SnB Strengths
1. Powerful, state-of-the-art direct methods for automatically locating heavy atom sites
2. Friendly graphical user interface
SnB Weaknesses
1. Stops after finding sites, i.e. no protein phasing
2. No software interface
PHASES Strengths
1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing
2. Interactive graphics
PHASES Weaknesses
1. Doesn’t automatically find heavy atom sites
2. Script based, i.e. no GUI
Goal: Provide user-friendly software for automatic determination of protein crystal structures
Adopted Strategy
Combine the SnB program with the “PHASES” package, putting everything under GUI control
Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated
Also provide a manual mode allowing experienced users more control, and to facilitate development
Provide graphical feedback when possible
Facilitate coupling with popular external software
Main Developments Required for Automated Structure Determination
Automatic substructure solution detection
Automatic substructure validation
Automatic hand determination (including space group changes, when needed)
Automatic Substructure Solution Detection
Original Method: Based on histogram (manual, time consuming, requires user interaction)
Current Method: Based on Rmin and Rcryst statistics (automatic, fast, no user interaction)
Automatic Substructure Validation
Original Method: Left up to user to decide which peaks correspond to true sites (manual)
Current Method (auto mode): Based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand insensitive)
Current Method (manual mode): As in auto but can also compare peaks from different solutions (manual)
Automatic Hand Determination
Original Method: Visual inspection of map projections (manual, requires user interaction)
Current Method (MAD, SIRAS or MIRAS): Based on variance differences in protein and solvent regions (automatic, fast since it requires no refinement, also requires no user interaction)
Current Method (SAS data only): Comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)
Current Method (SIR, MIR data only): Both hands tried, map examination needed (requires user interaction)
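The variance-based test for MAD, SIRAS or MIRAS data can be illustrated with a toy sketch. The slides do not give the exact statistic, so the code assumes that each hand’s map comes with a protein/solvent mask and that the preferred hand is the one with the larger difference between protein-region and solvent-region density variance. The function names and the scoring rule are assumptions, not BnP’s implementation.

```python
from statistics import pvariance

def variance_contrast(density, in_protein):
    """Variance of density in the protein region minus that in the solvent region.

    density:    list of map values at grid points
    in_protein: list of booleans, True where the grid point is inside the protein mask
    """
    protein = [rho for rho, inside in zip(density, in_protein) if inside]
    solvent = [rho for rho, inside in zip(density, in_protein) if not inside]
    return pvariance(protein) - pvariance(solvent)

def choose_hand(candidates):
    """Pick the hand with the larger contrast (assumed criterion).

    candidates: dict mapping a label to a (density, in_protein) pair.
    Returns the winning label and the score for every candidate.
    """
    scores = {label: variance_contrast(density, mask)
              for label, (density, mask) in candidates.items()}
    return max(scores, key=scores.get), scores
```

The intuition is that a map phased with the correct hand should show strong features in the protein region and a comparatively flat solvent, while the wrong hand tends to give similar variance in both.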
No man (or program) is an island
Importing data files: Scalepack files, D*Trek files, MTZ files$, free format files
Exporting control files: O, RESOLVE 2.08, Arp/wARP 6.1.1
Exporting data files: free format files, CNS files, MTZ files$, O files, CHAIN files, PDB files
Job submission from GUI: RESOLVE$ 2.08, Arp/wARP$ 6.1.1
$ RESOLVE, Arp/wARP and/or CCP4 must be obtained from their respective authors/distributors for these options to work
Results for 1jc4
a = 43.6, b = 78.6, c = 89.4 Å, β = 91.95°, P2₁
4 molecules (592 residues) in asu
2.1 Å data, 3λ MAD data
Substructure: Found 24 of 24 Se
Phasing: mean PP - 2.95; mean FOM - 0.661
Time to map: ~41 min on G4 (1.5 GHz) Powerbook, ~13 min on G5 (2.7 GHz) Desktop
Auto Traceability: Resolve - 87% main chain, 68% side chain; Arp/wARP - 82% main chain, 73% side chain
SeMet ASU Size & Data Resolution
PDB Code   No. Sites   No. Residues   NCS   d (Å)     PDB Code   No. Sites   No. Residues   NCS   d (Å)
1QC2       4           169            1     1.5       1CLI       28          1380           4     3.0
1BX4       7           345            1     2.25      1A7A       30          864            2     2.8
1CB0       8           283            1     2.2       1L8A       40          1772           2     2.6
1T5H       10          504            1     2.5       1E3M       45          1600           2     3.0
2JXH       12          576            2     3.1       1HI8       50          1328           2     2.8
1GSO       13          431            1     2.22      1GKP       54          2748           6     2.5
2TPS       15          454            2     2.7       1DQ8       60          1868           4     2.33
1DBT       19          717            3     2.49      1E2Y       60          1880           10    3.2
1JEN       22          668            2     2.25      1M32       66          2196           6     2.55
1JC4       24          592            4     2.1       1EQ2       70          3100           10    2.9
Phasing Flexibility (Manual Mode)
Conclusion
BnP is a user-friendly, efficient package for the automated determination of protein structures from X-ray diffraction data
BnP downloads for Linux, Apple G4, G5 & Intel, and SGI systems are available (academic & non-profit institutions) at
http://www.hwi.buffalo.edu/BnP/