
Direct Methods and Many Site Se-Met MAD Problems using BnP

W. Furey

Classical Direct Methods

Main method for “small molecule” structure determination

Highly automated (almost totally “black box”)

Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit

Direct Methods Assumptions and Requirements

Non-negativity of electron density

Atoms are “resolved”, i.e. “atomic resolution” data are available

Unit cell, symmetry and contents are known

Important Concepts - 1

Normalized structure factors $E_H$ are given by $E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$, with the averaging done in resolution shells

The phase $\varphi_H$ of $E_H$ is the same as that of $F_H$

$\langle |E_H|^2 \rangle = 1$, hence “normalized”
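As a rough illustration of this normalization, the Python sketch below converts amplitudes |F_H| to |E_H| by dividing each one by the root-mean-square |F| of its resolution shell. The equal-width bins in 1/d², the name `normalize_in_shells` and the neglect of the ε (multiplicity) factor for special reflection classes are assumptions made here for brevity, not details taken from BnP.

```python
import math
from collections import defaultdict


def normalize_in_shells(amplitudes, d_spacings, n_shells=10):
    """Return |E_H| = |F_H| / <|F_H|^2>^(1/2), averaging |F|^2 in resolution shells.

    Assumptions: shells are equal-width bins in 1/d^2, and the epsilon
    (multiplicity) factor for special reflection classes is ignored.
    """
    s2 = [1.0 / (d * d) for d in d_spacings]  # 1/d^2 for each reflection
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_shells or 1.0  # avoid zero width if all d are equal
    shell_of = [min(int((s - lo) / width), n_shells - 1) for s in s2]

    # Accumulate the mean |F|^2 in each shell
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f, shell in zip(amplitudes, shell_of):
        sums[shell] += f * f
        counts[shell] += 1
    mean_f2 = {shell: sums[shell] / counts[shell] for shell in sums}

    # Divide each amplitude by the RMS |F| of its own shell
    return [f / math.sqrt(mean_f2[shell]) for f, shell in zip(amplitudes, shell_of)]
```

Within each shell the resulting values satisfy ⟨|E|²⟩ = 1; phases are untouched, since only amplitudes are rescaled.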

Important Concepts - 2

Structure invariant: a structural quantity independent of the choice of unit cell origin

Probabilistic estimates can be made for the values of structure invariants given the associated E magnitudes and cell contents
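To make the idea of a structure invariant concrete, the following sketch (assuming space group P1 and equal point atoms; all names are hypothetical) computes the triplet phase sum φ_H + φ_K + φ_{-H-K} before and after an arbitrary origin shift t. Each individual phase changes by −2πH·t, but the three shifts cancel in the sum because H + K + (−H−K) = 0.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F_H = sum_j exp(2 pi i H.x_j) for equal point atoms at fractional coordinates x_j (P1)."""
    return sum(cmath.exp(2j * math.pi * sum(h * x for h, x in zip(hkl, xyz)))
               for xyz in atoms)


def triplet_phase(h, k, atoms):
    """Phi_HK = phi_H + phi_K + phi_-H-K, wrapped to (-pi, pi]."""
    mhk = tuple(-a - b for a, b in zip(h, k))  # the third index, -H-K
    total = sum(cmath.phase(structure_factor(v, atoms)) for v in (h, k, mhk))
    return math.atan2(math.sin(total), math.cos(total))


def shift_origin(atoms, t):
    """Move the origin to t: each fractional coordinate becomes x - t (mod 1)."""
    return [tuple((x - s) % 1.0 for x, s in zip(xyz, t)) for xyz in atoms]
```

Comparing `triplet_phase(h, k, atoms)` with `triplet_phase(h, k, shift_origin(atoms, t))` gives the same value to rounding error for any t, whereas the phase of `structure_factor(h, ...)` alone generally changes.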

Fundamental formulas involving individual triplets

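$P(\Phi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK} \cos \Phi_{HK})$, where $P(\Phi_{HK})$ is the probability of the structure invariant $\Phi_{HK} = \varphi_H + \varphi_K + \varphi_{-H-K}$ having the value $\Phi_{HK}$

$A_{HK} = 2 |E_H E_K E_{-H-K}| / N^{1/2}$, where $N$ is the number of atoms in the cell and the $E$'s are normalized structure factors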

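Note that $A_{HK}$ is proportional to the product of the $|E|$'s and inversely proportional to $N^{1/2}$, and that as $A_{HK}$ increases the distribution becomes more sharply peaked about $\Phi_{HK} = 0$, i.e. the estimate $\Phi_{HK} \approx 0$ becomes more reliable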

The expected value of $\cos \Phi_{HK}$ is given by

$\langle \cos \Phi_{HK} \rangle = I_1(A_{HK}) / I_0(A_{HK})$
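The Cochran formulas can be checked numerically. The sketch below evaluates A_HK, the distribution P(Φ_HK) and ⟨cos Φ_HK⟩ = I1/I0, using a truncated power series for the modified Bessel functions; that series is an assumption adequate only for moderate A_HK (roughly below 20), and the helper names are hypothetical.

```python
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) from its power series (fine for moderate x)."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def kappa(e_h, e_k, e_mhk, n_atoms):
    """A_HK = 2 |E_H E_K E_-H-K| / N^(1/2) for N equal atoms in the cell."""
    return 2.0 * abs(e_h * e_k * e_mhk) / math.sqrt(n_atoms)


def cochran_pdf(phi, a):
    """P(Phi_HK) = exp(A cos Phi_HK) / (2 pi I0(A))."""
    return math.exp(a * math.cos(phi)) / (2.0 * math.pi * bessel_i(0, a))


def expected_cos(a):
    """<cos Phi_HK> = I1(A) / I0(A)."""
    return bessel_i(1, a) / bessel_i(0, a)
```

For example, with |E| values of 2.0, 1.5 and 1.8 and N = 100 atoms, `kappa` gives A_HK = 2(5.4)/10 = 1.08; integrating `cochran_pdf` over one period returns 1, and `expected_cos` grows with A_HK, matching the sharper peak at Φ_HK = 0.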

[Figure: Cochran distribution $P(\Phi_3)$ vs $\Phi_3$ for various values of $\kappa$, where $\Phi_3 = \Phi_{HK}$ and $\kappa = A_{HK}$]

Classical Direct Methods Applications for Proteins

Used for phase extension to very high resolution

Used with moderate success to locate heavy atom sites in isomorphous derivatives

E values used in molecular replacement calculations

Current Direct Methods Applications for Proteins

Shake-and-Bake (based on the minimal function) used to solve complete protein structures, some with over 1,000 atoms (e.g. rubredoxin, lysozyme, calmodulin), provided data to 1.1 Å or better are available

Used to locate anomalous scatterer sites from MAD or SAS data

General Shake-and-Bake Concept

Use a multi-solution method starting with random phases (or randomly positioned atoms) in each trial.

For each trial phase set, use a “dual space” procedure iterating between real and reciprocal space optimization/constraints.

Reciprocal-space optimization based on shifting phases to reduce the “minimal function” $R(\Phi)$

Real-space optimization and constraints based on computing new phases only from the largest peaks in the map calculated with the previous cycle's phases

Each trial phase set is ranked by its value of $R(\Phi)$
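The R(Φ) used for ranking is commonly written as the minimal function R(Φ) = Σ A_HK (cos Φ_HK − I1(A_HK)/I0(A_HK))² / Σ A_HK, which measures how far the triplet cosines deviate from their Cochran expectations. A minimal sketch of that form, and of ranking trial phase sets by it, follows; the triplet indexing scheme and the function names are assumptions for illustration, not SnB's internals.

```python
import math


def bessel_ratio(a, terms=60):
    """I1(A) / I0(A) from power series (adequate for A below roughly 20)."""
    i0 = sum((a / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))
    i1 = sum((a / 2.0) ** (2 * k + 1) / (math.factorial(k) * math.factorial(k + 1))
             for k in range(terms))
    return i1 / i0


def minimal_function(phases, triplets):
    """R(Phi) = sum A (cos Phi_HK - I1(A)/I0(A))^2 / sum A.

    phases:   dict mapping a reflection id to its current phase (radians)
    triplets: list of (id_H, id_K, id_-H-K, A_HK) tuples
    """
    num = den = 0.0
    for h, k, mhk, a in triplets:
        cos_phi = math.cos(phases[h] + phases[k] + phases[mhk])
        num += a * (cos_phi - bessel_ratio(a)) ** 2
        den += a
    return num / den


def rank_trials(trial_phase_sets, triplets):
    """Sort trial phase sets so the lowest R(Phi), most likely a solution, comes first."""
    return sorted(trial_phase_sets, key=lambda p: minimal_function(p, triplets))
```

Trials whose triplet sums cluster near zero score lower than trials with scattered sums, so the best candidates surface at the front of the ranked list.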

[Figure: SnB inner loop for a trial structure]

Generate random trial structure → compute phases from structure → shift phases to reduce $R(\Phi)$ → compute map from new phases → select “structure” from largest peaks → compute phases from structure (repeat); stop after N iterations

Choice of data for Se determination

Use the (anomalous) difference $\big| |F_H|^+ - |F_H|^- \big|$ at a single $\lambda$

Use the (dispersive) difference $\big| |F_H|_{\lambda_i} - |F_H|_{\lambda_j} \big|$ between two $\lambda$'s

Use $F_A$ values (derived from data at all $\lambda$'s)

Use $F_{HLE}$ values based on the maximum anomalous and maximum dispersive differences
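The choice among these data sets can be guided by the size of the signal. As a hedged sketch, the code below scores each candidate set of amplitude pairs (Bijvoet pairs for anomalous sets, pairs of wavelengths for dispersive sets) by its mean relative difference and picks the largest; this scoring rule and the names used are assumptions, not BnP's documented selection criterion.

```python
def mean_relative_difference(pairs):
    """<| |F1| - |F2| |> / <(|F1| + |F2|) / 2> over a list of amplitude pairs.

    For an anomalous set the pairs are (|F+|, |F-|) at one wavelength;
    for a dispersive set they are (|F|_lambda_i, |F|_lambda_j).
    """
    diffs = sum(abs(a - b) for a, b in pairs)
    means = sum((a + b) / 2.0 for a, b in pairs)
    return diffs / means


def pick_strongest(candidate_sets):
    """Return the name of the candidate set with the largest relative difference signal."""
    return max(candidate_sets, key=lambda name: mean_relative_difference(candidate_sets[name]))
```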

MAD Phasing

For data collected at $\lambda_1$, $\lambda_2$, etc., choose one wavelength $\lambda_n$ as “native” data and “reduce” that data set by averaging Bijvoet pairs.

For the other (“derivative”) wavelengths $\lambda_d$, reduce each data set both by averaging Bijvoet pairs, to form “isomorphous” data sets, and without averaging, to form “anomalous” data sets.
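The data reduction described above can be sketched as follows, assuming each wavelength is stored as a mapping from (h, k, l) to an (F+, F−) pair, with `None` for an unmeasured mate. Bijvoet mates are averaged without σ weighting here, which is a simplification; the function names are hypothetical.

```python
def reduce_wavelength(bijvoet):
    """Split one wavelength into an 'isomorphous' (averaged) and an 'anomalous' (paired) set.

    bijvoet maps (h, k, l) -> (F_plus, F_minus); either mate may be None if unmeasured.
    """
    iso, ano = {}, {}
    for hkl, (f_plus, f_minus) in bijvoet.items():
        present = [f for f in (f_plus, f_minus) if f is not None]
        if not present:
            continue  # nothing measured for this reflection
        iso[hkl] = sum(present) / len(present)  # unweighted Bijvoet average
        if f_plus is not None and f_minus is not None:
            ano[hkl] = (f_plus, f_minus)  # keep complete pairs unaveraged
    return iso, ano


def setup_mad(data_by_wavelength, native):
    """Reduce the chosen 'native' wavelength and every 'derivative' wavelength.

    The native pairs are kept as well, since the original native Bijvoet
    pairs are used for the 'native anomalous' data.
    """
    native_iso, native_ano = reduce_wavelength(data_by_wavelength[native])
    derivatives = {name: reduce_wavelength(data)
                   for name, data in data_by_wavelength.items() if name != native}
    return native_iso, native_ano, derivatives
```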

MAD Phasing

For “isomorphous” and “derivative anomalous” data sets, scale “derivative” to “native” and use scattering factors of

$f_0 = 0$, $f' = f'(\lambda_d) - f'(\lambda_n)$, $f'' = f''(\lambda_d)$

For “native anomalous” data, use the original native Bijvoet pairs and scattering factors of

$f_0 = 0$, $f' = 0$, $f'' = f''(\lambda_n)$
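Because the “native” and “derivative” wavelengths come from the same crystal, the normal scattering of the Se atoms appears in both and cancels in the difference, which is why f0 = 0 in both settings. The sketch below simply encodes these two rules; the class and function names are hypothetical, and the numbers in the usage note are illustrative placeholders, not measured Se scattering factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScatteringFactors:
    f0: float              # normal scattering (cancels between wavelengths)
    f_prime: float         # f'
    f_double_prime: float  # f''


def derivative_factors(fp_d, fpp_d, fp_n):
    """'Isomorphous' and 'derivative anomalous' sets: f0 = 0, f' = f'(d) - f'(n), f'' = f''(d)."""
    return ScatteringFactors(0.0, fp_d - fp_n, fpp_d)


def native_anomalous_factors(fpp_n):
    """'Native anomalous' set: f0 = 0, f' = 0, f'' = f''(n)."""
    return ScatteringFactors(0.0, 0.0, fpp_n)
```

For example, with illustrative values f'(λd) = −8.0, f''(λd) = 4.0 and f'(λn) = −1.5, `derivative_factors(-8.0, 4.0, -1.5)` gives f' = −6.5 and f'' = 4.0.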

Phase Refinement Options

“Classical” - $\varphi_P$ = centroid phase, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, $PP = 1$; use reflections with FOM > 0.4 to 0.6

“Maximum Likelihood” - $\varphi_P$ stepped over allowed phases, $PP$ = corresponding probability, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity; use reflections with FOM > 0.2

$\varphi_P$ and $PP$ can also come from an external source, e.g. solvent-flattened or NCS-averaged maps.

Refinement target (minimized):

$$\sum_h W_h \sum_{\varphi_P} PP \left( |F_{PH}^{obs}|_h - |F_{PH}^{calc}(\varphi_P)|_h \right)^2$$
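This target can be evaluated as in the following sketch, which assumes the simple isomorphous model |F_PH^calc(φ_P)| = |F_P exp(iφ_P) + F_H| and ignores the f'' (anomalous) contribution. The “classical” option supplies a single (centroid, PP = 1) pair per reflection, while the “maximum likelihood” option supplies stepped phases with their probabilities; the data layout and names are hypothetical, not the PHASES implementation.

```python
import cmath
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reflection:
    f_p: float                          # |F_P|, native ("reduced") amplitude
    f_ph_obs: float                     # |F_PH|obs, "derivative" amplitude
    f_h: complex                        # calculated substructure contribution F_H
    weight: float                       # W_h: 1/E^2, 1/<E^2> or 1.0
    phases: List[Tuple[float, float]]   # (phi_P, PP) pairs


def f_ph_calc(r, phi_p):
    """|F_PH calc(phi_P)| = |F_P exp(i phi_P) + F_H| (no anomalous term)."""
    return abs(r.f_p * cmath.exp(1j * phi_p) + r.f_h)


def refinement_target(reflections):
    """Sum_h W_h Sum_phiP PP (|F_PH obs| - |F_PH calc(phi_P)|)^2."""
    return sum(r.weight * sum(pp * (r.f_ph_obs - f_ph_calc(r, phi)) ** 2
                              for phi, pp in r.phases)
               for r in reflections)


def classical_phases(centroid):
    """'Classical' option: the centroid phase with PP = 1."""
    return [(centroid, 1.0)]


def likelihood_phases(probabilities):
    """'Maximum likelihood' option: phi_P stepped evenly over 0..2pi, PP = probability."""
    n = len(probabilities)
    return [(2.0 * math.pi * i / n, p) for i, p in enumerate(probabilities)]
```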

[Figure: projection of peaks down the NC twofold]

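[Figure: MAD phasing/NCS averaging flowchart]

Inputs: MAD $\lambda_1$, $\lambda_2$, $\lambda_3$ data (Scalepack files); “extension” file with all “native” ($\lambda_3$) data

Programs: CMBISO, CMBANO (producing “iso” and “ano” scaled files), PHASIT, MISSNG, FSFOUR, BNDRY, MAPINV, EXTRMP, MAPAVG, BLDCEL

Intermediate files: “phase” file, “submap” file, “averaging” mask file

Output: final map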

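MAD Phasing/Averaging Statistics

| Wavelength | Type | dmin (Å) | No. refl | Rano | Riso | dmin (Å), phasing | Rc | Phasing power | <FOM> |
|---|---|---|---|---|---|---|---|---|---|
| λ1, edge | ano | 2.3 | 72,632 | 0.063 | - | 2.6 | - | 3.47 | 0.380 |
| λ2, peak | ano | 2.3 | 72,996 | 0.060 | - | 2.6 | - | 3.45 | 0.447 |
| λ3, remote | ano | 2.3 | 72,650 | 0.048 | - | 2.6 | - | 2.09 | 0.389 |
| λ1-λ3 | iso | 2.3 | 74,407 | - | 0.039 | 2.6 | 0.55 | 1.89 | 0.393 |
| λ2-λ3 | iso | 2.3 | 74,774 | - | 0.035 | 2.6 | 0.61 | 1.59 | 0.357 |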

Mean FOM (combined) = 0.759 for 48,632 reflections (2.6 Å)

Correlation coefficient between monomer densities prior to NCS averaging = 0.764

Correlation coefficient between monomer densities after NCS averaging/phase combination = 0.906
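For reference, correlation coefficients like those above are ordinary linear (Pearson) correlations between density values. The sketch below assumes the second monomer's density has already been interpolated onto grid points related to the first by the NCS operator; `density_correlation` is a hypothetical name.

```python
import math


def density_correlation(rho_a, rho_b):
    """Pearson correlation between two density samples taken at matching (NCS-related) points."""
    n = len(rho_a)
    mean_a = sum(rho_a) / n
    mean_b = sum(rho_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(rho_a, rho_b))
    var_a = sum((a - mean_a) ** 2 for a in rho_a)
    var_b = sum((b - mean_b) ** 2 for b in rho_b)
    return cov / math.sqrt(var_a * var_b)
```

Because the coefficient is insensitive to an overall scale and offset, it compares the shape of the two monomer densities rather than their absolute levels.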

[Figure: peak ($\lambda_2$) anomalous difference Patterson]

With SnB it’s possible to automatically locate the anomalous scatterer substructure with data from any one of the dispersive combinations or anomalous pair sets

As expected, sets with the maximum dispersive or anomalous signal typically yield a greater frequency of success

Automated Applications of BnP: Methodology

W. Furey,¹ L. Pasupulati,¹ S. Potter,² H. Xu,² R. Miller³ & C. Weeks²

¹University of Pittsburgh School of Medicine and VA Medical Center

²Hauptman-Woodward Medical Research Institute

³Center for Computational Research, SUNY at Buffalo

SnB Strengths

1. Powerful, state-of-the-art direct methods for automatically locating heavy-atom sites

2. Friendly graphical user interface

SnB Weaknesses

1. Stops after finding sites, i.e. no protein phasing

2. No interface to external software

PHASES Strengths

1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing

2. Interactive graphics

PHASES Weaknesses

1. Doesn’t automatically find heavy-atom sites

2. Script based, i.e. no GUI

Goal: Provide user-friendly software for automatic determination of protein crystal structures

Combine the SnB program with the “PHASES” package, putting everything under GUI control

Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated

Also provide a manual mode, giving experienced users more control and facilitating development

Provide graphical feedback when possible

Facilitate coupling with popular external software

Adopted Strategy

Automatic substructure solution detection

Automatic substructure validation

Automatic hand determination (including space group changes, when needed)

Main Developments Required for Automated Structure Determination

Automatic Substructure Solution Detection

Original Method: based on a histogram (manual, time consuming, requires user interaction)

Current Method: based on $R_{min}$ and $R_{cryst}$ statistics (automatic, fast, no user interaction)

Automatic Substructure Validation

Original Method: left up to the user to decide which peaks correspond to true sites (manual)

Current Method (auto mode): based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand insensitive)

Current Method (manual mode): as in auto mode, but can also compare peaks from different solutions (manual)


Automatic Hand Determination

Original Method: visual inspection of map projections (manual, requires user interaction)

Current Method (MAD, SIRAS or MIRAS): based on variance differences in protein and solvent regions (automatic; fast, since it requires no refinement; also requires no user interaction)
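One way to turn “variance differences in protein and solvent regions” into a decision is sketched below: for each hand, compute the density variance inside and outside the molecular envelope and prefer the hand with the larger protein-minus-solvent contrast, on the reasoning that the correct hand should give a flatter solvent region. The exact statistic BnP uses is not given here, so this score, the per-hand masks and the names are assumptions.

```python
def variance(values):
    """Population variance of a list of density values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def hand_score(rho, protein_mask):
    """Protein-minus-solvent variance for one map (protein_mask[i] is True inside the envelope)."""
    protein = [r for r, inside in zip(rho, protein_mask) if inside]
    solvent = [r for r, inside in zip(rho, protein_mask) if not inside]
    return variance(protein) - variance(solvent)


def choose_hand(map_original, mask_original, map_inverted, mask_inverted):
    """Prefer the hand whose map shows the greater protein/solvent variance contrast."""
    s_orig = hand_score(map_original, mask_original)
    s_inv = hand_score(map_inverted, mask_inverted)
    return "original" if s_orig >= s_inv else "inverted"
```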

Automatic Hand Determination

Current Method (SAS data only): comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)
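The SAS comparison can be pictured as a simple vote over the three statistics computed for each hand: lower R, higher FOM and higher CC each count in favor of that hand. The majority-vote rule and the `HandStats` structure are assumptions for illustration; the method description does not say how the statistics are combined.

```python
from dataclasses import dataclass


@dataclass
class HandStats:
    r: float    # R after solvent flattening/phase combination (lower is better)
    fom: float  # mean figure of merit (higher is better)
    cc: float   # correlation coefficient (higher is better)


def choose_hand_sas(original, inverted):
    """Majority vote over R, FOM and CC; an exact tie on a statistic counts for 'inverted'."""
    votes = 0
    votes += 1 if original.r < inverted.r else -1
    votes += 1 if original.fom > inverted.fom else -1
    votes += 1 if original.cc > inverted.cc else -1
    return "original" if votes > 0 else "inverted"
```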

Current Method (SIR, MIR data only): both hands are tried and map examination is needed (requires user interaction)

No man (or program) is an island

Importing data files

Scalepack files, D*Trek files, MTZ files$

Free format files

Exporting control files

O, RESOLVE 2.08, Arp/wARP 6.1.1

Exporting data files

Free format files, CNS files, MTZ files$

O files, CHAIN files, PDB files

Job submission from GUI

RESOLVE$ 2.08, Arp/wARP$ 6.1.1

$RESOLVE, Arp/wARP and/or CCP4 must be obtained from their respective authors/distributors for these options to work

Results for 1JC4

a = 43.6, b = 78.6, c = 89.4 Å, β = 91.95°, P2₁

4 molecules (592 residues) in the asymmetric unit; 2.1 Å data, 3λ MAD data

Substructure: found 24 of 24 Se

Phasing: mean phasing power = 2.95; mean FOM = 0.661

Time to map: ~41 min on a G4 (1.5 GHz) PowerBook; ~13 min on a G5 (2.7 GHz) desktop

Auto-traceability: RESOLVE, 87% main chain, 68% side chain; Arp/wARP, 82% main chain, 73% side chain

SeMet ASU Size & Data Resolution

| PDB code | No. sites | No. residues | NCS | d (Å) |
|---|---|---|---|---|
| 1QC2 | 4 | 169 | 1 | 1.5 |
| 1BX4 | 7 | 345 | 1 | 2.25 |
| 1CB0 | 8 | 283 | 1 | 2.2 |
| 1T5H | 10 | 504 | 1 | 2.5 |
| 2JXH | 12 | 576 | 2 | 3.1 |
| 1GSO | 13 | 431 | 1 | 2.22 |
| 2TPS | 15 | 454 | 2 | 2.7 |
| 1DBT | 19 | 717 | 3 | 2.49 |
| 1JEN | 22 | 668 | 2 | 2.25 |
| 1JC4 | 24 | 592 | 4 | 2.1 |
| 1CLI | 28 | 1380 | 4 | 3.0 |
| 1A7A | 30 | 864 | 2 | 2.8 |
| 1L8A | 40 | 1772 | 2 | 2.6 |
| 1E3M | 45 | 1600 | 2 | 3.0 |
| 1HI8 | 50 | 1328 | 2 | 2.8 |
| 1GKP | 54 | 2748 | 6 | 2.5 |
| 1DQ8 | 60 | 1868 | 4 | 2.33 |
| 1E2Y | 60 | 1880 | 10 | 3.2 |
| 1M32 | 66 | 2196 | 6 | 2.55 |
| 1EQ2 | 70 | 3100 | 10 | 2.9 |

Phasing Flexibility (Manual Mode)

Conclusion

BnP is a user-friendly, efficient package for the automated determination of protein structures from X-ray diffraction data

BnP downloads for Linux, Apple G4, G5 and Intel, and SGI systems are available (to academic and non-profit institutions) at

http://www.hwi.buffalo.edu/BnP/
