rigid body docking -...

Post on 22-May-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Rigid Body DockingWilly Wriggers

School of Health Information Sciences & Institute of Molecular MedicineUniversity of Texas – Houston

14 Years of Multi-Scale Modeling in EMGuoji Wang, Claudine Porta, Zhongguo Chen, Timothy S. Baker, John E. Johnson:Identification of a Fab interaction footprint site on an icosahedral virus by cryoelectron microscopy and X-ray crystallography. Nature, 355:275, 1992.

Phoebe L. Stewart, Stephen D. Fuller, Roger M. Burnett:Difference imaging of adenovirus: bridging the resolution gap between X-ray crystallography and electron microscopy. EMBO J., 12:2589, 1993.

“At that time, placing an atomic structure into an EM map seemed like a very dangerous idea…”Phoebe Stewart, 2003

1998: “Simulated Markers”

Actin filament: Reconstruction from EM data at 20Å resolution rmsd: 1.1Å

Willy Wriggers, Ronald A. Milligan, Klaus Schulten, and J. Andrew McCammon:Self-Organizing Neural Networks Bridge the Biomolecular Resolution Gap. J. Mol. Biol., 284:1247, 1998

1999: The First Algorithmic Packages

Willy Wriggers, Ronald A. Milligan, and J. Andrew McCammon:Situs: A Package for Docking Crystal Structures into Low-Resolution Maps from Electron Microscopy. J. Structural Biology, 125:185, 1999

Niels Volkmann and Dorit Hanein:Quantitative Fitting of Atomic Models into Observed Densities Derived by Electron Microscopy. J. Structural Biology, 125:176, 1999

2000-Present: The “Gold Rush” era: Many other labs join the effort to develop multi-scale modeling and visualization tools…

Advantages of a Collaborative Design

VMD,ChimeraSculptor

Resumé: 2001-2006

• ~2000 unique, registered users of Situs• 6 workshops in Houston, San Diego, New Hampshire, Finland, France• ~1000 citations• 34 collaborators around the world• 25 publications (our lab)• ~50 publications using Situs (collaborators)

( ) ⎟⎟⎠

⎞⎜⎜⎝

⎛ −= 22

23expσ

rrG

Actin filament(Holmes et al., 1990)

Convolution withGaussian:

Q: We can lower the resolution of 3D data, but how can one increase it?A: Combine low- with high-resolution data by flexible and rigid-body fitting.

2σ=0Å 2σ=10Å 2σ=20Å 2σ=40Å

Combining Multi-Resolution Biophys. Data

( )( )

*em1

calc

( )C

ff

f

e

e

ρ

ρ

− ⊗

=

⎡ ⎤×⎢ ⎥⎢ ⎥⎣ ⎦

T

LOW-RESOLUTIONEM MAP

COMPUTE CORRELATION

ROTATE

FFT

UPDATE LATTICE:SAVE MAX. C VALUE + CORRESPONDING Θ,Φ,Ψ

ATOMICSTRUCTURE

FFT

TARGET PROBE

GAUSSIAN g

( )*em ( )f e ρ⊗ h

em( )ρ r atomic( )ρ r

atomic ( )ρ rRFILTER e

atomic

calc

( )

( )

g ρ

ρ

rR

r

( )calc ( )ef ρ⊗ h

ROTATIONAL SPACE3 EULER ANGLES

Θ,Φ,Ψ → R

FILTER e

6D Search with FTM

Grid size 6ÅResolution 15Å90 steps (30481 rotations)

Only Laplacian filtering successfully restores the initial pose

standard cross-correlation

w/ Laplacianfiltering

Example: RecA

E. Coli RNA polymerase

(Seth Darst)

E. Coli RNA polymerase + factor GreB

Need to perform difference mapping to localize GreB(difficult at variable helical symmetry)

Problem: different helical arrays

Registration of Two EM Maps

Rigid-body docking: The RNAP “jaws” are open in presence of GrepB factor, perform flexible map fitting

Map fitting new in Situs 2.2.

Registration and Difference Mapping

FRM: Fast Rotational Matching

Euler angle search is expensive!

9º angular sampling (30481 rotations) requires > 10 minutes on standard workstation for rotations only. Rotations + translations: 10-20 hours.

Our Goal: We seek to FFT-accelerate rotational search in addition to translational search.

To do this, we need to do the math in rotational space and take advantage of expressions similar to convolution theorem that are best described bygroup theory.

Target map f

Probe map g

lmY are the spherical harmonic functions

∑ ∑−

= −=

=1

0

)()(ˆ)(B

l

l

lmlmlm uYsfsuf

∑ ∑−

= −=

=1

0

)()(ˆ)(B

l

l

lmlmlm uYsgsug

Expansion in Spherical Harmonics

FRM3D Method

3

( ) ( ) ( , , )c R f g R c α β γ= ⋅ =∫R

The correlation function is:

Expanding f and g in spherical harmonics as beforewe arrive at: 2 2

ˆ( , , ) ( ) ( )l l lpq qr pr

l

c p q r d d Iπ π= ∑

dsssgsfI lrlplpr

2

0

)(ˆ)(ˆ∫∞

=where:

α,β,γ are specially chosen Euler angles (origin shift).

13 ˆ( , , ) ( )Dc FFT cα β γ −=

The quantities (which come from rotation group theory) are precomputed using a recursive procedure.

)( 2πl

pqd

FRM

( ) ( , , )c R c α β γ=

2 2ˆ( , , ) ( ) ( )l l l

pq qr prl

c p q r d d Iπ π= ∑Precomputations:

FFT setup)( 2

πlpqd

Reading mapsCOMs & sampling

lprlrlp Irgrf ),(ˆ),(ˆ

ˆ( , , )c p q r1

3 ˆ( , , ) ( )Dc FFT cα β γ −=

Postprocessing& Saving

Cro

wth

er

( ) ( , ; )c R c φ ψ θ=ˆ( , ; ) ( )l l

pr prl

c p r d Iθ θ= ∑

Precomputations:FFT setup

Reading mapsCOMs & sampling

lprlrlp Irgrf ),(ˆ),(ˆ

ˆ( , ; )kc p r θ1

2 ˆ( , ; ) ( )k Dc FFT cφ ψ θ −=

Postprocessing& Saving

k =

0, …

, B

)(, klprB

kk d θθ π=

Comparison between FRM3D and Crowther

Timings of FRM3D and Crowther

(seconds)

37.43371.4°3.7519.33°0.971.666°

FRMCrowtherangular sampling

FRM6D (Rigid-Body Matching)

5 angular parameters.

The correlation function is now:

1 linear parameter remains, distance δ of movement along the z axis.

emρ

calcρδ

( ) ( )3

em calc

( , ; )

( ) ( '; ).e e

c R R

R Rρ ρ

δ

δ⊗ ⊗

′ =

⋅∫R

Rigid-Body Search by 5D FFT

The 5D Fourier transform of the correlation function turns out to be:

,

ˆ( , , , , ; ) ( 1) ( ).n l l l l llnh hm nh h m mnm

l l

c n h m h m d d d d Iδ δ′ ′ ′′ ′ ′ ′−

′ ′ = − ∑

The quantities are the so called two-center integrals, corresponding to the spherical harmonic transforms of the two maps, at a distance δ of one another:

( )llmnmI δ′

1 1 ' ' 20 02 2

0 0

ˆ ˆ( ) ( )( ) ( ) ( ) ( ) ( )sinll lm l m l lmnm em calc n nI l l r r d r dr d d

π

δ ρ ρ β β β β∞

′ ′′

⎡ ⎤′ ′ ′= + + ⎢ ⎥

⎣ ⎦∫ ∫

Test Cases

1afw (peroxisomal thiolase)

1nic (copper-nitrite reductase)

7cat (catalase)

1e0j (Gp4D helicase)

Test Cases

1

10

100

1000

10000

1000 10000 100000 1000000 10000000

Number of voxels

Tim

e (m

in)

FTMFRMFTMFRM

Efficiency: FRM vs. FTM

11° sampling,

standard workstation.

FRM

FTM

FTM: 3D FFT + 3D rot. searchFRM: 5D FFT + 1D trans. search

typical EM map size

• Gain: 2-4 orders of magnitude for typical EM map

Correlations: SummaryInput

Laplacian filterNo filter Local mask

Situs 6D exhaustive searches:• Rigid Body• Fast Translational Matching (2.2) • Fast Rotational Matching (2.3)• Density Filtering

Increasing Fitting Contrast

Summary: Correlation Based Matching

Questions?

Alternative

Alternative: “Simulated Markers”

Actin filament: Reconstruction from EM data at 20Å resolution rmsd: 1.1Å

Reduced Representations of Biomolecular Structure

Reduced Representations of Biomolecular Structure

emiwcalc

jw

Feature points (fiducials, landmarks), reduce complexity of search space

Useful for:

•Rigid-body fitting•Flexible fitting •Interactive fitting / force feedback•Building of deformable models

1w

2w

3w

4w

Vector QuantizationLloyd (1957)

Linde, Buzo, & Gray (1980)Martinetz & Schulten (1993)

Digital Signal Processing,Speech and Image Compression.Topology-Representing Network.

}

{ }jwEncode data (in ) using a finite set (j=1,…,k) of codebook vectors.3=ℜd

Delaunay triangulation divides into k Voronoi polyhedra (“receptive fields”):3ℜ

{ }jwvwvv jii ∀−≤−ℜ∈= 3V

1w

2w

3w

4w

Linde, Buzo, Gray (LBG) AlgorithmEncoding Distortion Error:

ii

mijiE wv∑ −=

voxels)(atoms,

2

)(

Lower iteratively: Gradient descent( ){ }( )twE j

( ) ( ) ( ) . 2

1 )( )(∑ −⋅=∂∂

⋅−=−−≡∆i

iriirjr

rrr mwvwEtwtwtw δεε

:r∀

Inline (Monte Carlo) approach for a sequence selected at random according to weights

( )tvi:im

( )( ). ~ )( )( riirjr wtvtw −⋅⋅=∆ δε

How do we avoid getting trapped in the many local minima of E?

Soft-Max Adaptation

1 1 0 )1(10

−===

−≤≤−≤− −

ksss

wvwvwv

rrr

kjijiji …

{ }( ) . )( ,~ 2k

1ri

i

s

j mijiewE wvr

∑ −∑−

=

λ

( ) { }( )jir wtvs ,

Avoid local minima by smoothing of energy function (here: TRN method):

( )( ) , ~ )( : ri

s

r wtvetwrr

−⋅⋅=∆∀−λε

Where is the closeness rank:

Note: LBG algorithm.not only “winner” , also second, third, ... closest are updated.

:0→λ:0≠λ ( )ijw

Can show that this corresponds to stochastic gradient descent on

Note: LBG algorithm.parabolic (single minimum).

. ~ :0 EE →→λE~ :∞→λ } ( )t λ⇒

QuestionAnswer

Codebook vector variability arises due to:• statistical uncertainty,• spread of local minima.

A small variability indicates good convergence behavior.Optimum choice of # of vectors k: variability is minimal.

Q: How do we know that we have found the global minimum of E?A: We don’t (in general).

But we can compute the statistical variability of the by repeating thecalculation with different seeds for random number generator.

{ }jw

Convergence and Variability

EMlow res. data

Xtalstructure

)(hjw )(l

jw

•Estimate optimum k with variability criterion. •Index map I: (m, n = 1,…,k).• k! = k (k-1)…2 possible combinations.• For each index map I perform a least squares fit of the to the . • Quality of I: residual rms deviation

• Find optimal I by direct enumeration of the k! cases (minimum of ).

nm →

)(ljw)(

)(h

jIw

I∆

∑ −=

=∆k

jI

lj

hjIk ww

1

2)()()(

1

Single-Molecule Rigid-Body Docking

Application Example

ncd monomer and dimer-decorated microtubules (Milligan et al., 1997)ncd monomer crystal structure (Fletterick et al., 1996,1998)

Search for Conformations

Two possible ranking criteria:

• Codebook vector rms deviation ( ).• Overlap between both data sets:

Correlation coefficient:

21

,,

2,,

21

,,

2,,

,,,,,,

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⋅=

∑∑

zyxzyx

zyxzyx

zyxzyxzyx

hl

lh

lhC

I∆

ncd motor (white, shown with ATP nucleotide) docked to EM map (black) using k=7 codebook vectors

Reduced Search FeaturesTop 20, 7!=5040 possible pairs of codebook vectors.

I (permutation)hlCI∆

For a fixed k, codebook rmsd is more stringent criterion than correlation coefficient!

Performance (I)

Dependence on resolution (simulated EM map, automatic assignment of k from ):3 9k≤ ≤

Deviation from start structure (PDB: 1TOP) used to generate simulated EM map.

Accurate matching up to ~30ÅCatastrophic misalignment

Performance (II)Is minimum vector variability a suitable choice for optimum k?

10 test systems, simulated EM densities from 2-100Å.

2-20Å (reliable fitting)22-50Å (borderline)52-100Å (mismatches)

Reasonable correlation with actual deviation

No “false positives” for resolution values < 20Åand variability < 1Å.

3 9k≤ ≤

Wriggers & Birmanns, J. Struct. Biol 133, 193-202 (2001)

Summary: Classic Situs (Versions 1.x)

Advantages of vector quantization:•Fast (seconds of compute time).•Reduced search is robust.

Limitations:•Original k->k algorithm works best for single molecules, not for matching subunits to larger densities.•Largely superseded by FTM and FRM in Situs 2.x•But newer application allows k->l matching (Sculptor: Stefan Birmanns)

Resources and Further Reading

WWW: http://situs.biomachina.org

Tools:colores (FTM)Contact me for beta version of FRMqvol, qpdb, qdock, qrange (vector quantization)

WWW: http://sculptor.biomachina.org

Acknowledgement:Yao Cong, Stefan Birmanns, Julio Kovacs, Pablo Chacon (http://biomachina.org)

top related