TRANSCRIPT
Distance Geometry for computing conformations
Ioannis Z. Emiris
NK University of Athens
Algs in Struct BioInfo, April 6, 2020
Outline
Motivation
Rigidity theory
Distance geometry
Molecules
  Small molecules
  Distances to coordinates
  Noisy data
  Matrix perturbations
Structure ab initio
Structure from distances: treat, e.g., 5,000 atoms with:
- NMR spectroscopy: yields approximate distances (exact if < 5 Å), hence the 3D structure, in solution [K. Wüthrich (ETHZ), Chemistry Nobel '02: "for his development of NMR spectroscopy for determining the 3-dimensional structure of biological macromolecules in solution"].
- X-ray crystallography: more accurate distances (error ≤ 1 Å), but in the crystal state, which takes ∼ 1 year.
- Electron microscopy, etc.
NMR
- Software: Dyana [Güntert, Mumenthaler, Wüthrich '97], Embed [Crippen, Havel '88], Disgeo [Havel, Wüthrich '98], Dgsol [Moré, Wu], Abbie [Hendrickson], etc.
- Physics: specific isotopes have spin (±1/2), e.g., H, C13. Each isotope absorbs / radiates back energy from an electromagnetic (EM) pulse at a specific "resonance" frequency.
- Steps:
  1. A constant magnetic field is applied; spins are aligned (polarized).
  2. An EM pulse is applied; specific nuclei are stimulated and radiate energy.
  3. The distance of nuclei pairs depends on the EM frequency: it is measured and assigned to nuclei (semi-automatic).
  4. Nuclei coordinates in some frame (embedding) are computed from (noisy) distances: our focus.
Mechanisms / Robots
Engineering
- Architecture, tensegrity
- Topography (Surveyors)
Euclidean embedding
A graph (network) is described by its vertices (nodes) V and its edges E.
Problem Embed-$\mathbb{R}^d$ (find coordinates): Given a weighted (distance) undirected graph (V, E), find an embedding (coordinate vectors)
$f : V \to \mathbb{R}^d, \quad d \ge 1,$
which also maps the given weights to (Euclidean) distances, i.e.,
$\mathrm{dist}_2(f(v), f(v')) = \mathrm{weight}(v, v'), \quad \forall (v, v') \in E.$
Generic / Minimal Rigidity
- Graph G is generically rigid in $\mathbb{R}^d$ iff, for generic edge lengths, it has a finite number of embeddings in $\mathbb{R}^d$, modulo (ignoring) rigid motions.
- Graph G is minimally rigid iff it becomes non-rigid (flexible) once an edge is removed.
We call generically minimally rigid graphs simply rigid.
A rigid vs a non-rigid (flexible) graph in the plane $\mathbb{R}^2$.
Quad becomes rigid with one extra distance: 2 configurations possible
Planar rigidity
Theorem (Maxwell: 1864, Laman: 1970)
Graph G = (V, E) is rigid in $\mathbb{R}^2$ iff:
- $|E| = 2|V| - 3$, and
- $|E'| \le 2|V'| - 3$ for every vertex-induced subgraph (V', E') with $|V'| \ge 2$.
Intuition: $|E|$ constraints (equations) against $2|V| - 3$ unknowns: two coordinates (x, y) per node, minus 2 for the point pinned at the origin and 1 for the point pinned on the x-axis.
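The counting condition can be checked directly on small graphs. The following is a minimal sketch, not an efficient algorithm: it enumerates all vertex subsets, so it is exponential in |V|. The function name `satisfies_laman_count` and the example graphs are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

def satisfies_laman_count(n, edges):
    """Brute-force check of |E| = 2|V| - 3 and |E'| <= 2|V'| - 3
    for every vertex-induced subgraph on at least 2 vertices."""
    edge_set = {frozenset(e) for e in edges}
    if len(edge_set) != 2 * n - 3:
        return False
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            # Count the edges with both endpoints inside the subset.
            induced = sum(1 for e in edge_set if e <= chosen)
            if induced > 2 * size - 3:
                return False
    return True

triangle = [(0, 1), (1, 2), (0, 2)]
quad = [(0, 1), (1, 2), (2, 3), (3, 0)]        # flexible: one edge short
braced_quad = quad + [(0, 2)]                  # quad plus one diagonal
# K4 plus a pendant edge: the total count is right, but K4 is overbraced.
k4_pendant = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Here `satisfies_laman_count(4, quad)` fails because the quad has 4 edges instead of 5, while the braced quad passes, matching the remark that the quad becomes rigid with one extra distance.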
Rigidity in R3
Generalized Laman count (necessary, but not sufficient in $\mathbb{R}^3$): $|E| = 3|V| - 6$, $|E'| \le 3|V'| - 6$, $\forall (V', E') \subset (V, E)$.
Counterexample: Double Banana:
Theorem (Cayley)
The 1-dimensional skeleta of triangulated/simplicial convex polyhedra are rigid in $\mathbb{R}^3$.
For triangulated/simplicial polyhedra: $|E| = 3|V| - 6$, $|E'| \le 3|V'| - 6$, $(V', E') \subset (V, E)$.
The theorem applies when there exists an embedding s.t. V and E lie on a convex polytope (or on the sphere): the double banana admits no such embedding.
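The double banana can be examined numerically. The sketch below uses the rank of the rigidity matrix at random coordinates, a standard test for generic rigidity that the slides do not mention, so treat it as a swapped-in technique. The vertex numbering (apexes 0 and 1, triangles 2-3-4 and 5-6-7) and the helper names are assumptions made for illustration.

```python
import numpy as np

def rigidity_rank(points, edges, tol=1e-9):
    """Rank of the rigidity matrix: one row per edge (i, j), holding
    p_i - p_j in the columns of vertex i and p_j - p_i in those of vertex j."""
    n, d = points.shape
    R = np.zeros((len(edges), n * d))
    for row, (i, j) in enumerate(edges):
        diff = points[i] - points[j]
        R[row, d * i:d * i + d] = diff
        R[row, d * j:d * j + d] = -diff
    return int(np.linalg.matrix_rank(R, tol=tol))

def banana(apex_a, apex_b, tri):
    """Triangular bipyramid: a triangle joined to two apex vertices."""
    a, b, c = tri
    ring = [(a, b), (b, c), (a, c)]
    spokes = [(apex, v) for apex in (apex_a, apex_b) for v in tri]
    return ring + spokes

one_banana = banana(0, 1, (2, 3, 4))
double_banana = one_banana + banana(0, 1, (5, 6, 7))

rng = np.random.default_rng(0)
points = rng.standard_normal((8, 3))           # random, hence generic, positions

rank_single = rigidity_rank(points[:5], one_banana)
rank_double = rigidity_rank(points, double_banana)
```

The double banana has 18 = 3·8 − 6 edges, yet the rank is 17 rather than 18: the two bananas can rotate relative to each other about the axis through the shared apexes, so the graph is flexible although it passes the count.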
Algebraic system
#embeddings = #real solutions of a polynomial system expressing the weighted edges E, together with $\binom{d+1}{2} + 1$ constraints for "pin-down" and removing scaling.
In $\mathbb{R}^2$:
$x_1 = y_1 = 0, \quad x_2 = d_{12}, \quad y_2 = 0, \quad (x_i - x_j)^2 + (y_i - y_j)^2 = d_{ij}^2, \quad (i, j) \in E.$
In $\mathbb{R}^3$:
$x_1 = y_1 = z_1 = 0, \quad x_2 = d_{12}, \quad y_2 = z_2 = 0, \quad z_3 = 0, \quad (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 = d_{ij}^2, \quad (i, j) \in E.$
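As a worked instance of the pinned system in the plane, the smallest rigid graph (a triangle) can be solved in closed form. This is a sketch under the pin-down above, with $p_1$ at the origin and $p_2$ on the x-axis; the function name `triangle_embeddings` is hypothetical. Subtracting the two edge equations for $p_3$ gives $x_3$ linearly, and $y_3$ then follows up to sign.

```python
import math

def triangle_embeddings(d12, d13, d23):
    """Real solutions (x3, y3) of the pinned system
       x3^2 + y3^2 = d13^2,  (x3 - d12)^2 + y3^2 = d23^2,
    with p1 = (0, 0) and p2 = (d12, 0)."""
    x3 = (d12**2 + d13**2 - d23**2) / (2 * d12)
    y3_squared = d13**2 - x3**2
    if y3_squared < 0:
        return []                      # lengths violate the triangle inequality
    y3 = math.sqrt(y3_squared)
    if y3 == 0:
        return [(x3, 0.0)]             # degenerate: collinear points
    return [(x3, y3), (x3, -y3)]       # two mirror-image embeddings

solutions = triangle_embeddings(3.0, 4.0, 5.0)
```

For generic lengths the system has two real solutions, which are reflections of each other; lengths that break the triangle inequality leave no real solution.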
n = 7: 56 conformations [E-Moroz’11]
The case n = 6
- The cyclohexane has 16 real embeddings [E-Mourrain'99].
- The "jigsaw" parallel robot has 16 real configurations.
2 chairs, 2 twisted-boats/crowns given 6 equal distances, 6 equal angles, 10% perturbation ⇒ 12 distances (Laman). These are precisely the conformations mostly observed in nature.
Matrix algebra
Definition. For any (rectangular) matrix A,
- rank(A) = r if r = #positive singular values. Recall: singular values ≥ 0.
A submatrix determinant is called a minor.
Lemma. For any matrix A, rank(A) = max dimension of a nonzero minor. Formally, rank(A) = r iff there exists an r × r nonzero minor and all k × k minors vanish for k > r.
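Both characterizations of rank can be compared on a small matrix. This is a minimal NumPy sketch; the tolerance `tol` and the function names are assumptions, and the minor-based version is exponential, so it is meant only as an illustration of the lemma.

```python
import numpy as np
from itertools import combinations

def rank_by_singular_values(A, tol=1e-10):
    """Number of singular values above a small tolerance."""
    return int(np.sum(np.linalg.svd(A, compute_uv=False) > tol))

def rank_by_minors(A, tol=1e-10):
    """Largest k such that some k x k minor is nonzero (brute force)."""
    m, n = A.shape
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                    return k
    return 0

# Second row is twice the first, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
```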
Distance matrix
Definition. A distance matrix M is square with real entries $M_{ii} = 0$, $M_{ij} = M_{ji} \ge 0$.
Definition. A distance matrix M is embeddable in Euclidean space $\mathbb{R}^d$ iff
$\exists$ points $p_i \in \mathbb{R}^d$: $M_{ij} = \tfrac{1}{2}\,\mathrm{dist}(p_i, p_j)^2.$
Embeddable matrices in $\mathbb{R}^3$ correspond to 3D conformations, since one can assign one or more 3D coordinate vectors to each atom.
Cayley-Menger (or border) matrix
Definition. Define a Cayley-Menger (or border) matrix by appending a 0th row and a 0th column to distance matrix M:
$\begin{bmatrix} 0 & 1 & \cdots & 1 \\ 1 & & & \\ \vdots & & M & \\ 1 & & & \end{bmatrix}.$
Again symmetric, 0-diagonal, non-negative entries.
Notice rank(CM matrix) = rank(M) + 2.
Distance geometry
Theorem (Cayley: 1841, Menger '28)
M embeds in $\mathbb{R}^d$ iff the Cayley-Menger (border) matrix has
$\mathrm{rank}\begin{bmatrix} 0 & 1 & \cdots & 1 \\ 1 & & & \\ \vdots & & M & \\ 1 & & & \end{bmatrix} = d + 2,$
and, for any (k + 1) × (k + 1) "border" minor $D(i_1, \dots, i_k)$ indexed by rows/columns $0, i_1, \dots, i_k$:
$(-1)^k D(i_1, \dots, i_k) \ge 0, \quad k = 2, \dots, d + 1.$
There exists a strict inequality, $D(i_1, \dots, i_{d+1}) \ne 0$, iff M cannot embed in $\mathbb{R}^{d-1}$. Trivially, M embeds in $\mathbb{R}^{\delta}$ for $\delta > d$.
3D
Corollary
A distance matrix expresses a 3D conformation iff the border matrix has rank = 5 and satisfies the triangle and tetrangular inequalities:
- For k = 2, $D(i, j) = \det\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & M_{ij} \\ 1 & M_{ij} & 0 \end{bmatrix} = 2M_{ij} \ge 0$,
- for k = 3, by the triangular inequalities: $-D(1, 2, 3) = \tfrac{1}{4}(d_{12}+d_{13}+d_{23})(d_{12}+d_{13}-d_{23})(d_{12}+d_{23}-d_{13})(d_{13}+d_{23}-d_{12}) \ge 0$, where $M_{ij} = d_{ij}^2/2$,
- for k = 4, by the tetrangular inequalities.
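The rank and sign conditions can be observed on random points in $\mathbb{R}^3$. The sketch below builds the border matrix with entries $M_{ij} = \mathrm{dist}(p_i, p_j)^2/2$ as defined earlier; the helper names, the random sample, and the numerical tolerance are assumptions made for illustration.

```python
import numpy as np

def border_matrix(points):
    """Cayley-Menger (border) matrix with M_ij = dist(p_i, p_j)^2 / 2."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    B = np.ones((n + 1, n + 1))
    B[0, 0] = 0.0
    B[1:, 1:] = 0.5 * np.sum(diff**2, axis=2)
    return B

def border_minor(B, idx):
    """Minor D(i1, ..., ik) on rows/columns 0, i1, ..., ik (points numbered from 1)."""
    sel = [0] + list(idx)
    return float(np.linalg.det(B[np.ix_(sel, sel)]))

rng = np.random.default_rng(1)
pts = rng.standard_normal((7, 3))              # 7 generic points in R^3
B = border_matrix(pts)

rank = int(np.linalg.matrix_rank(B, tol=1e-9))
D2 = border_minor(B, (1, 2))                   # k = 2
D3 = border_minor(B, (1, 2, 3))                # k = 3
D4 = border_minor(B, (1, 2, 3, 4))             # k = 4
D5 = border_minor(B, (1, 2, 3, 4, 5))          # 6 x 6 minor, should vanish
```

For generic points in space one expects rank 5, alternating signs $D_2 > 0$, $D_3 < 0$, $D_4 > 0$, and a vanishing 6 × 6 border minor up to rounding.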
Robotics
Given some fixed geometric characteristics (angles, lengths) and the position of the end-effector (here a ring), compute all possible configurations defined by 6 consecutive rotational DOFs. Inverse kinematics of a 6R robot with consecutive axes intersecting.
Cyclohexane
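Border matrix, with rows and columns indexed by $0, p_1, \dots, p_6$:
$\begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & u & c & x_{14} & c & u \\
1 & u & 0 & u & c & x_{25} & c \\
1 & c & u & 0 & u & c & x_{36} \\
1 & x_{14} & c & u & 0 & u & c \\
1 & c & x_{25} & c & u & 0 & u \\
1 & u & c & x_{36} & c & u & 0
\end{bmatrix}$
Known u, c from bond distance $d_u = 1.526$ Å (adjacent atoms) and bond angle $\phi \simeq 109.5^\circ$ ⇒ $d_c \simeq 2.49$ Å (law of cosines in rigid triangle).
Rank = 5 ⇔ all 6 × 6 minors = 0, some 5 × 5 minor ≠ 0.
For unknowns $x_{14}, x_{25}, x_{36}$, use 3 such (quadratic) equations.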
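The value of $d_c$ follows from the law of cosines in the triangle formed by two consecutive bonds. A short check, where the variable names are illustrative and the entries u, c are assumed to follow the convention $M_{ij} = d_{ij}^2/2$ used for distance matrices in these slides:

```python
import math

d_u = 1.526                        # bond length in Angstrom (from the slide)
phi = math.radians(109.5)          # bond angle (from the slide)

# Law of cosines: d_c^2 = d_u^2 + d_u^2 - 2 d_u^2 cos(phi)
d_c = math.sqrt(2 * d_u**2 * (1 - math.cos(phi)))

# Matrix entries under the assumed convention M_ij = d_ij^2 / 2.
u = d_u**2 / 2
c = d_c**2 / 2
```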
Cycloheptane
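Border matrix, with rows and columns indexed by $0, v_1, \dots, v_7$:
$\begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & c_{12} & c_{13} & x_{14} & x_{15} & c_{16} & c_{17} \\
1 & c_{12} & 0 & c_{23} & c_{24} & x_{25} & x_{26} & c_{27} \\
1 & c_{13} & c_{23} & 0 & c_{34} & c_{35} & x_{36} & x_{37} \\
1 & x_{14} & c_{24} & c_{34} & 0 & c_{45} & c_{46} & x_{47} \\
1 & x_{15} & x_{25} & c_{35} & c_{45} & 0 & c_{56} & c_{57} \\
1 & c_{16} & x_{26} & x_{36} & c_{46} & c_{56} & 0 & c_{67} \\
1 & c_{17} & c_{27} & x_{37} & x_{47} & c_{57} & c_{67} & 0
\end{bmatrix}$
14 known entries $c_{ij} = d_{ij}^2/2$, 7 unknown $x_{ij}$'s.
$3 \cdot 7 - 6 = 15$ unknown coordinates against 14 known distances ⇒ generically an infinite number (a curve) of conformations.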
Distances and inner products
Consider n + 1 (unknown) points/vectors $p_0, \dots, p_n \in \mathbb{R}^3$, and all possible distances among them:
$d_{ij}^2 = |p_i - p_j|^2, \quad d_{i0}^2 = |p_i|^2,$ by setting $p_0 = 0 \in \mathbb{R}^3$.
Vector length can be written in terms of the inner/dot product:
$(x, y, z) \cdot (x, y, z) = x^2 + y^2 + z^2 = |(x, y, z)|^2.$
The inner/dot product can be written using the transpose vector:
$(x, y, z) \cdot (x, y, z) = (x, y, z)^T (x, y, z).$
Then, $d_{ij}^2 = |p_i - p_j|^2 = (p_i - p_j) \cdot (p_i - p_j) = |p_i|^2 - 2 p_i^T p_j + |p_j|^2.$
Matrix of inner products
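Define Gram matrix $G = [p_1, \dots, p_n]^T [p_1, \dots, p_n] =: P^T P$, as the n × n matrix of inner products:
$G = \begin{bmatrix}
p_1^T p_1 & p_1^T p_2 & \cdots & p_1^T p_n \\
p_2^T p_1 & p_2^T p_2 & \cdots & p_2^T p_n \\
\vdots & \vdots & \ddots & \vdots \\
p_n^T p_1 & p_n^T p_2 & \cdots & p_n^T p_n
\end{bmatrix}.$
G is determined from distances, once a point is the origin:
$d_{ij}^2 = |p_i|^2 - 2 p_i^T p_j + |p_j|^2 \;\Leftrightarrow\; p_i^T p_j = \dfrac{d_{i0}^2 - d_{ij}^2 + d_{j0}^2}{2} =: G_{ij}.$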
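The formula for $G_{ij}$ translates into a few array operations. This is a minimal NumPy sketch; the function name `gram_from_distances` and the sample points are assumptions, and the input is taken to be the full (n+1) × (n+1) matrix of plain (not squared) distances with point 0 as the origin.

```python
import numpy as np

def gram_from_distances(D):
    """n x n Gram matrix G_ij = (d_i0^2 - d_ij^2 + d_j0^2) / 2,
    from the (n+1) x (n+1) matrix of distances d_ij, point 0 at the origin."""
    D2 = np.asarray(D, dtype=float) ** 2
    d0 = D2[1:, 0]                              # squared distances to the origin
    return 0.5 * (d0[:, None] - D2[1:, 1:] + d0[None, :])

# Sample points with p_0 = 0, used only to produce consistent distances.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
G = gram_from_distances(D)
```

With exact distances, `G` coincides with the matrix of inner products of `X[1:]`, which is what the next slides factor to recover coordinates.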
Point coordinates from inner products (I)
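Input: Gram matrix G of inner products $p_i^T p_j$, $p_0 = 0$.
$[G_{ij}] = [p_i^T p_j] = P^T P$, where $P = [p_1, \dots, p_n]$ is 3 × n.
G real symmetric (and positive semidefinite, since $G = P^T P$) ⇒ U = V. Singular Value Decomposition yields
$G = U \Sigma U^T, \quad U^T U = I, \quad \Sigma$ diagonal, entries $\sigma_i \ge 0$.
$\mathrm{rank}[d_{ij}^2] = 3 \Rightarrow \mathrm{rank}(G) = 3 \Rightarrow \sigma_1, \dots, \sigma_3 > 0 = \sigma_4 = \cdots = \sigma_n.$
So all info is contained in the 3 × 3 upper-left (principal) submatrix of Σ:
$U \Sigma U^T = \begin{bmatrix} V & U_2 \end{bmatrix} \begin{bmatrix} \Sigma' & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ U_2^T \end{bmatrix}.$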
Point coordinates from inner products (II)
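Define 3 × 3 diagonal Σ′ and n × 3 V s.t. $G = V \Sigma' V^T$:
$G = V \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} V^T, \quad \sigma_i > 0.$
Now let the 3 × 3 diagonal $\sqrt{\Sigma'} = \mathrm{diag}(\sqrt{\sigma_1}, \sqrt{\sigma_2}, \sqrt{\sigma_3})$.
Then, $G = V \sqrt{\Sigma'} \sqrt{\Sigma'} V^T = P^T P \;\Rightarrow\; P := \sqrt{\Sigma'} V^T.$
Output: point coordinates P (up to rigid transforms) in $\mathbb{R}^3$.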
Embeddability Theorem
Corollary (of Cayley-Menger)
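Points $p_i$ embed in $\mathbb{R}^d$, for min d, iff the corresponding Gram matrix $P^T P$ has rank d.
In $\mathbb{R}^3$: $\{p_i\}$ embed in $\mathbb{R}^3$ (not $\mathbb{R}^2$) iff $G = P^T P$ has rk G = 3.
Theorem. For matrix $A = U \Sigma V^T$ (SVD), $U \Sigma' V^T$ is A's best approximant of rank $\rho \le \mathrm{rank}(A)$, where $\sigma'_k = \sigma_k$, $k = 1, \dots, \rho$, and $\sigma'_i = 0$, $i > \rho$.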
Embedding via SVD
Input: full distance graph (clique) on n + 1 points, distances may be inaccurate (noisy).
Embedding Algorithm
0. Pick a point as origin (indexed 0).
1. Compute all distances $d_{ij}$.
2. Determine G, and run SVD: $G = V \Sigma V^T$. Goal: embedding $P = \sqrt{\Sigma} V^T$.
3. Force rank(G) = 3 by defining diagonal matrix Σ′ s.t. $\sigma'_k = \sigma_k$, $k = 1, 2, 3$, and $\sigma'_i = 0$, $i = 4, \dots, n$.
4. Output coordinates $P = \sqrt{\Sigma'} V^T$ (size 3 × n, one column per point), and $p_0 = (0, 0, 0)$.
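The steps above can be sketched in NumPy as follows. Two choices here are assumptions rather than part of the slides: the symmetric eigendecomposition `numpy.linalg.eigh` replaces a general SVD (the two agree for a symmetric positive semidefinite G), and small negative eigenvalues caused by noise are clipped to zero. The function name and the test data are illustrative.

```python
import numpy as np

def embed_from_distances(D, dim=3):
    """Coordinates (dim x (n+1), column 0 is the origin) from an
    (n+1) x (n+1) matrix of possibly noisy distances d_ij."""
    D2 = np.asarray(D, dtype=float) ** 2
    d0 = D2[1:, 0]
    G = 0.5 * (d0[:, None] - D2[1:, 1:] + d0[None, :])    # Gram matrix, p_0 = 0
    G = 0.5 * (G + G.T)                                    # symmetrize against noise
    w, V = np.linalg.eigh(G)                               # eigenvalues, ascending
    top = np.argsort(w)[::-1][:dim]                        # keep the dim largest
    sigma = np.clip(w[top], 0.0, None)                     # assumption: clip negatives
    P = np.sqrt(sigma)[:, None] * V[:, top].T              # sqrt(Sigma') V^T
    return np.hstack([np.zeros((dim, 1)), P])              # prepend p_0 = 0

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))
X[0] = 0.0                                                 # point 0 is the origin
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

P = embed_from_distances(D)
D_rec = np.linalg.norm(P.T[:, None, :] - P.T[None, :, :], axis=2)
```

With exact input the recovered distances `D_rec` match `D`, while the coordinates themselves agree with `X` only up to a rotation or reflection. With noisy input the truncation to three eigenvalues plays the role of step 3.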
Minors
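First use the inequalities:
$\det \mathrm{Border}(2 \text{ points}) = \det\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & d_{ij}^2/2 \\ 1 & d_{ij}^2/2 & 0 \end{bmatrix} \ge 0 \;\Leftrightarrow\; d_{ij}^2 \ge 0.$
Triangular inequality: $-\det \mathrm{Border}(3 \text{ points}) = \tfrac{1}{4}(d_{12}+d_{13}+d_{23})(d_{12}+d_{13}-d_{23})(d_{12}+d_{23}-d_{13})(d_{13}+d_{23}-d_{12}) \ge 0.$
Tetrangular inequality: $\det \mathrm{Border}(4 \text{ points}) \ge 0.$
Eventually, use the rank condition:
All 6 × 6 minors vanish ⇔ $\det \mathrm{Border}(5 \text{ points}) = 0$, for all 5-tuples.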
Smoothing triangular inequalities
Triangle inequalities (equality iff collinear): For any 3 points in Euclidean space of any dimension (including $\mathbb{R}^3$) the triangle inequality holds:
$|d_{ik} - d_{kj}| \le d_{ij} \le d_{ik} + d_{kj}.$
Left-hand side inequality follows from right-hand side inequality.
Bound smoothing
Inaccurate distances $d_{ij}$ are given as intervals $[l_{ij}, u_{ij}]$, s.t. $l_{ij} \le d_{ij} \le u_{ij}$.
Improve upper bound $u_{ij}$ by forcing the triangular inequality:
$u_{ij} \le u_{ik} + u_{kj}.$
All-min-paths in single pass, any order [Havel]: $O(|V|^3)$.
Improve lower bound $l_{ij}$ by forcing:
$l_{ij} \ge \max\{l_{ik} - u_{kj},\; l_{kj} - u_{ik}\},$
where at most one of the two differences can be positive, e.g., when $u_{ik} > l_{ik} > u_{kj} > l_{kj}$. This implies
$l_{ij} \ge l_{km} - u_{ik} - u_{mj},$
where indices i, j, k, m are not necessarily all distinct. Computed independently of the upper bounds by single-pass all-min-paths [Havel].
Example
$u_{bd} \le u_{bc} + u_{cd} = 12$
$u_{ac} \le u_{ad} + u_{cd} = 13$
$u_{ab} \le u_{ad} + u_{bd} = 20$
$l_{bd} \ge l_{ab} - u_{ad} = 2$
$l_{ac} \ge l_{ab} - u_{bc} = 3$
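Triangle bound smoothing can be sketched in plain Python. Upper bounds are tightened by Floyd-Warshall shortest paths; for the lower bounds this sketch simply repeats the update until nothing changes, which is a conservative choice of mine and not the single-pass scheme attributed to Havel. The function name and the three-point data are illustrative and unrelated to the example above, whose figure is not available.

```python
def smooth_bounds(lower, upper):
    """Tighten interval bounds [l_ij, u_ij] with the triangle inequality.
    Inputs are symmetric n x n lists of lists with zero diagonal."""
    n = len(upper)
    L = [row[:] for row in lower]
    U = [row[:] for row in upper]
    # Upper bounds: u_ij <= u_ik + u_kj (all-pairs shortest paths, O(n^3)).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if U[i][k] + U[k][j] < U[i][j]:
                    U[i][j] = U[i][k] + U[k][j]
    # Lower bounds: l_ij >= max(l_ik - u_kj, l_kj - u_ik), until stable.
    changed = True
    while changed:
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    candidate = max(L[i][k] - U[k][j], L[k][j] - U[i][k])
                    if candidate > L[i][j]:
                        L[i][j] = candidate
                        changed = True
    return L, U

# Points 0, 1, 2: d_01 = 1 exactly, d_12 in [5, 6], d_02 almost unconstrained.
lower = [[0, 1, 0], [1, 0, 5], [0, 5, 0]]
upper = [[0, 1, 100], [1, 0, 6], [100, 6, 0]]
L, U = smooth_bounds(lower, upper)
```

After smoothing, the interval for $d_{02}$ shrinks from [0, 100] to [4, 7]: the upper bound comes from $u_{01} + u_{12}$ and the lower bound from $l_{12} - u_{01}$.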
Four atoms
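For nonplanar atoms 1, 2, 3, 4, the Cayley-Menger determinant is:
$\begin{vmatrix}
0 & 1 & 1 & 1 & 1 \\
1 & 0 & d_{12}^2 & d_{13}^2 & d_{14}^2 \\
1 & d_{21}^2 & 0 & d_{23}^2 & d_{24}^2 \\
1 & d_{31}^2 & d_{32}^2 & 0 & d_{34}^2 \\
1 & d_{41}^2 & d_{42}^2 & d_{43}^2 & 0
\end{vmatrix} > 0$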
Heron’s formula
Triangular: $CM(a, b, c) = 16\,(\text{Area of } T)^2 = (a + b + c)(-a + b + c)(a - b + c)(a + b - c).$
Tetrangular: $CM(a, b, c, d, e, f) = 288\,(\text{Volume of } T)^2.$
Heron of Alexandria (Ἥρων ὁ Ἀλεξανδρεύς, c. 10-70 AD) was an ancient Greek mathematician and engineer who was active in his native city of Alexandria [wikipedia].
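Both identities can be checked numerically on simple shapes. The sketch below evaluates the bordered determinant with squared distances as entries, as in the four-atom determinant; under that convention the raw determinant is negative for a triangle and positive for a tetrahedron, so $CM(a, b, c)$ is read here as minus the determinant. The function name and test shapes are assumptions.

```python
import numpy as np

def cayley_menger_det(points):
    """Determinant of the bordered matrix of squared distances."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    B = np.ones((n + 1, n + 1))
    B[0, 0] = 0.0
    B[1:, 1:] = np.sum(diff**2, axis=2)
    return float(np.linalg.det(B))

# Right triangle with sides 3, 4, 5: area 6, so 16 * Area^2 = 576.
triangle = [(0, 0), (3, 0), (0, 4)]
a, b, c = 3.0, 4.0, 5.0
heron_product = (a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c)

# Corner tetrahedron: volume 1/6, so 288 * Volume^2 = 8.
tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

cm_triangle = -cayley_menger_det(triangle)
cm_tetrahedron = cayley_menger_det(tetrahedron)
```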
Independence
If the given bounds satisfy the triangle inequality, only 7 inequalities are non-redundant, derived from the tetrangular inequalities.
Consider the (3,4) distance. For the upper limit $u_{34}$:
$CM(l_{12}, u_{13}, u_{14}, u_{23}, u_{24}, u_{34}) \ge 0,$
$CM(u_{12}, l_{13}, l_{14}, u_{23}, u_{24}, u_{34}) > 0,$
$CM(u_{12}, u_{13}, u_{14}, l_{23}, l_{24}, u_{34}) > 0.$
For the lower limit $l_{34}$ we have:
$CM(u_{12}, u_{13}, l_{14}, l_{23}, u_{24}, l_{34}) > 0,$
$CM(u_{12}, l_{13}, u_{14}, u_{23}, l_{24}, l_{34}) > 0,$
$CM(l_{12}, l_{13}, u_{14}, l_{23}, u_{24}, l_{34}) > 0,$
$CM(l_{12}, u_{13}, l_{14}, u_{23}, l_{24}, l_{34}) > 0.$
Bound smoothing
Input: intervals $[l_{ij}, u_{ij}]$, s.t. $l_{ij} \le d_{ij} \le u_{ij}$, for unknown distance $d_{ij}$, where $l_{ij} \le u_{ij}$; notice $l_{ij} = u_{ij}$ iff $d_{ij}$ is known exactly.
Algorithm:
0. Tighten intervals using the triangle inequality (linear pass of graph).
1. Fix a Tolerance value > 0.
2. Check all $\binom{n}{4}$ quadruples of nodes, applying the 7 inequalities.
3. Repeat (2) until the max change in any bound is < Tolerance.
Order of quadruples/inequalities does not affect output. BUT: step (2) may progress very slowly to the final result, depending on order. Tetrangle inequalities are much tighter than triangular ones, but slow.
Slow progress
Regular pentagon: 5 edges: u = l = 1; 3 shown diagonals: l = 1.617; true diagonal = 1.618 = $2\cos 36^\circ$.
Triangle-Bound-Smoothing yields upper bound = 2 for 5 diagonals; $u_{24}$ by quadruple (2,3,4,5), then used in (2,4,5,1) for $u_{25}$.
After 30 passes, tolerance = $10^{-14}$, we have $u_{24} = 1.6207323507579925$, $u_{25} = 1.6207323507579441$.
Structure-preserving matrix perturbations
Let $\sigma_i(A) \ge 0$ be the i-th singular value of matrix A.
Theorem [Wicks-Decarlo'95]. Given matrix B, there exist $t \in \mathbb{R}$, $P \in \{0, 1\}^{n \times n}$ (perturbation) s.t. $f(t) = \sigma_n(B - tP)$ is continuous and $f'(t) = -u^T P v$, where u, v are the n-th singular vectors. So a Newton-like iteration finds P, t: $\sigma_n(B - tP) \simeq 0$.
Heuristic [Nikitopoulos-E'02]. If, moreover, border matrix B is sufficiently close to an embeddable matrix (local minimum), the algorithm applies for any $\sigma_k$, $6 \le k \le n$.
Iterative algorithm.
- Minimize $\sigma_6(B - tP)$, thus minimizing $\sigma_k$, $6 < k \le n$, too.
- Suitable t > 0, P (symmetric, 0-diagonal, 0's on 1st row / column) found in $O(n^2)$, preserves B's structure, reduces $\sigma_6$.
- Repeat until $\sigma_6$ reduces by less than some threshold ε > 0.
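The derivative formula $f'(t) = -u^T P v$ can be checked against a finite difference. The sketch below only verifies the formula at t = 0; it does not implement the Newton-like iteration or the choice of P from the cited papers. The test matrix with prescribed singular values 5, 4, 3, 2, 1, the pattern P, and the step size h are all assumptions made for illustration.

```python
import numpy as np

def smallest_singular_triplet(A):
    """Smallest singular value with its left and right singular vectors."""
    U, s, Vt = np.linalg.svd(A)
    return s[-1], U[:, -1], Vt[-1, :]

rng = np.random.default_rng(2)
n = 5
# Generic matrix with well-separated singular values 5, 4, 3, 2, 1.
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q1 @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ Q2.T

# Symmetric 0/1 perturbation pattern with zero diagonal.
P = np.zeros((n, n))
P[1, 3] = P[3, 1] = 1.0

def f(t):
    """f(t) = sigma_n(B - t P)."""
    return smallest_singular_triplet(B - t * P)[0]

_, u, v = smallest_singular_triplet(B)
analytic = -u @ P @ v                   # f'(0) from the theorem
h = 1e-6
numeric = (f(h) - f(-h)) / (2 * h)      # central finite difference
```

The two values agree to high accuracy as long as the smallest singular value is simple and nonzero, which the construction of `B` guarantees here.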
Performance on ring molecules
Matlab code perturbs matrix B to minimize the 6-th singular value, preserves B's structure (symmetric, diagonal 0, entries > 0); precision = 16 digits [Nikitopoulos-E: J. Math. Chem. '02].
#atoms   Init. σ6    Final σ6    Iterations   Time [sec.]   KFlops
10       2.38e-02    2.95e-13    3            0.11          109
11       3.16e-02    2.60e-12    3            0.16          165
12       8.13e-02    1.20e-07    3            0.22          282
13       8.09e-02    8.49e-08    3            0.30          450
14       3.72e-02    6.04e-13    3            0.49          606
15       3.53e-02    2.02e-14    3            0.77          940
16       3.78e-02    1.72e-12    3            1.15          1404
17       3.83e-02    1.70e-13    3            1.54          2082
18       3.53e-02    3.93e-13    3            2.14          3039
19       3.80e-02    4.59e-14    3            2.91          4344
20       4.00e-02    7.09e-13    3            3.79          6136
Flop = floating-point operation.