Post on 18-Dec-2015
Assigning Transmembrane Segments to Helices in Intermediate-Resolution Structures
Angela Enosh, Sarel J. Fleishman, Nir Ben-Tal & Dan Halperin
Adapted from a presentation made by Angela Enosh
Lecture Outline
Background
The assignment problem
The algorithm
Validation
TM proteins form helix bundles
Figure 1: 3D structure of Bacteriorhodopsin
Transmembrane (TM) proteins cross membrane planes
Constitute approximately 50% of contemporary drug targets
Helices typically cross the membrane
Loops are typically located on the external/internal side of the membrane, connecting consecutive helices
Adapted from http://vertrees.org/ by Jason Vertrees
TM protein amino-acid sequence
The 2D arrangement of TM and extra-membrane (EM) segments can be predicted from the sequence data alone
TM protein 3D structure
Technical problems hamper TM protein structure determination
Only 30 distinct folds have been solved using high-resolution methods such as X-ray crystallography
Cryo-electron microscopy (Cryo-EM)
Determines protein structures at low resolution (> 4 Å)
Individual amino-acids cannot be identified
Supplies the locations of the helices
The exact structure remains ambiguous
Figure 2: Cryo-EM structure of bovine rhodopsin; adapted from Krebs et al. (2003) J. Biol. Chem. 278, 50217.
Problem Description: Input and Target
Input:
Position, orientation and azimuth of the helices with respect to the membrane planes
Partitioning of the sequence into TM segments (helices) and extra-membrane (EM) segments (loops)
Target: find the correspondence between the TM segments and the cryo-EM helices
Attempt to reduce the number of possible assignments
Example
Given the helices seen in the cryo-EM map (A-G)
Given the sequence classified into TM/EM segments (I-VII)
Find the native assignment of TM segments (I-VII) to cryo-EM helices (A-G)
The Algorithm, Stage I: Pruning by Distance Constraints
Eliminate helix assignments that are incompatible with the estimated maximal length of the loops.
Construct an assignment graph that contains only the set of feasible assignments.
The Algorithm, Stage II: Ranking the Feasible Assignments
Use known protein structures taken from the Protein Data Bank (PDB)
Score each assignment based on the ability of its loops to connect pairs of helices in 3D.
Formal Statement of the Problem
Sequence of all segments: $S = \{T_1, X_1, T_2, X_2, \ldots, X_{n-1}, T_n\}$
TM segments: $T_i \in S_T$, $T_i = \{t_{i1}, t_{i2}, \ldots, t_{ik}\}$
EM segments: $X_i \in S_X$, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{ik}\}$
Formal Statement of the Problem (cont.)
3D helix $C_i \in \mathcal{C}$, denoted $C_i = \{c_{i1}, c_{i2}, \ldots, c_{ik}\}$: the coordinates of its atoms
The membrane is defined by an inner and an outer plane
The maximal distance between two points that can be connected by $X_i$ is denoted $\mathrm{max\_dist}(X_i)$; it is deduced from the distance between consecutive Cα atoms, typically 3.8 Å
The internal and external termini of $C_i$ are denoted $\mathrm{int}(C_i)$ and $\mathrm{ext}(C_i)$
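As a hedged illustration of the distance constraint, the sketch below assumes that a loop of L residues spans L + 1 consecutive Cα-Cα steps of at most 3.8 Å each, so $\mathrm{max\_dist}(X_i) = 3.8\,(L+1)$ Å. The slides only say that the bound is deduced from the 3.8 Å spacing; the `+ 1` and the helper names `max_dist` and `can_connect` are assumptions made for this example.

```python
import math

# Cα-Cα distance between consecutive residues (Å), as quoted on the slide.
CA_CA_DIST = 3.8


def max_dist(loop_length: int) -> float:
    """Upper bound on the span of an EM segment X_i (hypothetical helper).

    Assumption: a loop of `loop_length` residues joins the last residue of one
    helix to the first residue of the next, i.e. loop_length + 1 Cα-Cα steps,
    each at most 3.8 Å long when the chain is fully extended.
    """
    return CA_CA_DIST * (loop_length + 1)


def can_connect(p, q, loop_length: int) -> bool:
    """True if helix termini p and q (3D points in Å) are within max_dist(X_i)."""
    return math.dist(p, q) <= max_dist(loop_length)
```

For example, under this assumption a 4-residue loop can bridge termini at most 19 Å apart.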
Formal Goals
Find all feasible assignments of the $T_i$'s to the $C_i$'s
An assignment is a permutation $\pi$ where $T_i$ is assigned to $C_{\pi(i)}$
Attribute a score $F(\pi)$ to each assignment based on its compatibility with the locations of the helices
Remark: the locations of the N- and C-termini can be deduced experimentally
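Stage I: Pruning by Distance Constraints
Acyclic graph: $G = (V, E_{ext} \cup E_{int})$
Vertices: $V = \{(T_i, C_j) : 1 \le i, j \le n\}$
Edges:
$E_{ext} = \{((T_i, C_j), (T_{i+1}, C_m)) : \mathrm{dist}(\mathrm{ext}(C_j), \mathrm{ext}(C_m)) \le \mathrm{max\_dist}(X_i)\}$
$E_{int} = \{((T_i, C_j), (T_{i+1}, C_m)) : \mathrm{dist}(\mathrm{int}(C_j), \mathrm{int}(C_m)) \le \mathrm{max\_dist}(X_i)\}$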
Figure: example loop lengths, I-II: 12 AA; II-III: 4 AA
A valid path in $G$ corresponds to a feasible assignment
Short EM segments lead to fewer feasible assignments
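Graph Example
Figure: assignment graph with vertices (I, A), (I, B), (I, C) at level I; (II, A), (II, B), (II, C) at level II; and (III, A), (III, B), (III, C) at level III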
Graph Construction
Construction is bottom-up
A valid path $P = \{v_1, e_1, v_2, e_2, \ldots, e_{n-1}, v_n\}$ in the graph:
Starts at the first level
Ends at the last level
Follows an alternating sequence of internal/external edges
Does not contain two vertices with the same helix
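The pruning stage can be sketched as below, under stated assumptions: helix termini are given as 3D points in a dict keyed by cryo-EM helix name, and each EM segment is given as a `(length, side)` pair whose sides already alternate between `"int"` and `"ext"` according to the experimentally known N-terminus location. The data layout, the helper names and the (L + 1) × 3.8 Å bound are hypothetical. Valid paths are enumerated with a depth-first search from the first level, a simple alternative to the bottom-up construction described on the slide.

```python
import math
from itertools import product

CA_CA_DIST = 3.8  # Å per Cα-Cα step


def max_dist(loop_length):
    # Assumption: loop_length + 1 fully extended Cα-Cα steps.
    return CA_CA_DIST * (loop_length + 1)


def build_graph(helices, loops):
    """Layered assignment graph G = (V, E_ext ∪ E_int).

    helices: dict helix name -> {"int": (x, y, z), "ext": (x, y, z)}
    loops:   [(length, side), ...] for X_1..X_{n-1}; side is "int" or "ext"
    Returns (adj, n): adj maps vertex (level, helix) -> next-level vertices.
    """
    names = list(helices)
    n = len(loops) + 1
    adj = {(i, c): [] for i in range(n) for c in names}
    for i, (length, side) in enumerate(loops):
        # X_i on `side` joins that terminus of C_j to the same terminus of C_m.
        for cj, cm in product(names, repeat=2):
            if cj != cm and math.dist(helices[cj][side], helices[cm][side]) <= max_dist(length):
                adj[(i, cj)].append((i + 1, cm))
    return adj, n


def feasible_assignments(helices, loops):
    """All valid paths: start at level 0, end at level n-1, no helix repeated."""
    adj, n = build_graph(helices, loops)
    paths = []

    def dfs(vertex, path):
        level, _ = vertex
        if level == n - 1:
            paths.append(tuple(path))
            return
        for nxt in adj[vertex]:
            if nxt[1] not in path:  # each helix may be used only once
                dfs(nxt, path + [nxt[1]])

    for c in helices:
        dfs((0, c), [c])
    return paths


helices = {
    "A": {"ext": (0, 0, 15), "int": (0, 0, -15)},
    "B": {"ext": (10, 0, 15), "int": (10, 0, -15)},
    "C": {"ext": (30, 0, 15), "int": (30, 0, -15)},
}
loops = [(2, "ext"), (10, "int")]
print(feasible_assignments(helices, loops))  # [('A', 'B', 'C'), ('B', 'A', 'C')]
```

On this three-helix toy input only 2 of the 3! = 6 permutations survive, because the 2-residue external loop cannot bridge helices that are 20 Å or more apart.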
Stage II: Ranking Feasible Assignments
A score is assigned to each feasible assignment stored in $G$
For each assignment $\pi_k$, $1 \le k \le n!$, we define
$$F(\pi_k) = \sum_{i=1}^{n-1} f(X_i, C_{\pi_k(i)}, C_{\pi_k(i+1)})$$
where $f$ defines the feasibility of connecting two helices in 3D space by $X_i$
Evaluation of $f$
$f$ is based on the length of $X_i$ and on a statistical analysis of solved structures of soluble proteins
Only helix-loop-helix motifs are used, denoted motif $(A, L, B)$
All motifs with the same loop length (2-7 residues) are examined together
Evaluation of $f$: Preprocessing
Loop length classification
Only proteins sharing less than 20% sequence similarity were selected
All motifs with loop length $l$ are placed in a common orthogonal reference frame so that all the A helices overlap
The starting points of the B helices are placed in a separate data structure, $KD_l$, for each $l$ ($2 \le l \le 7$)
KD-trees are used for efficient axis-aligned queries
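One possible way to put every motif into a common orthogonal reference frame is sketched below: a right-handed orthonormal frame is attached to helix A using its loop-side end, a second point on its axis, and one off-axis point (Gram-Schmidt), and the start of helix B is then expressed in that frame. The choice of defining points and the motif record layout are assumptions; the slides do not specify how the frame is built.

```python
import math
from collections import defaultdict


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def helix_frame(a_end, a_axis_point, a_ref_point):
    """Right-handed orthonormal frame attached to helix A (hypothetical construction).

    a_end:        end of helix A where the loop starts (becomes the origin)
    a_axis_point: another point on A's axis (fixes the z direction)
    a_ref_point:  any point off the axis (fixes the rotation about z)
    """
    ez = _unit(_sub(a_end, a_axis_point))  # along A, pointing toward the loop
    v = _sub(a_ref_point, a_end)
    ex = _unit(_sub(v, tuple(_dot(v, ez) * c for c in ez)))  # Gram-Schmidt step
    ey = _cross(ez, ex)  # ez x ex = ey for a right-handed frame
    return a_end, (ex, ey, ez)


def to_frame(point, frame):
    """Coordinates of `point` in a frame returned by helix_frame."""
    origin, axes = frame
    d = _sub(point, origin)
    return tuple(_dot(d, e) for e in axes)


def collect_endpoints(motifs):
    """Group B start points by loop length l (2-7), each in its own A frame.

    motifs: iterable of (l, a_end, a_axis_point, a_ref_point, b_start),
    a hypothetical record layout.
    """
    by_length = defaultdict(list)
    for l, a_end, a_axis, a_ref, b_start in motifs:
        if 2 <= l <= 7:
            by_length[l].append(to_frame(b_start, helix_frame(a_end, a_axis, a_ref)))
    return dict(by_length)
```

Because every A helix maps to the same origin and axes, the stored B start points of all motifs with the same $l$ become directly comparable.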
Distribution of End Points of Short Loops
Kinematic considerations alone allow a reachable space limited only by the length of the loop
Example: a loop of length 4 has 8 degrees of freedom (the φ and ψ angles of each residue)
In reality, the end points are distributed highly non-uniformly
Highly significant for loops of length two to five
Still noticeable for loops of length up to seven
Distribution of the end points of EM loops of length 4
Distribution of the end points of EM loops of lengths 3 (left) and 4 (right)
Evaluation of $f$: Scoring
The two helices are placed in the same reference frame
$Q$ is a cube around $q$, the start of helix B, with a side of $10 \cdot \mathrm{length}(X_i)$ Å
We define a colony function; the score depends on:
the number of neighboring points $r \in Q$ in the vicinity of $q$
the distances between these neighboring points and $q$
$$f(X_i, C_{\pi_k(i)}, C_{\pi_k(i+1)}) = \sum_{r \in Q} e^{-\mathrm{dist}(q, r)}$$
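The cube query and the colony function can be sketched as follows: a minimal 3D KD-tree stores the B start points for one loop length, an axis-aligned box query retrieves the points $r$ inside the cube $Q$ centred on $q$, and the score sums $e^{-\mathrm{dist}(q, r)}$. This is an illustrative implementation, not the authors' code; the names `KDTree` and `colony_score` are hypothetical, and the cube side is left as a parameter (`side`) rather than hard-coded.

```python
import math


class KDTree:
    """Minimal 3D KD-tree supporting axis-aligned box (range) queries."""

    def __init__(self, points, depth=0):
        self.point = self.left = self.right = None
        if not points:
            return
        self.axis = depth % 3
        pts = sorted(points, key=lambda p: p[self.axis])
        mid = len(pts) // 2
        self.point = pts[mid]  # median along the splitting axis
        self.left = KDTree(pts[:mid], depth + 1)       # values <= median
        self.right = KDTree(pts[mid + 1:], depth + 1)  # values >= median

    def box_query(self, lo, hi, out=None):
        """All stored points p with lo[k] <= p[k] <= hi[k] for every axis k."""
        if out is None:
            out = []
        if self.point is None:
            return out
        p, a = self.point, self.axis
        if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
            out.append(p)
        if lo[a] <= p[a]:  # left subtree can only hold values <= p[a]
            self.left.box_query(lo, hi, out)
        if p[a] <= hi[a]:  # right subtree can only hold values >= p[a]
            self.right.box_query(lo, hi, out)
        return out


def colony_score(tree, q, side):
    """Sum of exp(-dist(q, r)) over stored points r in the cube Q of edge `side` centred on q."""
    half = side / 2
    lo = tuple(c - half for c in q)
    hi = tuple(c + half for c in q)
    return sum(math.exp(-math.dist(q, r)) for r in tree.box_query(lo, hi))
```

A candidate start point $q$ surrounded by many close database end points scores high, while one in an empty region of the cube scores near zero.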
Evaluation of $F$
The score of an assignment is the total score of its extra-membrane segments
Define a weight for each edge: $\mathrm{weight}(e_i) = f(X_i, C_{\pi_k(i)}, C_{\pi_k(i+1)})$
For each path $P = \{v_1, e_1, v_2, e_2, \ldots, e_{n-1}, v_n\}$ we define $F(P) = \sum_{e \in P} \mathrm{weight}(e)$
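Ranking then amounts to summing edge weights along each valid path and sorting. In the sketch below the per-loop score `f` is any callable (for example, a colony function), and paths are sorted by decreasing $F$ on the assumption that higher scores mean more native-like loops; the function names are hypothetical.

```python
def path_score(path, loops, f):
    """F(P): sum of weight(e_i) = f(X_i, C_pi(i), C_pi(i+1)) over consecutive helices."""
    return sum(f(loops[i], path[i], path[i + 1]) for i in range(len(path) - 1))


def rank_assignments(paths, loops, f):
    """Feasible assignments sorted by decreasing F (higher assumed better)."""
    return sorted(paths, key=lambda p: path_score(p, loops, f), reverse=True)


def rank_of(native, ranked):
    """1-based rank of the native assignment among the ranked paths."""
    return ranked.index(tuple(native)) + 1


# Toy per-loop scores for ordered helix pairs (illustrative numbers only).
weights = {("A", "B"): 1.0, ("B", "C"): 2.0, ("B", "A"): 0.5, ("A", "C"): 0.5}
ranked = rank_assignments([("B", "A", "C"), ("A", "B", "C")], [2, 10],
                          lambda loop, a, b: weights[(a, b)])
print(ranked)  # [('A', 'B', 'C'), ('B', 'A', 'C')]
```

Reporting the rank of the native assignment, rather than only the top path, matches the idea of providing more than a single candidate assignment.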
Validation
19 TM proteins with known high-resolution structures were tested
Two distinct cases:
• Accurate data
• Noisy data regarding the locations and orientations of the helices
Dealing with uncertainty in cryo-EM data
Unknown rotation of the helix about its own axis
Unknown translation of the helix
Solution: a cylindrical envelope is constructed around the helix termini
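The slides do not specify how the cylindrical envelope enters the scoring. As a hedged illustration only, its geometric core is a point-in-cylinder test like the one below; the function name, the axis end points and the radius are hypothetical parameters, not values from the study.

```python
import math


def in_cylinder(point, base, tip, radius):
    """True if `point` lies in the closed cylinder with axis base->tip and the given radius."""
    axis = tuple(t - b for t, b in zip(tip, base))
    rel = tuple(p - b for p, b in zip(point, base))
    axis_len_sq = sum(a * a for a in axis)
    # Projection parameter of `point` onto the axis: 0 at base, 1 at tip.
    t = sum(r * a for r, a in zip(rel, axis)) / axis_len_sq
    if t < 0.0 or t > 1.0:
        return False
    closest = tuple(b + t * a for b, a in zip(base, axis))
    return math.dist(point, closest) <= radius
```

A terminus position that fails this test for every candidate envelope would be inconsistent with the cryo-EM data under this illustrative model.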
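Performance of the Algorithm
Protein | #Helices | Loop lengths (#AA) | Possible assignments | Feasible assignments | Rank of native assignment
--- | --- | --- | --- | --- | ---
Bacteriorhodopsin | 7 | 3, 14, 2, 3, 10, 4 | 7! = 5040 | 948 | 13
Sensory rhodopsin | 7 | 7, 12, 2, 3, 3, 4 | 7! = 5040 | 512 | 48
Lactose permease | 12 | 3, 2, 1, 3, 1, 24, 3, 1, 3, 1, 1 | 12! > 10^8 | 12 | 1
Cytochrome c oxidase E | 5 | 5, 6, 1, 1 | 5! = 120 | 2 | 1
Cytochrome c oxidase H | 3 | 7, 2 | 3! = 6 | 6 | 1
Acetylcholine receptor | 4 | 4, 4, 103 | 4! = 24 | 22 | 1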
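The effect of Stage I pruning can be read off the performance table by comparing the number of feasible assignments with the n! possible ones. The short script below does that arithmetic using only the numbers in the table; `RESULTS` and `feasible_fraction` are illustrative names.

```python
import math

# (protein, number of helices, feasible assignments), taken from the performance table
RESULTS = [
    ("Bacteriorhodopsin", 7, 948),
    ("Sensory rhodopsin", 7, 512),
    ("Lactose permease", 12, 12),
    ("Cytochrome c oxidase E", 5, 2),
    ("Cytochrome c oxidase H", 3, 6),
    ("Acetylcholine receptor", 4, 22),
]


def feasible_fraction(num_helices, feasible):
    """Fraction of the n! possible assignments that survive Stage I pruning."""
    return feasible / math.factorial(num_helices)


for name, n, feasible in RESULTS:
    pct = 100 * feasible_fraction(n, feasible)
    print(f"{name}: {feasible} of {math.factorial(n)} assignments remain ({pct:.3g}%)")
```

For bacteriorhodopsin, for example, 948 of 5040 assignments (about 18.8%) remain, whereas for lactose permease only 12 of more than 10^8 do.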
Summary
Provides more than a single assignment
The complexity of the problem scales with the number of amino acids in the extra-membrane segments, not with the number of TM helices
Questions