sw-v6-prot-structure

Upload: dhananjay-kutkar

Post on 08-Apr-2018


  • 8/7/2019 SW-V6-Prot-Structure

    1/42

    Lecture 4, winter semester 2004/05, Softwarewerkzeuge (Software Tools), slide 1

    What can I learn about a PDB structure at the push of a button?

    PDBsum website:

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 2/42

    Classification in CATH

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 3/42

    Representation of the secondary structure

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 4/42

    Conservation within the protein family

    Surface colored according to conservation.

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 5/42

    Multiple sequence alignment

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 6/42

    Ramachandran plot

    http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/

  • 8/42

    Secondary structure prediction: PSIPRED

    D.T. Jones, J Mol Biol 292, 195 (1999); http://bioinf.cs.ucl.ac.uk/psipred/

    Narrow, very polar binding pocket on the protein surface.

  • 9/42

    Quality of PSIPRED predictions

    D.T. Jones, J Mol Biol 292, 195 (1999); http://bioinf.cs.ucl.ac.uk/psipred/

    Results for 187 test proteins with different folds.

    Accuracy of PSIPRED: approx. 75%
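
A prediction accuracy like the figure above is commonly reported as a three-state per-residue score (often called Q3). The slide does not say which measure was used, so the following is only a minimal sketch under that assumption; the function name and the toy strings are hypothetical and are not PSIPRED output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical toy data (H = helix, E = strand, C = coil).
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.1%}")
```

Averaging this score over a set of test proteins would give a single summary percentage of the kind quoted on the slide.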

  • 10/42

    Prediction of TM helices

    http://darwin.nmsu.edu/~molb470/fall2003/Projects/koul/tmhmm.html

    Residues in transmembrane helices are almost exclusively hydrophobic.

    Length of a TM helix: approx. 20 residues.

    HMMs are very successful at predicting TM helices (>90% accuracy).
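
The two observations above (hydrophobic residues, a length of about 20) can be illustrated with a much simpler technique than an HMM: a sliding-window hydropathy scan. The sketch below is not the HMM approach named on the slide. It uses the Kyte-Doolittle hydropathy scale, and the window length of 19 and threshold of 1.6 are conventional choices assumed here. The test sequence is made up.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for every window whose mean exceeds the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Hypothetical sequence: polar flanks around a 20-residue hydrophobic stretch.
seq = "MKTDEERSKQ" + "LLIVALLGAVILLFAVLIAL" + "RKDEESNQTK"
hits = hydrophobic_windows(seq)
print(hits)
```

Overlapping hits would normally be merged into one candidate segment. An HMM additionally models helix caps, loops and topology, which a plain window scan cannot do.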

  • 11/42

    Surface analysis: electrostatic potential

    Sheinerman, Honig, J Mol Biol 318, 161 (2002)

    Protein surfaces at protein-protein binding sites are often electrostatically complementary.

    Surface representation of the electrostatic potential of unbound monomers of 4 protein-protein complexes. An "open book" view of the protein-protein interfaces is shown. The color range from deep red to deep blue corresponds to electrostatic potential values from -10 to +10 kT/e, where k is the Boltzmann constant, T is the absolute temperature and e is the charge of a proton.
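
To get a feeling for the unit kT/e, it can be converted to millivolts. The sketch below assumes a temperature of 298.15 K, which is not stated in the caption; at that temperature 1 kT/e is about 25.7 mV. The function name is hypothetical.

```python
BOLTZMANN_J_PER_K = 1.380649e-23        # Boltzmann constant k (exact SI value)
ELEMENTARY_CHARGE_C = 1.602176634e-19   # proton charge e (exact SI value)


def kt_over_e_to_millivolts(value_kt_e, temperature_k=298.15):
    """Convert a potential given in units of kT/e to millivolts."""
    volts = value_kt_e * BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return volts * 1000.0


# Limits of the color scale in the figure caption, assuming T = 298.15 K.
for limit in (-10.0, 10.0):
    print(f"{limit:+.0f} kT/e = {kt_over_e_to_millivolts(limit):+.1f} mV")
```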

  • 12/42

    PROCHECK: quality check for protein structures

    The Ramachandran plot shows the phi-psi torsion angles for all residues in the structure (except those at the chain termini). Glycines are separately identified by triangles, as these are not restricted to the regions of the plot appropriate to the other side-chain types.

    Colouring/shading scheme: the darkest areas (here shown in red) correspond to the "core" regions representing the most favourable combinations of phi-psi values.

    The regions are labelled as follows:

    A - Core alpha
    B - Core beta
    L - Core left-handed alpha
    a - Allowed alpha
    b - Allowed beta
    l - Allowed left-handed alpha
    p - Allowed epsilon
    ~a - Generous alpha
    ~b - Generous beta
    ~l - Generous left-handed alpha
    ~p - Generous epsilon

    The different regions were taken from the observed phi-psi distribution for 121,870 residues from 463 known X-ray protein structures. The two most favoured regions are the "core" and "allowed" regions, which correspond to 10° x 10° pixels having more than 100 and 8 residues in them, respectively. The "generous" regions were defined by Morris et al. (1992) by extending out by 20° (two pixels) all round the "allowed" regions. In fact, the authors found very few residues in these "generous" regions, so they can probably be treated much like the "disallowed" region and any residues in them investigated more closely.
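
The pixel-count definition of the "core" and "allowed" regions can be sketched as follows. This is a minimal illustration of the thresholds quoted above (more than 100 and more than 8 residues per 10-degree pixel), not PROCHECK's own code. The 20-degree extension that yields the "generous" regions is left out, and all names and the toy data are hypothetical.

```python
from collections import Counter

PIXEL_SIZE = 10                    # degrees per pixel edge
N_PIXELS = 360 // PIXEL_SIZE       # 36 pixels along each axis


def pixel_of(phi, psi):
    """Map (phi, psi) in degrees, each in [-180, 180], to a (column, row) pixel index."""
    col = int((phi + 180) // PIXEL_SIZE) % N_PIXELS
    row = int((psi + 180) // PIXEL_SIZE) % N_PIXELS
    return col, row


def classify_pixels(phi_psi_pairs, core_min=100, allowed_min=8):
    """Label every occupied pixel by how many residues fall into it."""
    counts = Counter(pixel_of(phi, psi) for phi, psi in phi_psi_pairs)
    labels = {}
    for pixel, n in counts.items():
        if n > core_min:
            labels[pixel] = "core"
        elif n > allowed_min:
            labels[pixel] = "allowed"
        else:
            labels[pixel] = "other"
    return labels


# Hypothetical toy distribution: a dense helical pixel, a sparser beta pixel, one stray point.
data = [(-60, -45)] * 101 + [(-120, 130)] * 9 + [(60, 60)]
labels = classify_pixels(data)
print(labels)
```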

    Ideally, one would hope to have over 90% of the residues in the "core" regions. The percentage of residues in the "core" regions is one of the better guides to the stereochemical quality of a protein structure.

    http://www.biochem.ucl.ac.uk/~roman/procheck
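
The phi and psi values shown in a Ramachandran plot are torsion (dihedral) angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for phi and N, CA, C, N(i+1) for psi. A minimal sketch of the standard dihedral calculation is given below. It is a generic geometric formula, not code taken from PROCHECK, and the coordinates are made-up test points.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)            # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)            # normal of the plane p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points giving a cis, a trans and a +90 degree arrangement.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```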

  • 13/42

    PROCHECK

    Separate Ramachandran plots are shown for each of the 20 different amino acid types. The darker the shaded area on each plot, the more favourable the region. The data on which the shading is based come from a data set of 163 non-homologous, high-resolution protein chains chosen from structures solved by X-ray crystallography to a resolution of 2.0 Å or better and an R-factor no greater than 20%.

    The numbers in brackets following each residue name show the total number of data points on that graph. The red numbers above the data points are the residue numbers of the residues in question (i.e. those lying in unfavourable regions of the plot).

    http://www.biochem.ucl.ac.uk/~roman/procheck

  • 15/42

    PROCHECK

    The 6 graphs show how the structure (represented by the solid square) compares with well-refined structures at a similar resolution. The dark band in each graph represents the results from the well-refined structures; the central line is a least-squares fit to the mean trend as a function of resolution, while the width of the band on either side of it corresponds to a variation of one standard deviation about the mean. In some cases the trend is dependent on the resolution, and in other cases it is not.

    The 6 properties plotted are:

    a. Ramachandran plot quality. This property is measured by the percentage of the protein's residues that are in the most favoured, or core, regions of the Ramachandran plot. For a good model structure, obtained at high resolution, one would expect this percentage to be over 90%. However, as the resolution gets poorer, this figure decreases, as might be expected. The shaded region reflects this expected decrease with worsening resolution.

    b. Peptide bond planarity. This property is measured by calculating the standard deviation of the protein structure's omega torsion angles. The smaller the value, the tighter the clustering around the ideal of 180 degrees (which represents a perfectly planar peptide bond).

    c. Bad non-bonded interactions. This property is measured by the number of bad contacts per 100 residues. Bad contacts are selected from the list of non-bonded interactions and are defined as contacts where the distance of closest approach is less than or equal to 2.6 Å.

    d. Calpha tetrahedral distortion. This property is measured by calculating the standard deviation of the zeta torsion angle. This is a notional torsion angle in that it is not defined about any actual bond in the structure. Rather, it is defined by the following four atoms within a given residue: Calpha, N, C, and Cbeta.

    e. Main-chain hydrogen bond energy. This property is measured by the standard deviation of the hydrogen bond energies for main-chain hydrogen bonds. The energies are calculated using the method of Kabsch & Sander (1983).

    f. Overall G-factor. The overall G-factor is a measure of the overall normality of the structure. The overall value is obtained from an average of all the different G-factors for each residue in the structure.

    http://www.biochem.ucl.ac.uk/~roman/procheck
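
Property b, the spread of the omega angles around 180 degrees, can be sketched as below. The details are assumptions made for illustration: angles are first mapped to the range 0 to 360 so that values just below +180 and just above -180 count as neighbours, and the population standard deviation is used. PROCHECK's exact convention is not given on the slide, and the omega values are made up.

```python
import math


def omega_mean_and_sd(omegas_deg):
    """Mean and population standard deviation of omega angles, in degrees.

    Angles are wrapped to [0, 360) first. This only makes sense for trans
    peptides (omega near 180); cis peptides (omega near 0) would need to be
    handled separately.
    """
    wrapped = [w % 360.0 for w in omegas_deg]
    mean = sum(wrapped) / len(wrapped)
    variance = sum((w - mean) ** 2 for w in wrapped) / len(wrapped)
    return mean, math.sqrt(variance)


# Hypothetical omega angles of five trans peptide bonds.
mean, sd = omega_mean_and_sd([178.0, -179.0, 180.0, -177.0, 179.0])
print(f"mean = {mean:.1f} deg, sd = {sd:.2f} deg")
```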

  • 16/42

    PROCHECK

    The 5 properties plotted are:

    a. Standard deviation of the chi-1 gauche minus torsion angles.

    b. Standard deviation of the chi-1 trans torsion angles.

    c. Standard deviation of the chi-1 gauche plus torsion angles.

    d. Pooled standard deviation of all chi-1 torsion angles.

    e. Standard deviation of the chi-2 trans torsion angles.

    http://www.biochem.ucl.ac.uk/~roman/procheck

  • 17/42

    PROCHECK

    http://www.biochem.ucl.ac.uk/~roman/procheck

    Distributions of each of the different main-chain bond lengths in the structure. The solid line in the centre of each plot corresponds to the small-molecule mean value, while the dashed lines either side show the small-molecule standard deviation, the data coming from Engh & Huber (1991).

    Highlighted bars correspond to values more than 2.0 standard deviations from the mean, though the value of 2.0 can be changed by editing the procheck.prm file.
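
The highlighting rule above amounts to a z-score test against a reference mean and standard deviation. In the sketch below the reference numbers are placeholders chosen for illustration only, not values from Engh & Huber (1991), and the function name is hypothetical.

```python
def flag_outliers(values, ideal_mean, ideal_sd, limit=2.0):
    """Return (index, value, z) for values more than `limit` SDs from the ideal mean."""
    flagged = []
    for index, value in enumerate(values):
        z = (value - ideal_mean) / ideal_sd
        if abs(z) > limit:
            flagged.append((index, value, round(z, 2)))
    return flagged


# Placeholder reference values and made-up bond lengths in Angstrom.
ideal_mean, ideal_sd = 1.50, 0.02
lengths = [1.50, 1.53, 1.45, 1.51, 1.56]
flagged = flag_outliers(lengths, ideal_mean, ideal_sd)
print(flagged)
```

Passing a different `limit` plays the role of changing the 2.0 cut-off mentioned on the slide.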

  • 18/42

    PROCHECK

    http://www.biochem.ucl.ac.uk/~roman/procheck

    Distributions of each of the different main-chain bond angles in the structure. The solid line in the centre of each plot corresponds to the small-molecule mean value, while the dashed lines either side show the small-molecule standard deviation, the data coming from Engh & Huber (1991).

    If any of the histogram bars lie off the graph, to the left or to the right, a large arrow indicates the number of these outliers (as in the CA-C-O and CB-CA-C plots above).

  • 19/42

    PROCHECK

    RMS distances from planarity for the different planar groups in the structure. The dashed lines indicate different ideal values for aromatic rings (Phe, Tyr, Trp, His) and for planar end-groups (Arg, Asn, Asp, Gln, Glu).

    The default values are 0.03 Å and 0.02 Å, respectively.

    http://www.biochem.ucl.ac.uk/~roman/procheck

  • 20/42

    How can two protein structures be compared?

    Pairwise sequence comparisons

    Pairwise structure comparisons?

  • 21/42

    Partitioning protein space into homologous families

    Protein architecture. The tramtrack protein [Protein Data Bank entry 2drp (30)] is a small protein (525 heavy atoms, 63 residues, and 6 elements of secondary structure), yet it exhibits typical modular protein architecture with two compact structural domains, the so-called zinc fingers.

    (A) The most detailed description of atomic positions is required to understand the function of the tramtrack protein (gray and black, running left to right), which involves binding to a specific base sequence of DNA (white).

    Holm, Sander Science 273, 5275 (1996)

  • 22/42

    Partitioning protein space into homologous families

    (B) The complicated 3D shape of proteins is encoded in their linear sequence of amino acids. Side chains stripped off, the polypeptide backbone (thick) can be seen meandering from the bottom left to the upper right. Regular patterns of hydrogen bonding (thin lines) between amide and carbonyl groups of the polypeptide backbone give rise to secondary structure, shown schematically in (C) as arrows for β strands and cylinders for α helices (with zinc atoms as spheres).

    Holm, Sander Science 273, 5275 (1996)

  • 23/42

    Meaning of structural equivalence

    Shape comparison aims at the 1:1 enumeration of equivalent polymer units in 2 protein molecules. The problem and solution can be represented in 3D, as a rigid-body superimposition; in 2D, as similar patterns in distance matrices; or in 1D, as an alignment of amino acid sequences.

    Here, the comparison of the tramtrack protein with another zinc finger protein, the human enhancer-binding protein MBP-1 [PDB entry 1bbo], is used as an example.

    (A) In the 3D comparison, the problem is to find a translation and rotation of one molecule (red: 1bbo) onto the other (blue: 2drpA). The 3D superimposition (residue centers only, green lines join equivalenced residue centers, zinc atoms as spheres) is not exact because of an internal rotation of the two zinc finger domains relative to one another.

    Holm, Sander Science 273, 5275 (1996)

  • 24/42

    Ranges of similarity between proteins

    Holm et al. Prot Sci 1, 1691 (1992)

  • 25/42

    Surprising similarities

    Holm et al. Prot Sci 1, 1691 (1992)

  • 26/42

    Surprising similarities

    Holm et al. Prot Sci 1, 1691 (1992)

  • 27/42

    Surprising similarities

    Holm et al. Prot Sci 1, 1691 (1992)

  • 28/42

    Partitioning protein space into homologous families

    (B) The 2D distance matrices reveal the conserved structure of the zinc fingers (left: distance matrices of the whole structures; black dots are intramolecular distances less than 12 Å, 1bbo at bottom and 2drpA on top; right: distance matrices brought into register by keeping only rows or columns corresponding to structurally equivalent residues).

    (C) One-dimensional alignment of amino acid strings. Evolutionary comparison aligns the histidine (H) residues involved in zinc binding (bold; helices and strands of secondary structure are underlined).

    Holm, Sander Science 273, 5275 (1996)
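
The 2D representation described in (B) can be sketched in a few lines: build the matrix of intramolecular distances, mark distances below 12 Å as "black dots", and bring a matrix into register by keeping only the rows and columns of equivalent residues. This is an illustrative sketch, not Dali code. The coordinates are a made-up straight chain with 3.8 Å spacing, and all function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """All-against-all distances between residue centers (e.g. C-alpha atoms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dmat, cutoff=12.0):
    """True where the intramolecular distance is below the cutoff (in Angstrom)."""
    return [[d < cutoff for d in row] for row in dmat]


def submatrix(dmat, indices):
    """Keep only rows and columns of the given (structurally equivalent) residues."""
    return [[dmat[i][j] for j in indices] for i in indices]


# Made-up coordinates: six residue centers on a line, 3.8 Angstrom apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
dmat = distance_matrix(coords)
cmap = contact_map(dmat)
registered = submatrix(dmat, [0, 2, 4])
print(cmap[0])
```

Two structures share a fold when such registered submatrices look alike, which is what the right-hand panels of the figure show.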

  • 29/42

    2 Algorithms for structural alignment

    (A) The 3D lookup is a fast heuristic algorithm that catches easy-to-find structural similarities and is part of the Dali 3D search server. The idea is that in favorable cases, 3D superimposition of only a pair of secondary structure elements (SSEs) leads to superimposition of the entire structures.

    Top: Structure comparison of an SH3 domain of c-Src kinase [1cskA, query structure] with the enzyme papain [1ppn, target structure] reveals similar domain folds, although there is no sequence relation between the proteins and one is much larger. The appropriate orientation of the molecules is found by exhaustive comparison of internal coordinate frames of each protein. An internal coordinate frame is defined by an ordered pair of SSEs (centering one SSE at the origin, aligning it with the y axis, and rotating the molecule around this axis so that the center of a second SSE is in the positive x-y plane).

    Bottom left: Target structure, papain, loaded onto the SSE lookup grid. Each pair of SSEs where the segment midpoints are within 12 Å defines a coordinate frame relative to the grid axes. The figure shows the transformed positions of the 12 SSEs of papain (dotted lines) in each of the 100 different coordinate frames defined by different pairs of SSEs.

    Bottom right: The target lookup grid is probed with the SH3 domain, which has four SSEs (thick continuous lines). The coordinate frames shown are the ones yielding the best 3D match of four segments. Iterative extension of a residue-wise alignment starting from the preorientation defined by the SSE match shown here leads to the equivalence of 43 Cα atoms with 1.7 Å root-mean-square positional deviation on an optimal least-squares superimposition.

    Holm, Sander Science 273, 5275 (1996)

  • 30/42

    Branch-and-bound algorithm

    (B) A branch-and-bound algorithm is guaranteed to yield the global optimum but may, in the worst case, need an exponential number of steps to do so. An implementation of this algorithm is an essential part of the Dali 3D search server.

    First, protein structures A and B are represented by distance matrices (bottom left and right; each point in a matrix is a residue-residue distance; an internal square is a set of contacts made by two segments; the secondary structure segments are β, β, and α). The problem of shape comparison becomes one of finding a best subset of residues in each matrix (subsets of rows and columns) such that the set of residues in protein A has a similar pattern of intramolecular distances as the set in protein B, as in Fig. 2B. A single solution to the problem is given in terms of the two sets of equivalent residues (an alignment), as shown in Fig. 2C. The solution space consists of all possible placements of residues in protein B relative to the segments of residues of protein A.

    The key algorithmic idea is to recursively split the solution subspace (schematically shown as a circle at upper left, in which each point is a solution to the problem and the lines divide subsets of solutions) that yields the highest upper bound until there is a single alignment trace left: start with the entire circle; calculate the upper bound for the left (9) and right (17) half; choose the right half and split it into top (upper bound 10) and bottom (upper bound 16) quarters; choose the bottom part and split it (left: 14; right: 12); choose the right part; and so on until the area of solution space has shrunk to a single solution (shown as the residue-residue alignment matrix enlarged at right).

    The upper bound for each part of the solution space is estimated in terms of a simplified subproblem that asks for the best match of residues in protein B onto a predefined set of residues in protein A (the match is illustrated by the circle-ended line connecting the single square in matrix A with a set of candidate squares in matrix B). The best match is the one with the maximal pair score (sum of similarities of distances between the square in A and the square in B). The predefined set corresponds to residues in secondary structure elements (α, β). The upper bound for each of the segment-segment submatrices of matrix A is found by calculating the similarity scores between the submatrix in A and all accessible submatrices in B. An upper bound of the total similarity score (sum over all segment-segment submatrices in A) for one set of solutions is given by the sum of separately calculated upper bounds for each segment-segment pair of matrix A. The method for choosing constraints that define a set of solutions works in terms of defining allowed residue ranges at each stage of the iteration and is not illustrated.

    Holm, Sander Science 273, 5275 (1996)
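
The search strategy described above (always split the subset of solutions with the highest upper bound, and bound each subset by a sum of separately computed per-segment maxima) can be illustrated on a toy problem. The sketch below is not the Dali implementation. It assumes a made-up score table in which `scores[r][c]` is the similarity obtained by placing segment r of protein A on candidate position c of protein B, with every position used at most once.

```python
import heapq


def branch_and_bound(scores):
    """Best-first branch and bound for a small assignment problem.

    Returns (best total score, list with the chosen column for each row).
    """
    n_rows, n_cols = len(scores), len(scores[0])
    if n_cols < n_rows:
        raise ValueError("need at least as many columns as rows")

    def upper_bound(assigned):
        # Exact score of the rows fixed so far ...
        total = sum(scores[r][c] for r, c in enumerate(assigned))
        used = set(assigned)
        # ... plus, for each open row, its best still-free column.
        # Open rows are bounded independently, so this can only overestimate.
        for r in range(len(assigned), n_rows):
            total += max(scores[r][c] for c in range(n_cols) if c not in used)
        return total

    heap = [(-upper_bound(()), ())]          # max-heap via negated bounds
    while heap:
        neg_bound, assigned = heapq.heappop(heap)
        if len(assigned) == n_rows:
            # A complete solution whose score is not exceeded by any bound left.
            return -neg_bound, list(assigned)
        used = set(assigned)
        for c in range(n_cols):              # branch: fix the next row
            if c not in used:
                child = assigned + (c,)
                heapq.heappush(heap, (-upper_bound(child), child))
    return None


# Hypothetical similarity scores: 3 segments of A, 3 candidate positions in B.
scores = [[9, 2, 7],
          [6, 4, 3],
          [5, 8, 1]]
best = branch_and_bound(scores)
print(best)
```

Because a subset is only expanded while its upper bound is the highest one remaining, the first complete solution taken from the queue is a global optimum, while the worst case still explores exponentially many subsets.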

  • 31/42

    Recurrent folds

    (A) A small number of frequently occurring domains (folds) covers a large fraction of all known protein structures. The 287 structurally unique protein domains (folds) are ranked in descending order of occurrence in the representative set of 740 proteins. Domains ranked 1 through 16 occur 10 or more times each. Domains ranked 1 through 26 cover 50% of all known structures; that is, the essential parts of these structures can be constructed from these domains or described in terms of these domains (within the limits of similarity within a domain class). Domains ranked about 170 or higher occur only once in the current database (singlets).

    Holm, Sander Science 273, 5275 (1996)

  • 32/42

    Partitioning protein space into homologous families

    (B) Examples of frequently observed fold classes, with one class from each of the attractor regions in Fig. 5 (each attractor region contains several classes, where the term "class" is defined in the text). Color coding indicates which parts of the fold are present in more or fewer members of the class. The color changes from light blue (regions present in 100% of members of the fold class) to red (0% occupancy).

    The representative classes are defined as follows (attractor, class name, and number of recurrences in sequence-unique set of 740 structures):

    attractor I: parallel β: COOH-terminal domain of succinyl-CoA synthetase β chain (126);
    attractor II: β-meander: mouse opg2 immunoglobulin heavy chain variable domain (52);
    attractor III: α-helical: myoglobin;
    attractor IV: β-zigzag: COOH-terminal domain of pertussis toxin;
    attractor V: β meander: COOH-terminal domain of phosphoglycerate dehydrogenase.

    Note that other fold classes in the same attractor region are not shown, but the most frequently occurring are shown in Fig. 5B.

    Holm, Sander Science 273, 5275 (1996)

  • 33/42

    (C) Growth and redundancy of protein 3D structures in the Protein Data Bank.

    Entry: one of currently more than 4000 sets of protein coordinates in the PDB.

    Family: collection of proteins set as equivalent if pairwise sequence identity exceeds 25%.

    Fold: fold class as defined above.

    The number of new structure entries grows rapidly in time (note logarithmic scale). Redundancy is defined in terms of sequence similarity (sequence families) or structure similarity (fold classes). Currently, there are about 6.4 entries per sequence family and 2.4 families per fold class, for a total of 15 entries per fold. One may expect that in the near future a new fold will appear for about every 15 new entries. The curve of new folds lags behind the curve of sequence-unique families, which indicates the increasing frequency of recurrent folds in newly solved structures (although this may be the result of bias in experimental work). There is no indication that the growth in new fold classes is slowing down at present.

    Holm, Sander Science 273, 5275 (1996)

    Partitioning protein space into homologous families

  • 34/42

    Partitioning protein space into homologous families

    (B) 40% of all known domains (protein substructures) are covered by 16 fold classes (shown as topology diagrams; α, α-helix segment; β, β-strand segment; thick bar, parallel chain connection between segments; thin bars, antiparallel connection; arc, α helices crossing at roughly right angles). Although each fold class has individual features, most fold classes map to five attractor regions (peaks I through V).

    All folds with sheets of mainly parallel β strands map to attractor I. The parallel β folds contain a β-x-β unit, where the intervening segment (x) is required to reverse chain direction so that the strands are parallel. The β-α-β unit has a preferred handedness determined by polymer physics and the natural twist of β strands. Attractor II contains a variety of helical folds. The connectivity of elements in the folds of attractors III and IV contains meander motifs suggestive of the collapse of a long hairpin, either of β strands only or of β strands alternating with a helical pair, (βαβ)2. The β zigzag motif of attractor V is simply a series of antiparallel hairpin connections between sequentially adjacent strands. Elementary polymer physics indicates that interactions in space between regions of the chain that are close in sequence are much more probable than those between sequence-distant regions. The β zigzag motif occurs both in flat sheets and barrels, and there is considerable variation in the length of strands (about 4 residues in propeller blades, about 13 in porin barrels).

    Holm, Sander Science 273, 5275 (1996)

  • 35/42

    Evolutionary adaptation of enzyme function

    (A) Discovery of an essential structure-function feature by shape comparison. A structure database search with DNA polymerase β detects kanamycin nucleotidyltransferase (rather than other known DNA or RNA polymerases) as the nearest neighbor in fold space and reveals conserved residues and structural features supporting the active site.

    Following up the lead provided by structure database searching with profile searches in sequence databases resulted in the identification of the same characteristics in a large superfamily of nucleotidyltransferases.

    The biological functions of member families range from DNA repair to regulation of biosynthetic pathways and antibiotic resistance.

    Holm, Sander Science 273, 5275 (1996)

  • 36/42

    Partitioning protein space into homologous families

    (B) Variety of substrate specificity of a common chemical reaction on an essential protein substructure is the remarkable result of biological evolution. All member enzymes of this extended family unified as a result of shape comparison catalyze a common chemical reaction, the coupling of nucleoside triphosphates (black squares and dots) to a free hydroxyl group by means of elimination of pyrophosphate [top row: DNA polymerase β, DNA nucleotidyl exotransferase; middle row: polyadenylate polymerase, (2'-5') oligoadenylate synthetase, kanamycin nucleotidyltransferase; bottom row: protein PII uridylyltransferase, glutamine synthetase adenylyltransferase, and streptomycin 3''-adenylyltransferase].

    Holm, Sander Science 273, 5275 (1996)

  • 37/42

    Partitioning protein space into homologous families

    a, All-against-all structure alignment by DALI reveals a hierarchical organization of fold space. The method is sensitive enough to recognize similarities of general folding pattern (e.g., the β-sandwich topology of superoxide dismutase and immunoglobulin domains) and selective enough to give higher scores to pairs of structures with more closely superimposable Cα traces (e.g., any two globins score higher than any globin-phycocyanin pair). Structure similarity alone yields an operational definition of 'folds'. The thick circles denoting folds (left) are defined using a uniform radius for clusters of structural neighbors. The vertical bar (right) denotes cutting the fold dendrogram at a uniform value of structural similarity. However, the level of structural similarity, or degree of structural divergence, varies between different families, and we need other criteria to delineate superfamilies.

    b, Divergent evolution from a common ancestor retains not only the fold but also many functional features. This means that homologs remain in a structural neighborhood and can be delineated by similar functional attributes (marked here by similar color) in the map of fold space. Functional convergence (from independent evolutionary origins) would appear as blotches of similar color in disconnected regions of the map of fold space and in disjoint branches of the fold dendrogram. Partitioning the fold dendrogram in terms of functional similarities yields family-specific thresholds in terms of structural similarity (nodes that partition the fold dendrogram into functionally conserved superfamilies are circled on the right). This combination of structural and functional similarity measures results in an automatically generated hierarchical classification m_n at the fold (m) and superfamily (n) levels.

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)

  • 38/42

    Protein structure analysis

    c, The principles are illustrated on a branch of the fold dendrogram consisting of aminopeptidases (1xjo and 1amp), carboxypeptidase (1aye), purine nucleoside phosphorylases (1b8oA, 1cb0A and 1ecpA), pyrrolidone carboxyl peptidase (1a2zA), peptidyl-tRNA hydrolase (2pth) and hydrogenase maturating endopeptidase (1cfzA). The functional similarity between all pairs of structures is evaluated using a neural network with output J in the range 0 (analogous) to 1 (homologous); for example, J(1cb0A, 1b8oA) = 0.91, J(1amp, 1aye) = 0.74, J(1cfzA, 2pth) = 0.59, J(1xjo, 1amp) = 0.30 and J(1a2zA, 2pth) = 0.13. Here, line thickness indicates the magnitude of the term J(i,j) - U (Eq. 1) with color-coding for positive (red) or negative (blue) values. The threshold parameter U was arbitrarily set to 0.30 in this numerical example.

    d, The protein set is partitioned into superfamilies in the context of the fold dendrogram. Node scores s(C) are computed for each node, with U = 0.30. For example, each structure is homologous to itself; therefore, leaf nodes get a score s(leaf) = 1.00 - U = 0.70, whereas s(1cfzA, 2pth) = (1.00 + 1.00 + (2 x 0.59)) - 4 x U = 1.98. The optimal partition (circled nodes) maximizes the sum of node scores over selected nodes (underlined scores). This optimal partition is stable for threshold values 0.09 < U < 0.53.

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)
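
The two node scores quoted in (d) are reproduced if the node score is taken to be the sum of J(i,j) - U over all ordered pairs of structures below the node, including each structure paired with itself. Eq. 1 is not shown on the slide, so this form is an assumption inferred from the numbers 0.70 and 1.98; the function name is hypothetical.

```python
from itertools import product


def node_score(members, similarity, threshold):
    """Sum of J(i, j) - U over all ordered pairs (i, j) of members, including i == j."""
    total = 0.0
    for i, j in product(members, repeat=2):
        if i == j:
            j_value = 1.0                      # each structure is homologous to itself
        else:
            j_value = similarity[frozenset((i, j))]
        total += j_value - threshold
    return total


# Neural network output J for one pair and the threshold U, both taken from the slide.
similarity = {frozenset(("1cfzA", "2pth")): 0.59}
U = 0.30

leaf = node_score(["2pth"], similarity, U)
pair = node_score(["1cfzA", "2pth"], similarity, U)
print(round(leaf, 2), round(pair, 2))
```

With this form, merging two structures raises the summed score only when their J exceeds the threshold by enough to offset the added pair terms, which is how the choice of U steers the partition.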

  • 39/42

    Partitioning protein space into homologous families

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)

  • 40/42

    Protein structure comparison via feature vector

    The input to the neural network is a feature vector.

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)

    Keyword similarity: vector product of the frequencies of Swiss-Prot keywords within the two sequence families.

    Functional preference is defined per amino acid and is summed over all residues in a 3D cluster of conserved residues.
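
The keyword similarity feature can be sketched as a dot product of keyword frequency vectors. Two points are assumptions made for illustration: the "vector product" is read as a scalar (dot) product, and a keyword's frequency is taken as the fraction of family members annotated with it. The keyword lists are made up and the function names are hypothetical.

```python
from collections import Counter


def keyword_frequencies(keyword_lists):
    """Fraction of family members annotated with each keyword."""
    counts = Counter(kw for keywords in keyword_lists for kw in set(keywords))
    n_members = len(keyword_lists)
    return {kw: count / n_members for kw, count in counts.items()}


def keyword_similarity(freq_a, freq_b):
    """Dot product of two keyword frequency vectors."""
    return sum(freq * freq_b.get(kw, 0.0) for kw, freq in freq_a.items())


# Hypothetical keyword annotations for the members of two sequence families.
family_a = [["Hydrolase", "Zinc"], ["Hydrolase"]]
family_b = [["Hydrolase", "Zinc"], ["Transferase"]]

sim = keyword_similarity(keyword_frequencies(family_a), keyword_frequencies(family_b))
print(sim)
```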

  • 41/42

    Function assignment by structure comparison

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)

  • 42/42

    Summary

    Many very convenient tools are available that quickly give a good overview of particular protein structures.

    Protein structure is conserved over much longer evolutionary time than sequence
    → structure comparisons make it possible to uncover much more distant relationships.

    Numerical classification allows (now for the first time) a robust, automatic evolutionary classification of protein structures.

    Dietmann & Holm, Nat Struct Biol 8, 953 (2001)