Protein Tertiary Structure Comparison Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia Columbia, MO 65211-2060 E-mail: [email protected] 573-882-7064 (O) http://digbio.missouri.edu

Post on 15-Jan-2016


Page 1

Protein Tertiary Structure Comparison

Dong Xu

Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu

Page 2

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 3

Structure Is Better Conserved during Evolution

Structure can tolerate a wide range of mutations.

Physical forces favor certain structures.

Concept of fold.

The number of folds is limited. Currently known: ~1,000. Total: 1,000s to ~10,000s. Example fold: TIM barrel.

Page 4

Alignment of Protein Structure

Three-dimensional structure of one protein compared against three-dimensional structure of second protein

Atoms (protein backbones) fit together as closely as possible to minimize the average deviation

 

Page 5

Why Align Structures? (1)

Additional measure of protein similarity

Structure is generally preserved better than sequence over the course of evolution

Provides more information on the relationship between proteins than sequence alignment can offer

Allows classification of proteins based on structural similarities

Page 6

Why Align Structures? (2)

Basis for protein fold identification (prediction)

Sometimes sequence similarity between two proteins exists but is not strong enough to produce an unambiguous alignment; a structural alignment then serves as the gold standard for sequence comparison.

Pinpoints the active sites more accurately.

Allows identification of common substructures of interest

Page 7

Why Align Structures? (3)

Illustrate features of protein family:

Evolution of the globin family

Page 8

Why Align Structures? (4)

Illustrate interesting evolutionary/functional relationship between proteins:

Two ferredoxins, 1DOI and 1AWD, are aligned structurally, showing an insertion in 1DOI that contains potassium-ion binding sites. This may be the result of adaptations to the high salt environment of the Dead Sea.

Page 9

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 10

Structure alignment

Simple case: two closely related proteins with the same number of amino acids.

Find a transformation T to achieve the best superposition

Page 11

Transformations

o Translation: $x' = x + t$

o Translation and Rotation -- Rigid Motion (Euclidean space): $x' = Rx + t$
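As a small illustration of a rigid motion, the sketch below applies x' = Rx + t to a few 3-D points and checks that inter-point distances are unchanged. The helper names (`rotation_z`, `rigid_transform`, `distance`) and the sample coordinates are illustrative assumptions, not part of the lecture.

```python
import math

def rotation_z(theta):
    """Return the 3x3 matrix for a rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def rigid_transform(points, R, t):
    """Apply x' = R x + t to every 3-D point in `points`."""
    moved = []
    for x in points:
        moved.append(tuple(
            sum(R[row][col] * x[col] for col in range(3)) + t[row]
            for row in range(3)
        ))
    return moved

def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical coordinates, chosen only for the demonstration.
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
moved = rigid_transform(points, rotation_z(math.pi / 2), (1.0, 2.0, 3.0))

# A rigid motion changes positions but preserves every pairwise distance.
print(distance(points[0], points[2]), distance(moved[0], moved[2]))
```

Because distances survive any rigid motion, methods that compare internal distances do not need to find R and t first.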

Page 12

Types of Structure Comparison

o Sequence-dependent vs. sequence-independent structural alignment

o Global vs. local structural alignment

o Pairwise vs. multiple structural alignment

Page 13

Sequence-dependent Structure Comparison (1)

Given two sets of 3-D points: $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$;

$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} |p_i - q_i|^2}$ (root mean square deviation)

Find a 3-D rigid transformation $T^*$ such that:

$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n}\sum_{i=1}^{n} |T(p_i) - q_i|^2}$
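The rmsd definition can be written directly in a few lines. This is a minimal sketch for two point sets with a known one-to-one correspondence; the function name `rmsd` and the sample points are assumptions made for illustration.

```python
import math

def rmsd(P, Q):
    """Root mean square deviation between corresponding 3-D points of P and Q."""
    if len(P) != len(Q) or not P:
        raise ValueError("P and Q must be non-empty and of equal length")
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(P, Q)
    )
    return math.sqrt(total / len(P))

# Q is P shifted by one unit along z, so every pair is exactly 1 apart.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
Q = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(P, Q))  # 1.0
```

This computes the rmsd for the points as given; finding the transformation that minimizes it is a separate step.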

Page 14

Sequence-dependent Structure Comparison (2)

    1234567
    ASCRKLE
    |||||||
    ASCRKLE

[Figure: backbone traces with residues numbered 1 to 7.]

Minimize rmsd of distances 1-1, ..., 7-7:

$\mathrm{rmsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x(i) - y(i))^2}$

Page 15

Sequence-dependent Structure Comparison (3)

o Can be solved in O(n) time.

o Useful in comparing structures of the same protein solved by different methods, in different conformations, or sampled through dynamics.

o Evaluating protein structure predictions.
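The slide does not say which method gives the O(n) bound. One standard choice, named here as an assumption rather than as the lecture's method, is the Kabsch procedure: align the centroids, then obtain the rotation from a singular value decomposition of a 3x3 covariance matrix. The symbols $\bar p$, $\bar q$, $H$, $U$, $\Sigma$, $V$ are introduced only for this sketch.

```latex
\begin{aligned}
\bar p &= \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad \bar q = \frac{1}{n}\sum_{i=1}^{n} q_i \\
H &= \sum_{i=1}^{n} (p_i - \bar p)(q_i - \bar q)^{\mathsf T}
   && \text{($3 \times 3$ matrix, $O(n)$ to build)} \\
H &= U \Sigma V^{\mathsf T}
   && \text{(SVD of a $3 \times 3$ matrix, constant time)} \\
R^* &= V \,\mathrm{diag}\bigl(1,\, 1,\, \det(V U^{\mathsf T})\bigr)\, U^{\mathsf T}, \qquad
t^* = \bar q - R^* \bar p \\
T^*(p) &= R^* p + t^*
\end{aligned}
```

The only step that depends on n is accumulating the centroids and $H$, which is why the whole superposition is linear in the number of residues.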

Page 16

Sequence-independent Structure Comparison

Correspondence is Unknown!

Given two configurations of points in three-dimensional space, find T which produces the “largest” superimposition of corresponding 3-D points.

Page 17

Order-Dependent vs. Order-Independent Comparison

Alignment (order-dependent): a correspondence between elements of two sequences with order (topology) kept (typical structural alignment)

Bipartite matching (order-independent): one-to-one matching

[Figure label: residues of protein sequence]

    FSEYTTHRGHR
    :  ::::: ::
    FESYTTHRPHR

    FESYTTHRGHR
    :::::::: ::
    FESYTTHRPHR

Page 18

Evaluating Structural Alignments

1. Number of amino acid correspondences created.

2. RMSD of corresponding amino acids

3. Percent identity in aligned residues

4. Number of gaps introduced

5. Size of the two proteins

6. Conservation of known active site environments …

No universally agreed upon criteria. It depends on what you are using the alignment for.
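Several of the listed criteria can be computed directly from an alignment. The sketch below is one possible way to do so, assuming the alignment is given as an ordered list of residue index pairs and that the two coordinate sets are already superimposed. The function name `alignment_stats`, the gap-counting rule, and the toy data are illustrative assumptions.

```python
import math

def alignment_stats(seq_a, seq_b, coords_a, coords_b, pairs):
    """Summarise a structural alignment.

    pairs is an ordered list of (i, k) residue index pairs; coordinates are
    assumed to be superimposed already.
    """
    if not pairs:
        raise ValueError("alignment has no equivalenced residues")
    identical = sum(1 for i, k in pairs if seq_a[i] == seq_b[k])
    squared = sum(
        sum((x - y) ** 2 for x, y in zip(coords_a[i], coords_b[k]))
        for i, k in pairs
    )
    # A gap is counted wherever consecutive pairs skip a residue in either chain.
    gaps = sum(
        1
        for (i0, k0), (i1, k1) in zip(pairs, pairs[1:])
        if i1 - i0 > 1 or k1 - k0 > 1
    )
    return {
        "equivalences": len(pairs),
        "rmsd": math.sqrt(squared / len(pairs)),
        "percent_identity": 100.0 * identical / len(pairs),
        "gaps": gaps,
    }

# Toy data: made-up coordinates, residues 1 and 2 left unaligned.
seq_a, seq_b = "FSEYT", "FESYT"
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0), (3.0, 0.0, 1.0), (4.0, 0.0, 1.0)]
stats = alignment_stats(seq_a, seq_b, coords_a, coords_b, [(0, 0), (3, 3), (4, 4)])
print(stats)
```

Reporting the number of equivalences next to the RMSD matters because a low RMSD over very few residues is easy to achieve and says little.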

Page 19

Structural Alignment Output

1ABR:B - ABRIN-A
1BAS:_ - BASIC FIBROBLAST GROWTH FACTOR (BFGF)
Seq. identity = 10%
RMSD = 1.9 Å

Page 20

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 21

How to recognize structural similarities

1. By eye (SCOP)

2. Algorithmically

o Point-based methods use properties of points (distances) to establish correspondence: dynamic programming (SSAP), distance matrix (DALI)

o Secondary structure-based methods use vectors representing secondary structures to establish correspondences (LOCK).

o Image processing-based methods.

Page 22

Structural Comparison Algorithms

Due to the high computational complexity, practical algorithms rely on heuristics

Fully automated structure analysis has not been as successful as analyses with human intervention in taking into account the biological implications

Page 23

SSAP

SSAP: Sequential Structure Alignment Program

Incorporates double dynamic programming to produce a structural alignment between two proteins

Page 24

Basic Ideas of SSAP

The similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings

This similarity can be quantified into a score, $S_{ik}$

Based on this similarity score and some specified gap penalty, dynamic programming is used to find the optimal structural alignment
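The dynamic programming step can be sketched as a standard global alignment over a residue-by-residue similarity matrix. This is a minimal sketch assuming a constant gap penalty and a made-up score matrix; the function name `align` and the numbers are illustrative, and it is not the SSAP program itself.

```python
def align(S, gap):
    """Global dynamic programming over similarity matrix S.

    S[i][k] scores residue i of A against residue k of B; `gap` is a constant
    penalty per skipped residue. Returns (total_score, list of (i, k) pairs).
    """
    n, m = len(S), len(S[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap
    for k in range(1, m + 1):
        F[0][k] = F[0][k - 1] - gap
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            F[i][k] = max(
                F[i - 1][k - 1] + S[i - 1][k - 1],  # match residue i with k
                F[i - 1][k] - gap,                  # skip a residue of A
                F[i][k - 1] - gap,                  # skip a residue of B
            )
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if F[i][k] == F[i - 1][k - 1] + S[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif F[i][k] == F[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return F[n][m], pairs

# Hypothetical scores: A has 3 residues, B has 4.
S = [[5, 1, 0, 0],
     [0, 1, 4, 0],
     [0, 0, 1, 6]]
score, pairs = align(S, gap=1)
print(score, pairs)  # 14.0 [(0, 0), (1, 2), (2, 3)]
```

In this example the best path skips residue 1 of B, paying one gap penalty to reach the two high-scoring matches that follow.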

Page 25

Scoring Function of SSAP (1)

Distance between residues i and j in molecule A: $d^A_{ij}$

Similarity for two pairs of residues, i, j in A and k, l in B:

$s_{ij,kl} = \frac{a}{|d^A_{ij} - d^B_{kl}| + b}$, where a and b are constants

[Figure: residue pair i, j in A and residue pair k, l in B.]

Page 26

Scoring Function of SSAP (2)

Similarity between residue i in A and residue k in B:

$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{|d^A_{i,i+m} - d^B_{k,k+m}| + b}$

$S_{i,k}$ is big if the distances from residue i in A to its 2n nearest neighbours are similar to the corresponding distances around k in B
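The residue-level score can be computed as follows. This is a sketch under stated assumptions: terms whose neighbour index i+m or k+m falls off either chain end are skipped (the slide does not say how ends are handled), the m = 0 term is kept as the formula is written, and the coordinates are invented. The names `dist` and `ssap_like_scores` are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def ssap_like_scores(coords_a, coords_b, n=4, a=1.0, b=1.0):
    """Return S with S[i][k] = sum over m in [-n, n] of
    a / (|dA(i, i+m) - dB(k, k+m)| + b).

    Assumption: terms that run past either chain end are skipped.
    """
    S = []
    for i in range(len(coords_a)):
        row = []
        for k in range(len(coords_b)):
            total = 0.0
            for m in range(-n, n + 1):
                if 0 <= i + m < len(coords_a) and 0 <= k + m < len(coords_b):
                    d_a = dist(coords_a[i], coords_a[i + m])
                    d_b = dist(coords_b[k], coords_b[k + m])
                    total += a / (abs(d_a - d_b) + b)
            row.append(total)
        S.append(row)
    return S

# Two identical straight chains, the second one translated: every distance
# difference is zero, so each in-range term contributes a / b = 1.
chain_a = [(3.8 * i, 0.0, 0.0) for i in range(5)]
chain_b = [(3.8 * i + 10.0, 5.0, 0.0) for i in range(5)]
S = ssap_like_scores(chain_a, chain_b, n=2)
print(S[2][2], S[0][0])
```

With n = 2, the central residue has five in-range terms and the first residue has three, so the two printed values are close to 5 and 3.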

Page 27

Alignment Gaps in SSAP

This works well for small structures and local structural alignments. However, insertions and deletions cause problems: unrelated distances are compared.

    A : HSERAHVFIM..
    B : GQ-VMAC-NW..

    i = 5, k = 4

The actual SSAP algorithm uses dynamic programming on two levels: first to find which distances to compare when computing $S_{ik}$, then to align the structures using these scores

Page 28

Steps in SSAP (1)

1) Calculate vectors from the Cβ of one amino acid to a set of nearby amino acids

Vectors from two separate proteins are compared

Difference (expressed as an angle) is calculated and converted to a score

2) A matrix of scores for the vector differences from one protein to the next is computed.

Page 29

Steps in SSAP (2)

3) Optimal alignment found using global dynamic programming, with a constant gap penalty

4) Next amino acid residue considered, optimal path to align this amino acid to the second sequence computed

Page 30

Steps in SSAP (3)

5) Alignments are transferred to a summary matrix

If paths cross the same matrix position, scores are summed

If part of an alignment path is found in both matrices, this is evidence of similarity

Page 31

Steps in SSAP (4)

6) Dynamic programming alignment is performed for the summary matrix

Final alignment represents the optimal alignment between the protein structures

Resulting score is converted so it can be compared to see how closely related two structures are

Page 32

Summary of SSAP

Page 33

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 34

Distance Matrix Approach

Uses graphical procedure similar to dot plots

Identifies residues that lie most closely together in three-dimensional structure

Two sequences with similar structure can have dot plots superimposed

Page 35

Distance Matrix

Similar 3D structures have similar inter-residue distances
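A distance matrix is simple to build from coordinates. The sketch below also checks the property the slide relies on: a rotated and translated copy of a structure has the same inter-residue distances. The function name `distance_matrix` and the three sample points are assumptions for illustration.

```python
import math

def distance_matrix(coords):
    """Return the full matrix of pairwise Euclidean distances."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    n = len(coords)
    return [[dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

# Hypothetical coordinates forming a 3-4-5 right triangle.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]

# The same points rotated 90 degrees about z, (x, y, z) -> (-y, x, z),
# and then shifted by (1, 2, 3).
moved = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in coords]

D = distance_matrix(coords)
D_moved = distance_matrix(moved)
print(D[0][2], D_moved[0][2])  # 5.0 5.0
```

Since the matrix is independent of position and orientation, two structures can be compared through their distance matrices without superimposing them first.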

Page 36

DALI

DALI: Distance-matrix ALIgnment

Uses distance matrix method to align protein structures

Assembly step uses Monte Carlo simulation to find submatrices that can be aligned

Page 37

DALI Summary

Page 38

Structural Analysis Algorithms – DALI (1)

DALI is based on distance matrices: 2D matrices containing all pairwise distances between points of a molecule

Distance matrices of two molecules are compared to find regions of similar patterns of distances, which indicate similarities in their 3D structure

Key algorithm steps:

1. Divide distance matrices into overlapping sub-matrices of fixed size

2. Search through two matrices (of two molecules) to find similar patterns

3. Assemble matching pairs of sub-matrices into larger sets to maximize their similarity score

Page 39

Structural Analysis Algorithms – DALI (2)

Assembly of aligned sub-matrices is done using a Monte Carlo optimization

Monte Carlo optimization is an iterative improvement by a random walk exploration of the search space, with occasional excursions into non-optimal territory (i.e. occasionally, a move that reduces the overall score is carried out)

The occasional non-optimal moves help avoid getting “trapped” in local optima of the score function, improving the chance of finding the global optimum
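The idea of accepting occasional score-reducing moves can be shown with a generic Monte Carlo maximizer. This is a toy sketch, not DALI's assembly step: the Metropolis-style acceptance rule, the one-dimensional landscape, and the names `monte_carlo_maximize`, `toy_score`, and `neighbour` are all assumptions made for illustration.

```python
import math
import random

def monte_carlo_maximize(score, start, propose, steps=5000, temperature=1.0, seed=0):
    """Random-walk maximization that sometimes accepts a worse state."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Improvements are always accepted; a worse move is accepted with
        # probability exp(delta / temperature), which shrinks as it gets worse.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

# Toy landscape with a local optimum at index 2 and the global one at index 11.
landscape = [0, 2, 4, 3, 1, 0, 1, 2, 3, 5, 7, 10, 7, 5, 2, 0]

def toy_score(x):
    return landscape[x]

def neighbour(x, rng):
    """Step one position left or right, staying inside the landscape."""
    return min(len(landscape) - 1, max(0, x + rng.choice((-1, 1))))

best, best_score = monte_carlo_maximize(toy_score, start=2, propose=neighbour)
print(best, best_score)
```

A purely greedy walk started at index 2 could never leave that local optimum, because both neighbours score lower; the acceptance rule is what lets the walk cross the valley.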

Page 40

DALI Steps (1)

Page 41

DALI Steps (2)

Page 42

DALI Steps (3)

Page 43

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 44

Fast Structural Similarity Search

Compare types and arrangements of secondary structures within two proteins

If elements similarly arranged, three-dimensional structures are similar

LOCK, VAST and SARF are programs that use these fast methods

Page 45

Align Structures by Secondary Structures

Page 46

Structural Analysis Algorithms – LOCK

Both SSAP and DALI deal only with points (atoms) of the molecules

LOCK uses a hierarchical approach

o Larger secondary structures such as helices and strands are represented using vectors and dealt with first

o Individual residues are dealt with afterwards

Assumes large secondary structures provide most stability and function to a protein, and are most likely to be preserved during evolution

Page 47

LOCK Algorithm

Key algorithm steps:

1. Represent secondary structures as vectors

2. Obtain initial superposition by computing local alignment of the secondary structure vectors (using dynamic programming)

3. Compute residue superposition by performing a greedy search to try to minimize the root mean square deviation (an RMS distance measure) between pairs of nearest backbone atoms from the two proteins

4. Identify “core” (well aligned) atoms and try to improve their superposition (possibly at the cost of degrading superposition of non-core atoms)

Steps 2, 3, and 4 require iteration at each step
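Step 1 can be illustrated with a deliberately simple vector representation. In this sketch each secondary structure element is reduced to the vector from its first to its last residue, and two elements are compared by the angle between their vectors. LOCK's actual vector fitting and scoring functions are not given in the slides, so the names `sse_vector` and `angle_between` and the coordinates are assumptions.

```python
import math

def sse_vector(residue_coords):
    """Vector from the first to the last residue of a secondary structure element."""
    start, end = residue_coords[0], residue_coords[-1]
    return tuple(e - s for s, e in zip(start, end))

def angle_between(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Hypothetical elements: a helix running along x and a strand running along y.
helix = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
strand = [(0.0, 0.0, 0.0), (0.0, 3.3, 0.0), (0.0, 6.6, 0.0)]

angle = angle_between(sse_vector(helix), sse_vector(strand))
print(angle)
```

The angle between a pair of elements does not depend on how the protein is placed in space, so angles for corresponding pairs in two proteins can be compared before any superposition exists.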

Page 48

ProteinDBS

Shyu, Chi, Scott, Xu. Nucleic Acids Research 32, W572-W575, 2004

Page 49

Comparison between different methods

CATH: Fully automated SSAP

SCOP: Based on subjective interpretation of evolutionary history of proteins

FSSP: DALI

Agreement between CATH and SCOP may be at most 60%.

FSSP vs. CATH: 40%

FSSP vs. SCOP: 60%

Page 50

Lecture Outline

Why structural alignment

Technical definition

SSAP

DALI

Fast search

Protein families

Page 51

Structure Families (1)

Homologous family: evolutionarily related with a significant sequence identity;

Superfamily: different families whose structural and functional features suggest common evolutionary origin;

Fold: different superfamilies having same major secondary structures in same arrangement and with same topological connections (energetics favoring certain packing arrangements);

Class: secondary structure composition.

Page 52

6 Classes of Protein Structures (1)

1) Class α: bundles of α helices connected by loops on surface of proteins

2) Class β: antiparallel β sheets, usually two sheets in close contact forming sandwich

3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)

Page 53

6 Classes of Protein Structures (2)

4) Class α+β: mainly segregated α helices and antiparallel β sheets

5) Multi-domain (α and β) proteins: more than one of the above four domains

6) Membrane and cell-surface proteins and peptides, excluding proteins of the immune system

Page 54

Structure of class proteins

Page 55

Structure of class proteins

Page 56

Structure of class proteins

Page 57

Structure of class proteins

Page 58

20 most frequent common domains (folds)

[1] 1TEN:_ 3-89 [2] 1RNL:_ 5-114 [3] 1A91:_ 6-77 [4] 1LDE:C 179-317 [5] 1SMG:_ 13-86

[6] 1TIG:_ 6-81 [7] 1PDO:_ 2-97 [8] 1OFG:A 29-160 [9] 1AV6:A 47-185 [10] 1AUZ:_ 11-106

[11] 2PIA:_ 100-228 [12] 1VPT:_ 59-180 [13] 1IL7:_ 19-129 [14] 1BGD:_ 12-154 [15] 1OXP:_ 104-265

Page 59

Reading Assignments

Suggested reading:

Contemporary approaches to protein structure classification. Mark B. Swindells, et al. BioEssays (1998), 20(11), 884-891

Optional reading:

The structural alignment between two proteins: Is there a unique answer? Adam Godzik, Protein Science (1996), 5, 1325-1338

Protein Structure Similarities. Patrice Koehl, Current Opinion in Structural Biology (2001), 11, 348-353

Page 60

Project Assignment

Develop a program that can perform protein structural alignment using SSAP:

1. The C coordinates of two proteins (A and B) will be sent to the mailing list

2. Calculate the similarity matrix between residue i in A and residue k in B (let n = 4, a = b = 1):

$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{|d^A_{i,i+m} - d^B_{k,k+m}| + b}$

3. Perform dynamic programming on $S_{i,k}$, and retrieve the alignment to print out.

Page 61

Project Phase III Report

Due on 11/17; send to me by email

Write on top of the Phase II report

7-30 pages

Serves as a draft of the final report

Free style in writing (use 11pt font or larger)

Present key results:

o Software implementation

o Benchmark (computing time)

o Computational data

o Interpretation of the meaning of the data