
Upload: emory-mitchell

Post on 18-Jan-2018

Page 1: Comparative methods Basic logics: The 3D structure of the protein is deduced from: 1.Similarities between the protein and other proteins 2.Statistical

Comparative methods

Basic logic:

The 3D structure of the protein is deduced from:

1. Similarities between the protein and other proteins

2. Statistical tendencies, characteristic of its sequence

Physical aspects of the structure are not included in the prediction

Major categories of comparative structure prediction:

1. Secondary structure prediction

2. Homology modeling

3. Fold recognition

1. Secondary structure prediction

Basic methodology:

• Each amino acid has a statistical propensity to appear in certain secondary structures (e.g. helix, sheet, turn)

• The individual amino acid propensities are additive

• Thus, the propensity of an entire protein segment can be calculated

• By using a ‘sliding window’, protein segments with strong secondary structure propensities can be identified
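
The sliding-window idea above can be sketched in a few lines. This is only an illustration: the propensity values in `TOY_HELIX_PROPENSITY` are invented (they are not the published parameters), and the window width and threshold are arbitrary assumptions.

```python
# Toy helix propensities, invented for illustration (NOT published parameters).
TOY_HELIX_PROPENSITY = {
    "A": 1.4, "E": 1.5, "L": 1.2, "K": 1.1,
    "G": 0.6, "P": 0.6, "S": 0.8,
}


def window_scores(sequence, propensity, width=4):
    """Return the mean propensity of every window of `width` residues."""
    values = [propensity[res] for res in sequence]
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def strong_segments(sequence, propensity, width=4, threshold=1.0):
    """Return (start, end) pairs (end exclusive) of windows above `threshold`."""
    scores = window_scores(sequence, propensity, width)
    return [(i, i + width) for i, score in enumerate(scores) if score > threshold]


sequence = "AELKGPSG"  # hypothetical query segment
print(window_scores(sequence, TOY_HELIX_PROPENSITY))
print(strong_segments(sequence, TOY_HELIX_PROPENSITY))
```

Averaging rather than summing keeps the score independent of the window width, which makes a single threshold easier to interpret.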

Major steps in secondary structure prediction

1. Chou and Fasman (1974)

Residue propensities + a sliding window for prediction

P(H), P(E), P(turn): frequency parameters for appearing in an α-helix, β-sheet, and turn

F(i), F(i+1), F(i+2), F(i+3): frequencies of being in the 1st to 4th position of a β-turn
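
One way to use the positional frequencies F(i) to F(i+3) is to multiply them for each stretch of four consecutive residues and look for the highest product. The sketch below assumes that scheme; the table `TOY_TURN_FREQS` holds invented numbers, not the published values, and any cutoff applied to the product would be a further assumption.

```python
import math

# One dict per turn position (1st to 4th). Values are invented for illustration.
TOY_TURN_FREQS = [
    {"N": 0.20, "P": 0.10, "G": 0.10, "A": 0.05},
    {"N": 0.10, "P": 0.30, "G": 0.10, "A": 0.05},
    {"N": 0.20, "P": 0.05, "G": 0.20, "A": 0.05},
    {"N": 0.10, "P": 0.05, "G": 0.15, "A": 0.05},
]


def turn_probability(tetrapeptide, position_freqs):
    """Product F(i) * F(i+1) * F(i+2) * F(i+3) for four consecutive residues."""
    return math.prod(position_freqs[k][res] for k, res in enumerate(tetrapeptide))


def scan_turns(sequence, position_freqs):
    """Return (start_index, probability) for every four-residue stretch."""
    return [(i, turn_probability(sequence[i:i + 4], position_freqs))
            for i in range(len(sequence) - 3)]


scores = scan_turns("AANPGGAA", TOY_TURN_FREQS)  # hypothetical sequence
best_start, best_p = max(scores, key=lambda item: item[1])
print(best_start, best_p)
```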

Success rate: ~50%

2. Sternberg (1987)

Incorporating evolutionary information in the calculation, in the form of multiple sequence alignments (MSAs)

(homologous proteins tend to have similar secondary structures)

Success rate: 69%
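
To illustrate how a multiple sequence alignment can feed into the calculation, the sketch below averages a per-residue propensity over each alignment column, so that every homolog contributes to the score of a position. This is an assumed, simplified scheme and not a reconstruction of the 1987 method; the alignment and the propensity values are invented.

```python
# Toy helix propensities, invented for illustration (NOT published parameters).
TOY_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "K": 1.1, "G": 0.6, "P": 0.6, "S": 0.8}


def column_propensities(alignment, propensity, gap="-"):
    """Average the propensity over each alignment column, skipping gaps."""
    averaged = []
    for column in zip(*alignment):  # rows must have equal (aligned) length
        values = [propensity[res] for res in column if res != gap]
        averaged.append(sum(values) / len(values) if values else 0.0)
    return averaged


# Hypothetical alignment of three homologous segments.
alignment = ["AEL-G", "AKLSG", "SELPG"]
print(column_propensities(alignment, TOY_PROPENSITY))
```

The resulting per-column values can then be scanned with a sliding window in the same way as single-sequence propensities.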

3. Rost and Sander (1994) (PHD-Sec)

Combines neural networks (i.e. machine learning) with multiple sequence alignments

Success rates:

PHD-Sec – 72%; PREDATOR – 75%; PSIPRED – 77%
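
The slides do not say how the success rates are measured. Assuming they are per-residue accuracies over three states (helix, strand, other), a measure often called Q3, the calculation could look like this. The two state strings are invented examples.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# H = helix, E = strand, C = other (hypothetical strings).
predicted = "HHHHCCEEEC"
observed = "HHHCCCEEEE"
print(q3_accuracy(predicted, observed))
```

In this example both mismatches fall at the ends of a secondary structure element.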

Common problems in secondary structure prediction

• Prediction is least reliable at the ends of secondary structure elements

• Success rates remain below 100%, possibly because tertiary (long-range) interactions also influence secondary structure

2. Homology modeling

Basic logic:

• Homologous proteins (proteins that share a common ancestor, usually reflected in high sequence identity) tend to have similar structures

• Thus, the structure of a protein can be predicted from its sequence similarity to proteins of known structure (from the same family)

Homology modeling includes the following steps:

1. Finding a ‘template’ protein with high enough sequence identity to the query protein (desirable: at least 30%), e.g. with PSI-BLAST

2. Aligning the two sequences

3. Transferring the coordinates of identical amino acids from the template to the query protein (other prediction methods are used for non-identical residues)

4. Performing energy optimization to get rid of clashes and distortions

5.
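
Steps 1 to 3 of the procedure above (template selection, alignment, coordinate transfer) can be sketched as follows. The aligned sequences and coordinates are invented, and percent identity is computed over aligned non-gap positions, which is only one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of aligned (non-gap) column pairs with identical residues."""
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def transfer_coordinates(aligned_query, aligned_template, template_coords, gap="-"):
    """Copy template coordinates onto identical query residues; others get None."""
    model = []
    t_index = 0  # index into template_coords (one entry per template residue)
    for q, t in zip(aligned_query, aligned_template):
        coord = None
        if t != gap:
            if q == t:
                coord = template_coords[t_index]
            t_index += 1
        if q != gap:
            model.append((q, coord))
    return model


# Hypothetical alignment and one toy coordinate per template residue.
query = "MKT-LV"
template = "MRTALV"
coords = [(float(i), 0.0, 0.0) for i in range(6)]

print(percent_identity(query, template))
print(transfer_coordinates(query, template, coords))
```

Residues left with `None` are the ones that would need other prediction methods, followed by energy optimization of the complete model.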

Problems:

1. The number of proteins of known structure that can serve as templates (i.e. > 30% sequence identity) is limited

2. Predicting loops: loops are rich in insertions and deletions, and are therefore difficult to predict

Partial solution: a combination of sequence-based methods and hydrophobicity profiles makes it possible to infer the structure of loops

3. Fold recognition (profile)

Basic logic:

• The sequence-based statistical tendencies (polarity, exposure, secondary structure) of the query protein are compared to those of other proteins with known structure

• The best match is the protein whose fold is closest to that of the query protein

Useful for:

1. Finding the fold of a query protein

2. Predicting whether a query protein has a novel fold

3. Fold recognition (profile): steps

1. Each of the 20 amino acids is classified according to 3 basic structure-related statistical tendencies: polarity, solvent exposure and secondary structure

2. Each position in the query protein is assigned a code, describing the specific tendencies of this position. This yields a structure-based sequence profile for the query protein

3. The profile is systematically compared to a library containing the profiles of all proteins of known structure

4. A match represents a protein with similar fold

5. If a match is not found, the query protein is assumed to have a novel fold
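
The profile-matching steps can be illustrated with a toy example. Everything here is an assumption made for the sketch: the three-letter codes in `TOY_CODES`, the ungapped position-by-position comparison, the library entries, and the match threshold.

```python
# Hypothetical codes: polarity (P/N), exposure (E/B), secondary structure (H/S/T).
TOY_CODES = {
    "A": "NBH", "L": "NBH", "V": "NBS", "I": "NBS",
    "E": "PEH", "K": "PEH", "S": "PET", "G": "NET",
}


def profile(sequence, codes=TOY_CODES):
    """Assign each position the code of its residue."""
    return [codes[res] for res in sequence]


def similarity(profile_a, profile_b):
    """Fraction of positions with identical codes (ungapped, equal length only)."""
    if not profile_a or len(profile_a) != len(profile_b):
        return 0.0
    return sum(a == b for a, b in zip(profile_a, profile_b)) / len(profile_a)


def recognize_fold(query, library, threshold=0.7):
    """Return (best_name, score), or (None, score) if no entry reaches threshold."""
    query_profile = profile(query)
    best_name, best_score = None, 0.0
    for name, sequence in library.items():
        score = similarity(query_profile, profile(sequence))
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score  # no match: candidate novel fold


library = {"fold_alpha": "AELKAELK", "fold_beta": "VIVSGVIV"}  # invented entries
print(recognize_fold("LKAEAKLE", library))
print(recognize_fold("GSGSGSGS", library))
```

A realistic comparison would align profiles with gaps and score partial agreement between codes; exact matching is used here only to keep the example short.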

4. Fold recognition (Threading)

• A combination of homology modeling and structural profiles

• Like homology modeling, it predicts the structure of the query protein based on sequence alignments with template proteins

• However, instead of one 3D model, many low-resolution models are constructed by using different alignments

• The different models are evaluated based on residue-residue preferences observed in known structures (converted to energy terms by the Boltzmann equation)
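
The conversion of residue-residue preferences into energy terms is commonly written as an inverse Boltzmann relation, E = -kT ln(N_obs / N_exp). The sketch below assumes that form; the contact counts, the reference (expected) counts, and the unit choice kT = 1 are all invented for illustration.

```python
import math


def contact_energy(observed, expected, kT=1.0):
    """Inverse Boltzmann: pairs seen more often than expected get negative energy."""
    return -kT * math.log(observed / expected)


def score_model(contacts, energy_table):
    """Sum the pair energies over all residue-residue contacts in a model."""
    return sum(energy_table[frozenset(pair)] for pair in contacts)


# Hypothetical contact counts from known structures and a flat reference state.
observed = {("L", "V"): 200, ("K", "E"): 150, ("L", "K"): 50}
expected = {pair: 100 for pair in observed}
energy_table = {frozenset(pair): contact_energy(n, expected[pair])
                for pair, n in observed.items()}

# Two low-resolution models built from different alignments (toy contact lists).
models = {
    "model_a": [("L", "V"), ("K", "E")],
    "model_b": [("L", "K"), ("K", "L")],
}
best = min(models, key=lambda name: score_model(models[name], energy_table))
print(best, score_model(models[best], energy_table))
```

Using `frozenset` keys makes the lookup independent of residue order, so ("L", "K") and ("K", "L") share one energy term.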