Introduction to Bioinformatics - Tutorial no. 8 Predicting protein structure PSI-BLAST

Post on 21-Dec-2015


Page 1: Introduction to Bioinformatics - Tutorial no. 8 Predicting protein structure PSI-BLAST

Introduction to Bioinformatics - Tutorial no. 8

Predicting protein structure

PSI-BLAST

Page 2

PHDsec and PSIpred

PHDsec (Rost & Sander, 1993): based on sequence family alignments

PSIpred (Jones, 1999): based on PSI-BLAST profiles

Both consider long-range interactions

Page 3

PSIpred Input

Input sequence

Type of Analysis

Page 4

PSIpred Input (2)

Filtering Options

Email address

GO!

Page 5

PSIpred Output

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120

Confidence level

Predicted structure
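The Conf and Pred lines can also be summarised programmatically. The following is a minimal sketch, not part of PSIpred itself: the function name `segments` is made up for illustration, and it assumes the Conf and Pred strings have already been copied out of the output as plain strings. It groups the prediction into runs of the same state and reports the mean confidence of each run.

```python
from itertools import groupby

def segments(pred, conf):
    """Return (state, start, end, mean_confidence) for each run of identical states.

    Positions are 1-based and inclusive. zip() stops at the shorter string,
    so a truncated Conf line simply shortens the result.
    """
    runs, pos = [], 1
    for state, group in groupby(zip(pred, conf), key=lambda pair: pair[0]):
        digits = [int(c) for _, c in group]
        runs.append((state, pos, pos + len(digits) - 1, sum(digits) / len(digits)))
        pos += len(digits)
    return runs

# First block of the output shown on this slide
conf = "988766667637889999877999871289878877049963202468899999997887"
pred = "CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH"

for state, start, end, mean_conf in segments(pred, conf):
    print(f"{state} {start:3d}-{end:3d}  mean confidence {mean_conf:.1f}")
```

Short runs with a low mean confidence are the ones most worth checking against another method.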

Page 6

PHDsec Input (1)

Email address

Type of prediction

Additional output

Output format

Reduce processing

Page 7

PHDsec Input (2)

Type (number) of input sequences

Upload file

Enter sequence

Wait for results?

Page 8

PHDsec Output (1)

Protein classification

Structure proportions

Amino acid proportions

Page 9

PHDsec Output (2)

Estimated structure

Confidence level

Structure with high confidence

Page 10

PSI-BLAST

Position-Specific Iterated BLAST: an extension to BLASTP

Finds more distantly related sequences, including distant sequences that an ordinary search reports with insignificant E-values

Even in distantly related sequences, important domains can be highly conserved; PSI-BLAST gives more weight to those conserved positions

Page 11

PSI-BLAST Profile

When closely related sequences are aligned, areas of conservation become apparent.

The scoring matrix becomes position specific:

• Each column has its own set of amino acid (a.a.) frequencies.

• The score is column specific, based on a.a. frequency: the more frequent an a.a. is in a column, the higher its score there.

A new sequence is scored against this new scoring matrix.

123456

AMTYQR

CTTYQS

SMTYQA
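The idea can be tried on the three aligned sequences above. This is a simplified sketch: it uses the raw column frequency as the score, exactly as the slide describes, whereas real PSI-BLAST scores are log-odds values with pseudocounts and sequence weights. The function names `column_frequencies` and `score` are chosen here for illustration.

```python
from collections import Counter

alignment = ["AMTYQR", "CTTYQS", "SMTYQA"]

def column_frequencies(seqs):
    """One dict per alignment column, mapping amino acid -> relative frequency."""
    n = len(seqs)
    return [{aa: count / n for aa, count in Counter(column).items()}
            for column in zip(*seqs)]

def score(seq, profile):
    """Sum the column-specific frequency of each residue (0 if never seen there)."""
    return sum(column.get(aa, 0.0) for aa, column in zip(seq, profile))

profile = column_frequencies(alignment)
for position, column in enumerate(profile, start=1):
    print(position, column)

print(score("SMTYQA", profile))  # matches a frequent residue in every column
print(score("GGGGGG", profile))  # matches nothing
```

Columns 3 to 5 (T, Y, Q) are fully conserved, so a new sequence gains the most by matching there; columns 1 and 6 are variable and contribute little.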

Page 12

Position-Specific Scoring Matrix
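As a rough guide to reading such a matrix, each entry is commonly written as a log-odds score. The symbols below are introduced here for illustration and do not appear on the slides: $q_{i,a}$ is the estimated probability of amino acid $a$ in column $i$, $p_a$ is its background frequency, $f_{i,a}$ is the observed (weighted) frequency, $g_{i,a}$ is a pseudocount frequency, and $\lambda$, $\alpha$, $\beta$ are scaling and mixing constants.

```latex
S_{i,a} = \frac{1}{\lambda} \ln \frac{q_{i,a}}{p_a},
\qquad
q_{i,a} = \frac{\alpha\, f_{i,a} + \beta\, g_{i,a}}{\alpha + \beta}
```

A positive entry means the amino acid is seen in that column more often than expected by chance; a negative entry means it is seen less often.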

Page 13

A PSI-BLAST Iteration

1. Collect all database sequence segments that have been aligned with the query sequence with an E-value below a set threshold (default 0.01).

2. Construct a position-specific scoring matrix for the collected sequences. Rough idea:

• Align all sequences to the query sequence as the template.

• Assign weights to the sequences.

• Construct the position-specific scoring matrix.

3. Find sequences that match the profile.
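The loop described above can be sketched on toy data. This is an illustration under strong assumptions, not the PSI-BLAST implementation: all sequences are the same length and already aligned without gaps, every sequence gets equal weight, the background is uniform, and a raw score cutoff stands in for the E-value threshold. The names `build_pssm`, `score` and `iterate` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(ALPHABET)  # assumption: uniform background frequencies

def build_pssm(seqs, pseudocount=1.0):
    """Log-odds score per column and residue from equally weighted sequences."""
    pssm = []
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / BACKGROUND)
                     for aa in ALPHABET})
    return pssm

def score(seq, pssm):
    """Ungapped score of seq against the profile."""
    return sum(column[aa] for aa, column in zip(seq, pssm))

def iterate(query, database, threshold, max_rounds=5):
    """Search, collect hits, rebuild the profile; stop when nothing changes."""
    included = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(included)
        hits = [s for s in database if s != query and score(s, pssm) >= threshold]
        if [query] + hits == included:
            break  # converged: no new sequences were added
        included = [query] + hits
    return included

database = ["CMTYQR", "CTTYQS", "GGGGGG"]
print(iterate("AMTYQR", database, threshold=4.0))
```

In this toy run the close relative CMTYQR passes the cutoff in the first round, and only the profile built from both sequences scores the more distant CTTYQS above the cutoff in the second round, which is the behaviour the iteration is designed to produce.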

Page 14

Using PSI-BLAST (1)

Available from the main BLAST page

Or switch it on in BLASTP

E-value threshold for initial inclusion in the multiple alignment used to build the profile

Page 15

Using PSI-BLAST (2)

Select whether to include each hit in the next iteration

New result

Align selected sequences, generate profile, search again

Number of results to show in the next iteration
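The slides use the web form, but the same settings exist in the command-line `psiblast` program from NCBI BLAST+. The sketch below only assembles and prints a command; it assumes BLAST+ is installed and a local protein database is available. The file names `query.fasta`, `query.pssm` and `psiblast_hits.txt` and the database name `nr` are placeholders, and the inclusion threshold is set to the 0.01 quoted on the slides rather than to the program's own default.

```python
import shlex

def psiblast_command(query_fasta, database, iterations=3, inclusion_ethresh=0.01):
    """Build a psiblast command line (BLAST+); nothing is executed here."""
    return [
        "psiblast",
        "-query", query_fasta,                          # FASTA file with the query protein
        "-db", database,                                # protein database to search
        "-num_iterations", str(iterations),             # number of PSI-BLAST rounds
        "-inclusion_ethresh", str(inclusion_ethresh),   # E-value cutoff for the profile
        "-out_ascii_pssm", "query.pssm",                # save the final PSSM as text
        "-out", "psiblast_hits.txt",                    # save the hit list
    ]

cmd = psiblast_command("query.fasta", "nr")
print(shlex.join(cmd))
```

Unlike the web form, the command-line program includes every hit below the inclusion threshold automatically, so there is no manual selection step between rounds.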

Page 16

Exercise 1

1. There is a protein with an unknown structure:

>some protein
MEAFLGTWKMEKSEGFDKIMERLGVDFVTRKMGNLVKPNLIVTDLGGGKYK
MRSESTFKTTECSFKLGEKFKEVTRFTRGHFFMITVENGVMKHEQDDKTKV
TYIERVVEGNELKATVKVDEVVCVRTYSKVA

Can BLAST help us predict its secondary structure (SS)?

2. Use any secondary structure prediction method to predict the secondary structure of 1O8V and compare it to the solved structure. NOTICE! The secondary structure definition in PDB is given in a 7-letter code instead of the 3-letter code (H, E, C). For comparison purposes consider G, H and I as H; E as E; and all the rest, including spaces, as C.

3. What can you conclude about the secondary structure prediction in this case?

4. Are the results consistent with the confidence value of the prediction?

5. Can you explain the prediction results based on the real structure?
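For the comparison in question 2, the reduction rule and a per-residue agreement score can be sketched as follows. The two strings in the example are invented to show the mechanics and are not the 1O8V annotation or a real prediction; the function names are hypothetical.

```python
def reduce_to_three_states(pdb_ss):
    """Apply the exercise's rule: G, H, I -> H; E -> E; everything else (incl. spaces) -> C."""
    return "".join("H" if s in ("G", "H", "I") else "E" if s == "E" else "C"
                   for s in pdb_ss)

def q3(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)

# Made-up example strings, for illustration only
observed_7_letter = " EEEETTGGGHHHHS B"
predicted = "CEEEECCCCCHHHHCCC"

observed = reduce_to_three_states(observed_7_letter)
print(observed)
print(f"Q3 = {q3(predicted, observed):.2f}")
```

Both strings must cover the same residues in the same order, so any residues missing from the solved structure have to be removed from the prediction before comparing.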

Page 17

Exercise 1

Page 18

Exercise 2

• The prion is the protein responsible for mad cow disease. Normally, the amino acids in a specific region are arranged in an α-helix (H1). In the abnormal state this region changes to a β-strand conformation.

• This conformational change is thought to be the origin of the disease, which leads to rapid degeneration of the nervous system and usually causes death.

• It is assumed that prion molecules that have changed conformation accelerate the conformational change of additional molecules.

1. Check what conformation is predicted for this protein.

2. The PDB code of the prion protein is 1ag2. The helix is located at positions 21-30 of the sequence in this file. Does the predicted SS correlate with the real one in the region of interest?
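To look only at the region of interest, the prediction string can be sliced by sequence position. The sketch below uses an invented prediction string, not a real result for 1ag2, and the helper names are hypothetical; positions are 1-based and inclusive, as in the exercise.

```python
def region(ss, start, end):
    """Return the states at 1-based, inclusive positions start..end."""
    return ss[start - 1:end]

def helix_fraction(ss):
    """Fraction of positions predicted as helix (H)."""
    return ss.count("H") / len(ss)

# Made-up 40-residue prediction, for illustration only
prediction = "C" * 18 + "E" * 6 + "H" * 6 + "C" * 10

h1 = region(prediction, 21, 30)
print(h1)
print(f"helix fraction in 21-30: {helix_fraction(h1):.1f}")
```

A region predicted partly as strand and partly as helix, as in this invented string, would be one way for a predictor to reflect a segment that can adopt either conformation.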