
PredictProtein

PredictProtein (Rost et al., 1994) takes a slightly different approach to making its predictions. First, the protein sequence is used as a query against SWISS-PROT to find similar sequences. When similar sequences are found, an algorithm called MaxHom is used to generate a profile-based multiple sequence alignment (Sander and Schneider, 1991). MaxHom constructs the alignment iteratively: after the first search of SWISS-PROT, all sequences found are aligned against the query sequence and a profile is calculated for the alignment. The profile is then used to search SWISS-PROT again to locate new matching sequences.
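The search, align, and re-search cycle can be illustrated with a toy sketch in Python. It is not MaxHom: to keep it short it assumes equal-length, ungapped sequences, a uniform background of 1/20 per amino acid, a pseudocount of 1, and an arbitrary score threshold, and the names `build_profile`, `score`, and `iterative_search` are hypothetical. Each round builds a log-odds profile from the current family, scores the database against it, and stops when no new sequences pass the threshold.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_profile(family, pseudocount=1.0):
    """Log-odds profile from equal-length (ungapped) aligned sequences."""
    total = len(family) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(len(family[0])):
        counts = Counter(seq[i] for seq in family)
        profile.append({aa: math.log((counts[aa] + pseudocount) / total / BACKGROUND)
                        for aa in AMINO_ACIDS})
    return profile


def score(profile, seq):
    """Sum of per-column log-odds scores for an ungapped match."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_rounds=10):
    """Toy iterative profile search: search, rebuild the profile, search again."""
    family = [query]
    for _ in range(max_rounds):
        profile = build_profile(family)
        new_hits = [seq for seq in database
                    if seq not in family
                    and len(seq) == len(query)  # toy restriction: no gaps
                    and score(profile, seq) >= threshold]
        if not new_hits:  # converged: the profile finds nothing new
            break
        family.extend(new_hits)
    return family


family = iterative_search("ACDEFG", ["ACDEFW", "ACDWWW", "WWWWWW"], threshold=2.5)
print(family)  # ['ACDEFG', 'ACDEFW', 'ACDWWW']
```

In this example `ACDWWW` scores below the threshold against a profile built from the query alone, but it is picked up in the second round once `ACDEFW` has been folded into the profile, which is the benefit of iterating.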

The multiple alignment generated by MaxHom is then fed into a neural network for prediction by one of a suite of methods collectively known as PHD (Rost, 1996). PHDsec, the method in this suite used for secondary structure prediction, not only assigns each residue to a secondary structure type but also provides statistics indicating the confidence of the prediction at each position in the sequence. The method achieves an average accuracy of better than 72%, and its best-case residue predictions have an accuracy of over 90%.
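Accuracy figures such as "better than 72%" are usually per-residue, three-state accuracies (often called Q3): the fraction of residues whose predicted state matches the observed one. The minimal sketch below assumes states encoded as H (helix), E (strand), and L (other) and uses hypothetical function names; it computes that measure and also the accuracy on the subset of residues predicted with high confidence, which is one way to read the "best-case" figure. The confidence values are made up for illustration.

```python
def q3(predicted, observed):
    """Fraction of residues whose three-state assignment (H, E, L) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def accuracy_above(predicted, observed, reliability, min_ri):
    """Accuracy restricted to residues whose confidence is at least min_ri."""
    kept = [(p, o) for p, o, r in zip(predicted, observed, reliability) if r >= min_ri]
    if not kept:
        return None  # no residue reaches the requested confidence
    return sum(p == o for p, o in kept) / len(kept)


predicted = "HHHHLLEEEL"
observed = "HHHLLLEEEE"
reliability = [9, 9, 8, 2, 7, 6, 9, 8, 7, 1]  # hypothetical per-residue confidence
print(q3(predicted, observed))                              # 0.8
print(accuracy_above(predicted, observed, reliability, 5))  # 1.0
```

Here the two wrong residues are also the two with the lowest confidence, so dropping low-confidence positions raises the accuracy on the remaining ones.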

Sequences are submitted to PredictProtein either by E-mail or through a Web front end. Several submission options are available: the query sequence can be given in single-letter amino acid code or by its SWISS-PROT identifier. In addition, a multiple sequence alignment, either in FASTA format or as a PIR alignment, can be submitted for secondary structure prediction.
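A submitted alignment has to be well formed before any prediction can use it. The sketch below is an illustrative parser, not PredictProtein's own input handling: it reads a FASTA-format alignment into (name, sequence) pairs and rejects rows of unequal length. Treating `-` as the gap character is an assumption, and `parse_fasta_alignment` is a hypothetical name.

```python
def parse_fasta_alignment(text):
    """Parse a FASTA-format multiple alignment into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # a sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))

    for seq_name, seq in records:
        # One-letter residue codes, with '-' assumed as the gap character.
        if not all(ch.isalpha() or ch == "-" for ch in seq):
            raise ValueError(f"{seq_name}: unexpected character in sequence")
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records


print(parse_fasta_alignment(">query test\nACD-EF\n>hom1\nACDKEF\n"))
```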

The input message, sent to the PredictProtein E-mail server, begins with the sender's name, affiliation, and address lines. After these lines, the # sign signals to the server that a sequence in one-letter code follows. The sequence format is essentially FASTA, except that blanks are not allowed. When an alignment is submitted, placing the phrase "do NOT align" before the line starting with # ensures that the alignment will not be realigned. Nothing is allowed to follow the sequence.

The output, sent as an E-mail message, is quite copious but contains a large amount of pertinent information. The results can also be retrieved from an FTP site by adding the qualifier "return no mail" on any line before the line starting with #. This may be useful for E-mail services that have difficulty handling very large messages. The output can be formatted as plain text or as HTML, with or without PHD graphics.
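The layout rules for the E-mail request can be captured in a small helper that assembles the message body. This is a sketch under assumptions: the original example message is not reproduced here, so the exact template, the label written after `#`, and the name `build_request_body` are hypothetical. Only the ordering rules come from the text: identification lines first, "return no mail" before the `#` line, no blanks in the sequence, and nothing after it. The helper builds the text only and does not send anything.

```python
def build_request_body(name, affiliation, address, sequence,
                       return_no_mail=False, label="query"):
    """Assemble an E-mail request body following the layout rules in the text."""
    lines = [name, affiliation, address]
    if return_no_mail:
        # The qualifier must appear on a line before the line starting with '#'.
        lines.append("return no mail")
    # Assumption: the '#' line may carry a short label after the sign.
    lines.append(f"# {label}")
    residues = "".join(sequence.split()).upper()  # blanks are not allowed
    if not residues.isalpha():
        raise ValueError("sequence must be in one-letter amino acid code")
    lines.append(residues)
    return "\n".join(lines) + "\n"  # nothing follows the sequence


print(build_request_body("Jane Doe", "Example University", "1 Example Road",
                         "mkt ayi akq\nrqi", return_no_mail=True))
```

Stripping all whitespace before validation means a sequence pasted with spaces or line breaks is collapsed into a single blank-free line, as the rules require.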

The results of the MaxHom search are returned, complete with a multiple alignment that may be useful in further work such as profile searches or phylogenetic studies. If the submitted sequence has a known homolog in PDB, the PDB identifiers are provided. Information on the method itself follows, and then the actual prediction. In a recent release, the output can also be customized by specifying the available options. Unlike nnpredict, PredictProtein returns a reliability index for the prediction at each position, ranging from 0 to 9, with 9 indicating the maximum confidence that a secondary structure assignment has been made correctly. The results returned by the server for this particular sequence, compared with those obtained by other methods, are shown in modified form in Figure 11.4.
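One way a 0 to 9 reliability index can be read off a network's outputs is sketched below. The sketch assumes one output unit per state (H, E, L) and takes the index as ten times the difference between the largest and second-largest outputs, truncated and clipped to the range 0 to 9. The activation values and function names are hypothetical and are not taken from PredictProtein's output.

```python
STATES = ("H", "E", "L")  # assumed order of the three output units


def reliability_index(outputs):
    """Map one residue's three output activations to (state, index in 0..9)."""
    if len(outputs) != len(STATES):
        raise ValueError("expected one output per state")
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    best, second = ranked[0], ranked[1]
    ri = int(10 * (outputs[best] - outputs[second]))  # larger margin, more confidence
    return STATES[best], max(0, min(9, ri))


def predict_with_reliability(per_residue_outputs):
    """Return the predicted state string and the per-residue reliability list."""
    states, indices = [], []
    for outputs in per_residue_outputs:
        state, ri = reliability_index(outputs)
        states.append(state)
        indices.append(ri)
    return "".join(states), indices


print(predict_with_reliability([(0.95, 0.03, 0.02),
                                (0.05, 0.85, 0.10),
                                (0.40, 0.35, 0.25)]))  # ('HEH', [9, 7, 0])
```

A clear winner among the outputs yields a high index, while a near tie between two states (as in the third residue) yields 0, matching the idea that 9 marks the most trustworthy assignments.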