predict protein
7/31/2019 Predict Protein
PredictProtein
PredictProtein (Rost et al., 1994) uses a slightly different approach in making its predictions. First, the protein sequence is
used as a query against SWISS-PROT to find similar sequences. When similar sequences are found, an algorithm called
MaxHom is used to generate a profile-based multiple sequence alignment (Sander and Schneider, 1991). MaxHom uses
an iterative method to construct the alignment:
After the first search of SWISS-PROT, all found sequences are aligned against the query sequence and a profile is
calculated for the alignment. The profile is then used to search SWISS-PROT again to locate new, matching sequences.
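The search-align-profile-search loop described above can be illustrated with a toy sketch. This is not MaxHom itself: the function names, the frequency-based profile, the scoring rule, the threshold, and the restriction to ungapped sequences of equal length are all simplifying assumptions made for illustration, and a small list of strings stands in for SWISS-PROT.

```python
from collections import Counter


def build_profile(aligned):
    """Per-column residue frequencies for equal-length, ungapped sequences."""
    profile = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def score(profile, seq):
    """Mean profile frequency of the residues in seq (0.0 to 1.0)."""
    total = sum(col.get(res, 0.0) for col, res in zip(profile, seq))
    return total / len(profile)


def iterative_search(query, database, threshold=0.65, max_rounds=10):
    """Repeat: build a profile from the current hits, then search again."""
    hits = [query]
    for _ in range(max_rounds):
        profile = build_profile(hits)
        new_hits = [
            s for s in database
            if s not in hits
            and len(s) == len(query)
            and score(profile, s) >= threshold
        ]
        if not new_hits:
            break  # converged: the profile finds nothing new
        hits.extend(new_hits)
    return hits


# Hypothetical toy database standing in for SWISS-PROT.
toy_db = ["ACDEY", "ACDWY", "KLMNP"]
print(iterative_search("ACDEF", toy_db))
```

In this toy run the second sequence is only found on the second pass, once the profile has been broadened by the first hit, which is the point of iterating.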
The multiple alignment generated by MaxHom is subsequently fed into a neural network for prediction by one of a suite of
methods collectively known as PHD (Rost, 1996). PHDsec, the method in this suite used for secondary structure
prediction, not only assigns each residue to a secondary structure type but also provides statistics indicating the confidence
of the prediction at each position in the sequence. The method produces an average
accuracy of better than 72%; the best-case residue predictions have an accuracy rate
of over 90%.
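Accuracy figures of this kind are typically per-residue percentages: the share of positions where the predicted secondary structure type matches the observed one. A minimal sketch of that calculation follows. The function name and the state letters (H for helix, E for strand, L for other) are assumptions chosen for illustration, and the example strings are made up, not server output.

```python
def per_residue_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical six-residue example: five of six positions agree.
print(round(per_residue_accuracy("HHHEEL", "HHHELL"), 1))
```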
Sequences are submitted to PredictProtein either by sending an E-mail message
or by using a Web front end. Several options are available for sequence submission;
the query sequence can be submitted in single-letter amino acid code or by its SWISS-PROT identifier. In addition, a multiple sequence alignment in FASTA format
or as a PIR alignment can also be submitted for secondary structure prediction.
The input message, sent to [email protected], takes the following
form:
After the name, affiliation, and address lines, the # sign signals to the server that a sequence in one-letter
code follows. The sequence format is essentially FASTA, except that blanks
are not allowed. For a submitted alignment, the phrase "do NOT align" placed before the line starting
with # ensures that the alignment will not be realigned. Nothing is allowed to
follow the sequence. The output sent as an E-mail message is quite copious but
contains a large amount of pertinent information. The results can also be retrieved
from an FTP site by adding the qualifier "return no mail" in any line before the line
starting with #. This might be a useful feature for those E-mail services that have
difficulty handling very large output files. The format for the output file can be plain
text or HTML files with or without PHD graphics.
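The submission rules stated above (header lines first, optional qualifiers on any line before the # line, a blank-free sequence last, and nothing after it) can be sketched as a small message builder. The example message itself is not reproduced in this text, so the function name, the placement of a description after the # sign, the 60-character line wrapping, and the sample header values are all assumptions for illustration only.

```python
def build_request(header_lines, description, sequence, qualifiers=()):
    """Assemble a request body: headers, qualifiers, '#' line, sequence."""
    # Drop every blank, tab, and newline: blanks are not allowed.
    residues = "".join(sequence.split())
    if not residues:
        raise ValueError("sequence is empty")
    lines = list(header_lines)
    lines.extend(qualifiers)              # e.g. "return no mail"
    lines.append("# " + description)      # '#' marks the start of the sequence
    # Wrap at 60 residues per line (an assumed, FASTA-like convention).
    lines.extend(residues[i:i + 60] for i in range(0, len(residues), 60))
    return "\n".join(lines)               # nothing follows the sequence


# Hypothetical header values and sequence.
message = build_request(
    ["Jane Doe", "Example University", "jane@example.org"],
    "test protein",
    "MKT AYI AKQ",
    qualifiers=["return no mail"],
)
print(message)
```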
The results of the MaxHom search are returned, complete with a multiple alignment
that may be of use in further study, such as profile searches or phylogenetic
studies. If the submitted sequence has a known homolog in PDB, the PDB identifiers
are furnished. Information on the method itself follows, then the actual
prediction. In a recent release, the output can also be customized by specifying
available options. Unlike nnpredict, PredictProtein returns a reliability index of
prediction for each position ranging from 0 to 9, with 9 being the maximum confidence
that a secondary structure assignment has been made correctly. The results
returned by the server for this particular sequence, as compared with those obtained
by other methods, are shown in modified form in Figure 11.4.
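One common way to use a per-position reliability index is to keep only the assignments at or above a chosen cutoff. The sketch below assumes the prediction and its reliability values are available as two strings of equal length, one state letter and one digit (0 to 9) per residue. That layout, the function name, the cutoff of 5, and the example strings are illustrative assumptions, not the server's actual output format.

```python
def mask_low_confidence(states, reliability, cutoff=5, mask="."):
    """Replace states whose reliability digit is below cutoff with mask."""
    if len(states) != len(reliability):
        raise ValueError("states and reliability must have the same length")
    return "".join(
        s if int(r) >= cutoff else mask
        for s, r in zip(states, reliability)
    )


# Hypothetical prediction and reliability strings.
print(mask_low_confidence("HHHEEL", "978321"))
```

Raising the cutoff trades coverage for confidence: fewer residues keep an assignment, but those that remain are the ones the method considers most reliable.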