PredictProtein

PredictProtein (Rost et al., 1994) takes a slightly different approach to making its predictions. First, the protein sequence is used as a query against SWISS-PROT to find similar sequences. When similar sequences are found, an algorithm called MaxHom is used to generate a profile-based multiple sequence alignment (Sander and Schneider, 1991). MaxHom builds the alignment iteratively. After the first search of SWISS-PROT, all sequences found are aligned against the query sequence and a profile is calculated from the alignment. The profile is then used to search SWISS-PROT again to locate new matching sequences.
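The search, align, and re-profile cycle can be illustrated with a toy Python sketch. It is not MaxHom: it assumes ungapped alignments, a uniform 1/20 background, a flat pseudocount, and an arbitrary score threshold, whereas MaxHom builds gapped alignments. The helper names (`build_profile`, `best_window`, `iterative_search`) and all sequences are invented for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(segments, pseudocount=1.0):
    """Position-specific log-odds profile from equal-length, ungapped segments.

    Assumes a uniform 1/20 background and a flat pseudocount (simplifications).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(len(segments[0])):
        counts = Counter(seg[pos] for seg in segments)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, seq):
    """Return (score, start) of the best ungapped window of seq against the profile."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        # Letters outside the 20 standard residues contribute 0.
        score = sum(profile[i].get(seq[start + i], 0.0) for i in range(width))
        best = max(best, (score, start))
    return best


def iterative_search(query, database, threshold, max_rounds=5):
    """Search, align, rebuild the profile, and search again until nothing new is found."""
    members = {"query": query}  # name -> aligned (ungapped) segment
    for _ in range(max_rounds):
        profile = build_profile(list(members.values()))
        new_hits = {}
        for name, seq in database.items():
            if name in members:
                continue
            score, start = best_window(profile, seq)
            if start >= 0 and score >= threshold:
                new_hits[name] = seq[start:start + len(query)]
        if not new_hits:
            break  # converged: the profile finds no additional sequences
        members.update(new_hits)
    return members


# Toy database (made-up sequences): hit2 is too distant from the query
# alone but is picked up once hit1 has been folded into the profile.
database = {
    "hit1": "WWACDEKLWW",
    "hit2": "WACMNKLW",
    "decoy": "PPPPPPPP",
}
print(sorted(iterative_search("ACDEFG", database, threshold=2.0)))
# ['hit1', 'hit2', 'query']
print(sorted(iterative_search("ACDEFG", database, threshold=2.0, max_rounds=1)))
# ['hit1', 'query']
```

With `max_rounds=1` only the first search is performed, so `hit2` is missed; in the second round it is found because `hit1` has broadened the profile at the last two positions.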

The multiple alignment generated by MaxHom is then fed into a neural network for prediction by one of a suite of methods collectively known as PHD (Rost, 1996). PHDsec, the method in this suite used for secondary structure prediction, not only assigns each residue to a secondary structure type but also provides statistics indicating the confidence of the prediction at each position in the sequence. The method achieves an average accuracy of better than 72%, and its best-case residue predictions are over 90% accurate.
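The 72% figure is presumably the usual three-state per-residue accuracy (often called Q3): the fraction of residues whose helix, strand, or loop label matches the observed structure. The Python sketch below computes Q3 and, to mirror the higher accuracy on the best-predicted residues, Q3 over only the positions whose confidence is at or above a cutoff. The function names, labels, confidence values, and cutoff are all made up for illustration.

```python
def q3(predicted, observed):
    """Three-state accuracy: fraction of residues whose H/E/L label is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def q3_confident(predicted, observed, reliability, min_reliability):
    """Q3 over only the positions whose reliability is at least min_reliability.

    Returns (accuracy, coverage), where coverage is the fraction of residues kept.
    """
    kept = [(p, o) for p, o, r in zip(predicted, observed, reliability)
            if r >= min_reliability]
    if not kept:
        return None, 0.0
    accuracy = sum(p == o for p, o in kept) / len(kept)
    return accuracy, len(kept) / len(observed)


# Made-up example: H = helix, E = strand, L = loop/other.
predicted = "HHHHLLEEEL"
observed = "HHHLLLEEEE"
reliability = [9, 9, 8, 2, 7, 6, 9, 8, 7, 1]

print(q3(predicted, observed))                            # 0.8
print(q3_confident(predicted, observed, reliability, 5))  # (1.0, 0.8)
```

Restricting the count to confident positions trades coverage for accuracy, which is why per-residue confidence is worth reading alongside the prediction itself.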

Sequences are submitted to PredictProtein either by E-mail or through a Web front end. Several submission options are available: the query sequence can be given in single-letter amino acid code or by its SWISS-PROT identifier. In addition, a multiple sequence alignment in FASTA format or as a PIR alignment can be submitted for secondary structure prediction.
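Before submitting an alignment, it can help to check that the FASTA file really is an alignment. The following is a minimal Python sketch, not PredictProtein's own parser: it assumes gaps are written as '-' or '.', that X marks unknown residues, and that every row must have the same aligned length. The function name is hypothetical, and PIR input is not handled.

```python
# Assumed alphabet: 20 standard residues, X for unknown, '-' and '.' for gaps.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX-.")


def parse_fasta_alignment(text):
    """Parse FASTA-formatted aligned sequences and check they form an alignment.

    Returns a list of (name, aligned_sequence) pairs.
    """
    records, name, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")

    for rec_name, seq in records:
        # Internal blanks are rejected here because ' ' is not in ALLOWED.
        unexpected = set(seq) - ALLOWED
        if unexpected:
            raise ValueError(f"{rec_name}: unexpected characters {sorted(unexpected)}")
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return records


example = """>query
MKV-LAAG
>homolog1
MKVELSAG
"""
for rec_name, seq in parse_fasta_alignment(example):
    print(rec_name, seq)
```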

The input message, sent to the PredictProtein E-mail server, is laid out as follows. After the name, affiliation, and address lines, the # sign signals to the server that a sequence in one-letter code follows. The sequence format is essentially FASTA, except that blanks are not allowed. When an alignment is submitted, placing the phrase "do NOT align" before the line starting with # ensures that the alignment will not be realigned. Nothing is allowed to follow the sequence.

The output, returned as an E-mail message, is lengthy but contains a large amount of pertinent information. The results can also be retrieved from an FTP site by adding the qualifier "return no mail" on any line before the line starting with #. This can be useful for E-mail services that have difficulty handling very large messages. The output can be formatted as plain text or as HTML, with or without PHD graphics.
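The message layout described above can be assembled programmatically. This is a hypothetical Python helper, not an official client: it assumes the # sits on a line by itself and that each qualifier goes on its own line before it, and it leaves out the server address entirely. The sender details and the sequence in the usage example are placeholders.

```python
def build_request(name, affiliation, address, sequence_lines,
                  *, do_not_align=False, return_no_mail=False):
    """Assemble the body of a PredictProtein-style E-mail request (hypothetical helper).

    Layout assumed from the description: identification lines, optional
    qualifiers, a line starting with '#', then the sequence and nothing after it.
    """
    lines = [name, affiliation, address]
    # Qualifiers must appear before the line that starts with '#'.
    if return_no_mail:
        lines.append("return no mail")
    if do_not_align:
        lines.append("do NOT align")
    lines.append("#")
    if not sequence_lines:
        raise ValueError("a sequence must follow the '#' line")
    for seq_line in sequence_lines:
        # Blanks are not allowed anywhere in the sequence part.
        if not seq_line or any(ch.isspace() for ch in seq_line):
            raise ValueError(f"blank or empty sequence line: {seq_line!r}")
        lines.append(seq_line)
    # The sequence is the last thing in the message.
    return "\n".join(lines)


body = build_request(
    "Jane Doe",                        # placeholder sender details
    "Example University",
    "1 Example Street, Example City",
    ["MKTAYIAKQRQISFVKSHFSRQ"],        # placeholder query in one-letter code
    return_no_mail=True,
)
print(body)
```

For an alignment, the FASTA lines would be passed as `sequence_lines` together with `do_not_align=True`, under the same layout assumptions.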

The results of the MaxHom search are returned, complete with a multiple alignment that may be useful for further study, such as profile searches or phylogenetic analyses. If the submitted sequence has a known homolog in PDB, the PDB identifiers are provided. Information on the method itself comes next, followed by the actual prediction. In a recent release, the output can also be customized by specifying the available options. Unlike nnpredict, PredictProtein returns a reliability index for the prediction at each position, ranging from 0 to 9, with 9 indicating maximum confidence that the secondary structure assignment is correct. The results returned by the server for this particular sequence, compared with those obtained by other methods, are shown in modified form in Figure 11.4.
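One way a 0 to 9 reliability index can be computed, and the assumption made in this Python sketch, is from the gap between the strongest and second-strongest network outputs: 10 times that gap, truncated to an integer and capped at 9. The helper names and the output values are hypothetical; real PHD outputs come from its trained networks.

```python
def reliability_index(outputs):
    """Map one residue's output activations to a 0-9 reliability index.

    Assumption: RI = truncate(10 * (highest - second-highest)), capped at 9.
    """
    top, second = sorted(outputs.values(), reverse=True)[:2]
    return max(0, min(9, int(10 * (top - second))))


def predict_with_reliability(per_residue_outputs):
    """Pick the strongest state per residue and attach its reliability index."""
    states, indices = [], []
    for outputs in per_residue_outputs:
        states.append(max(outputs, key=outputs.get))
        indices.append(reliability_index(outputs))
    return "".join(states), indices


# Hypothetical network outputs for three residues (H = helix, E = strand, L = loop).
network_outputs = [
    {"H": 0.90, "E": 0.05, "L": 0.05},  # clear helix call
    {"H": 0.40, "E": 0.35, "L": 0.25},  # helix barely beats strand
    {"H": 0.05, "E": 0.15, "L": 0.80},  # clear loop call
]
print(predict_with_reliability(network_outputs))  # ('HHL', [8, 0, 6])
```

The second residue is still called a helix, but its index of 0 flags that assignment as little better than a guess.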