
Randal C. Cevallos

Bioc 218

Final Project

A Poor Man's EVA: Examining the Accuracy of Secondary Structure Prediction

Introduction

One of the best, but perhaps most elusive, things to know about a protein is its structure. Knowing the structure of a protein allows us to infer, if not directly see, a number of things. For instance, it can tell us how a transcription factor interacts with DNA or how two proteins combine to form a particular heterodimer. A hallmark of protein activity is an event that causes a conformational change, which ultimately alters the protein's ability to bind, release, or otherwise interact with another protein, ion, or macromolecule. How a protein alters its structure would most likely not be known without solved crystal structures of that protein. However, obtaining the crystal structure of a protein is a labor-intensive process that can take a number of years to complete. Many investigators with novel proteins could direct their research more effectively if they had a solved structure quickly, but this is usually not possible, and some proteins are nearly impossible to crystallize. In lieu of an actual structure, the next best thing would be an accurate way to predict protein structure from primary sequence. These predictions could lead an investigator down avenues of research that might otherwise have been ignored, possibly leading to rapid discoveries and a coveted publication in Nature or Science.

Although a protein's tertiary structure is usually of the most interest, there can be no prediction of tertiary structure without the prediction of secondary structure. For this reason, an investigation into the accuracy of secondary structure predictors was launched. This is not a novel idea in the world of protein structure prediction: there is a web-based server named EVA (http://cubic.bioc.columbia.edu/eva) that continuously evaluates automatic structure prediction servers and has collected more than 20,000 secondary structure predictions since June 2000 (1). As EVA analyzes the accuracy of a given structure predictor, it assigns it a Q3 score; the current top Q3 score of 74.4 belongs to PROFsec. Furthermore, there are biennial meetings known as CASP (Critical Assessment of Structure Prediction) in which various protein structure prediction programs are put to the test against recently solved structures. However, a number of structure prediction services are overlooked by EVA and CASP, and it would be beneficial to examine the accuracy of these systems. There are also different approaches to predicting protein structure, and a comparison among these approaches would be beneficial as well. Therefore, I designed a rather rudimentary but effective method for looking at three different protein secondary structure prediction services: PSIPRED (http://www.psipred.net), PROF king (http://www.aber.ac.uk/~phiwww/prof/), and the SeqWeb contribution, PeptideStructure (http://pmgm2.stanford.edu/gcg-bin/seqweb.cgi).
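
For readers unfamiliar with the Q3 score mentioned above: it is conventionally defined as the percentage of residues whose three-state class (helix, strand, or other) is predicted correctly. The sketch below only illustrates that definition. The function name `q3_score` and the toy strings are assumptions made for illustration; this is not EVA's code or data.

```python
def q3_score(observed: str, predicted: str) -> float:
    """Percentage of residues whose three-state class (H, E, or C) matches."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must be the same length")
    if not observed:
        raise ValueError("empty structure string")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Toy example with made-up strings (H = helix, E = strand, C = other).
observed = "CHHHHHCCEEEC"
predicted = "CHHHHCCCEEEE"
print(f"Q3 = {q3_score(observed, predicted):.1f}")
```

Unlike the simplified scoring used in this paper, Q3 counts every residue, including those outside helices and strands.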

As mentioned above, there are different approaches to designing algorithms for predicting protein structure. The first method is homology modeling, which essentially builds a model of an unknown protein based on sequence similarity to proteins that already have a known or solved structure (2). A second method is ab initio prediction, in which the prediction is based purely on first principles without the use of any templates. The third method is sequence-structure threading, or fold recognition, which is similar to homology modeling in that a library of known folds is compared to the unknown protein to derive a structure prediction. A fourth method is docking, but it is irrelevant to this discussion because it is used for predicting the quaternary structure of proteins (protein-protein interactions). The methods of choice for structure prediction appear to be the “comparative modeling” methods, namely fold recognition and homology modeling. Many feel that within 10 years there will be at least one example of most structural folds, which would make comparative modeling very useful (3). Two of the three services analyzed in this paper, PSIPRED and PROF king, use comparative modeling. Specifically, they both use homology modeling: they search a database for homologous sequences (PSI-BLAST) and then use the known structures of the sequences found to predict secondary structure. PROF king also uses ClustalW to align the sequences obtained from PSI-BLAST, whereas PSIPRED uses neural networks to analyze the PSI-BLAST output directly. The third service, PeptideStructure, I consider to be an ab initio method of structure prediction. Specifically, PeptideStructure uses the Chou-Fasman parameters, which assign a set of values (called prediction values) to each residue; an algorithm is then applied to those numbers to determine which regions are alpha-helices and which are beta-strands (or sheets) (4). The details can be cumbersome and won’t be discussed further, but the important point is that PeptideStructure uses no known protein sequences to make its predictions.

Materials and Methods

Using the NCBI website (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure), 10 different protein sequences with known structures were obtained (see Appendix A for accession numbers and sequences). The proteins picked were of different lengths, from different species, and varied in their secondary structure. The proteins chosen were: RNase Hi, RNase Sa, Myoglobin, Polio 3C proteinase, Green Fluorescent Protein (GFP), eIF6, Procaspase-7, Procathepsin L, RNA Polymerase, and Trypsin IV. The known secondary structures of each of these proteins were then retrieved from the Protein Data Bank (PDB, http://www.rcsb.org/pdb/). Following this, each sequence was run through the three structure prediction services: PSIPRED (http://www.psipred.net), PROF king (http://www.aber.ac.uk/~phiwww/prof/), and PeptideStructure (http://pmgm2.stanford.edu/gcg-bin/seqweb.cgi).

After all predictions were retrieved, all sequences were examined by hand. Analysis was limited to the prediction of helices and sheets. For each protein, the total number of residues that form alpha-helices or beta-strands in the known structure was counted and recorded as the number of scorable residues. For each predicted structure, the number of these residues that were predicted correctly was then counted, and the success of the prediction was expressed as a percentage of the total alpha-helix and beta-strand residues.
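
The hand count described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used (which was manual): it assumes the known and predicted structures are available as equal-length strings, that `H` marks helix and `E` marks strand, and that every other character is ignored. The function name `score_prediction` and the example strings are hypothetical.

```python
def score_prediction(known: str, predicted: str, scorable: str = "HE"):
    """Return (scorable residues, correct residues, percent correct).

    Only positions where the known structure is helix (H) or strand (E)
    are scored; a position is correct when the prediction has the same letter.
    """
    if len(known) != len(predicted):
        raise ValueError("structure strings must be the same length")
    total = 0
    correct = 0
    for k, p in zip(known, predicted):
        if k in scorable:
            total += 1
            if k == p:
                correct += 1
    percent = 100.0 * correct / total if total else 0.0
    return total, correct, percent


# Made-up example: '-' stands for any residue outside a helix or strand.
known = "--HHHHHH--EEEE--"
predicted = "-HHHHHH---EE-E--"
total, correct, percent = score_prediction(known, predicted)
print(total, correct, f"{percent:.1f}%")
```

Note that a scheme like this never penalizes a helix or strand predicted where the known structure has neither, which is one reason it is less stringent than a full three-state comparison.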

Results

Of the three structure prediction services examined, there appeared to be a clear hierarchy of accuracy according to my method of analysis. The first structure prediction service examined was PROF king. The results are summarized in Table 1.

Sequence                  Total Scorable Residues   Total Correct   Percent Correct
Ribonuclease Sa           26                        12              46.1%
Ribonuclease Hi           93                        71              76.3%
Procathepsin L            139                       107             76.0%
Hep C RNA Polymerase      303                       241             79.5%
Trypsin IV                83                        73              87.9%
Procaspase 7              110                       85              77.3%
Poliovirus 3C Protease    88                        63              71.6%
EIF6                      129                       100             77.5%
Whale Myoglobin           112                       103             92.0%
GFP                       124                       93              75.0%

Table 1: Analysis of PROF king protein structure prediction program.

The scores range from a low of 46.1% (for a very small protein) to a high of 92.0% (for a larger but nearly exclusively helical protein). For the most part, the scores fall in the mid to upper 70% range, which is fairly accurate.

After the analysis of PROF king, I went on to analyze the PeptideStructure service. The results are summarized in Table 2.

Sequence                  Total Scorable Residues   Total Correct   Percent Correct
Ribonuclease Sa           26                        5               19.2%
Ribonuclease Hi           93                        47              50.5%
Procathepsin L            139                       ND              ND
Hep C RNA Polymerase      303                       ND              ND
Trypsin IV                83                        58              69.9%
Procaspase 7              110                       65              59.1%
Poliovirus 3C Protease    88                        37              42.0%
EIF6                      129                       76              58.9%
Whale Myoglobin           112                       71              63.4%
GFP                       124                       ND              ND

Table 2: Analysis of PeptideStructure protein structure prediction program. ND = No Data

For this prediction program, the scores were substantially lower than those obtained for PROF king. A further problem with this program was that it was unable to handle three of the proteins because they contain unresolved residues, denoted in the protein sequence as X. Therefore, predictions for these three sequences could not be obtained. With a low of 19.2% and a high of only 69.9%, this program did not fare well in this rather simple method of analysis.

Lastly, the PSIPRED homology modeling program was examined to assess its accuracy. The results of this analysis are summarized in Table 3.

Sequence                  Total Scorable Residues   Total Correct   Percent Correct
Ribonuclease Sa           26                        15              57.7%
Ribonuclease Hi           93                        79              84.9%
Procathepsin L            139                       119             85.6%
Hep C RNA Polymerase      303                       223             73.6%
Trypsin IV                83                        78              94.0%
Procaspase 7              110                       92              83.6%
Poliovirus 3C Protease    88                        63              71.6%
EIF6                      129                       109             84.5%
Whale Myoglobin           112                       103             92.0%
GFP                       124                       99              79.8%

Table 3: Analysis of PSIPRED protein structure prediction program.

This prediction program yielded the best results of the three programs tested under this method of analysis. The lowest PSIPRED score was a comparatively high 57.7%, and the highest was an impressive 94.0% on a rather large 224 amino acid sequence.

Comparing the results of all three analyses, it appears that, under this form of analysis, PSIPRED surpasses both PROF king and PeptideStructure based on the average of the percent residues correct (Table 4).

Method Analyzed    Total Scorable Residues   Total Correct Residues   Total Percent   Average Percent
PROF king          1207                      948                      78.5%           75.9%
PeptideStructure   641                       359                      56.0%           51.9%
PSIPRED            1207                      980                      81.2%           80.7%

Table 4: Data summary of the three protein structure programs analyzed. Average percent taken as the average of the percent correct field in Tables 1-3.
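
The two summary columns in Table 4 answer slightly different questions: the total percent pools all scorable residues, so long proteins weigh more, while the average percent weights each protein equally. A minimal sketch of both calculations follows, using the PROF king counts from Table 1; the variable and function names are illustrative. Because it recomputes each percentage from the raw counts rather than from the rounded table entries, the average may differ from the tabulated value in the last digit.

```python
# (scorable residues, correct residues) per protein, copied from Table 1.
prof_king = [
    (26, 12), (93, 71), (139, 107), (303, 241), (83, 73),
    (110, 85), (88, 63), (129, 100), (112, 103), (124, 93),
]


def pooled_percent(counts):
    """Percent correct over all residues pooled together."""
    total = sum(t for t, _ in counts)
    correct = sum(c for _, c in counts)
    return 100.0 * correct / total


def mean_percent(counts):
    """Unweighted mean of the per-protein percent correct values."""
    return sum(100.0 * c / t for t, c in counts) / len(counts)


print(f"pooled:  {pooled_percent(prof_king):.1f}%")
print(f"average: {mean_percent(prof_king):.1f}%")
```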

Discussion

From this analysis, there appears to be a difference not only among the three individual programs tested, but also between the two types of prediction presented here (ab initio vs. homology modeling). The analysis brings forward the current strength of homology modeling, while ab initio modeling still seems to have quite a way to go. One obvious shortcoming of PeptideStructure lies in the method itself. Because it uses the Chou-Fasman method of prediction (which is a number of years old), PeptideStructure will only recognize alpha-helices that are at least 6 residues long and beta-strands that are at least 5 residues long. A quick look at the actual secondary structures shows a number of residues in helices or strands shorter than these minimum lengths; these are automatically missed in the PeptideStructure analysis, but are often picked up by the homology modeling programs.
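
The effect of a minimum segment length can be illustrated with a small filter. This is not the Chou-Fasman algorithm itself, only a sketch of the length rule described above: helix runs shorter than 6 residues and strand runs shorter than 5 are discarded. The function name, the `-` symbol for unassigned residues, and the example string are assumptions made for illustration.

```python
from itertools import groupby

# Minimum run lengths quoted in the text: 6 for helix (H), 5 for strand (E).
MIN_RUN = {"H": 6, "E": 5}


def drop_short_segments(structure: str, blank: str = "-") -> str:
    """Replace helix/strand runs shorter than the minimum length with `blank`."""
    out = []
    for state, run in groupby(structure):
        length = len(list(run))
        if length < MIN_RUN.get(state, 0):
            out.append(blank * length)  # run too short to be reported
        else:
            out.append(state * length)
    return "".join(out)


# Made-up structure string: the 4-residue helix and 3-residue strand vanish.
print(drop_short_segments("HHHH--EEEEE--HHHHHHH-EEE"))
```

Any residue in a short helix or strand of the known structure is therefore unscorable for a method with such a rule, whatever its propensity values say.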

Another obvious fault with the PeptideStructure program is that it is unable to deal with unknown residues within a given polypeptide sequence. In many solved structures, there are a couple of unknown residues, usually identified by the letter X. Three of the proteins used for analysis were chosen because they contain unknown residues, to see how the analysis would be affected. The two homology modeling programs were able to complete analysis of all 10 proteins, whereas PeptideStructure was unable to deal with these sequences, yielding no data. This is a very obvious problem with this particular structure prediction software. This inability to deal with unknown amino acids would cut out a large portion of the sequences that could be analyzed. When novel proteins are sequenced, often not all amino acids can be adequately resolved. Therefore, this program would appear to be nearly useless for predicting the structure of proteins whose sequence cannot be completely determined. However, a surrogate analysis could be performed by replacing the unknown amino acids with another, rather innocuous amino acid, like alanine. This may affect the structure predicted in the immediate area of the change, but at least the analysis could be performed and the rest of the protein could be analyzed.
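
The surrogate analysis suggested above is simple to sketch. The function below replaces each unknown residue (X) with alanine (A) and reports which positions were changed, so that predictions near those positions can be treated with caution. The function name is hypothetical; only the X-to-alanine idea comes from the text, and this substitution was not actually run for this paper.

```python
def substitute_unknowns(sequence: str, unknown: str = "X", replacement: str = "A"):
    """Return the sequence with unknown residues replaced, plus the
    1-based positions that were changed."""
    positions = [i + 1 for i, aa in enumerate(sequence) if aa == unknown]
    return sequence.replace(unknown, replacement), positions


# First 20 residues of the hepatitis C RNA polymerase entry in Appendix A.
seq = "SXSYTWTGALITPCAAEESK"
fixed, changed = substitute_unknowns(seq)
print(fixed)
print(changed)
```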

From the data presented, it seems that PSIPRED is the most accurate program when subjected to this kind of analysis. This may not be the most stringent form of analysis, but it is useful to know that when you submit one of your protein sequences to one of these programs, at a basal level, a little better than three out of every four amino acids are predicted correctly by PSIPRED or PROF king, and barely better than half are predicted correctly by PeptideStructure. As many groups feel, and as these data show, ab initio procedures for protein secondary structure prediction still have a long way to go before they are accurate.

However, the PSIPRED and PROF king programs are not without their shortfalls. Obviously, neither one was 100% accurate as would be hoped, but that is a somewhat unrealistic expectation for theoretical computational molecular biology. Even though proteins with known structures were used for this analysis, and both of these programs use PSI-BLAST searches to build their databases, a 100% accurate prediction was never obtained, even though the exact protein should have been found in the PSI-BLAST searches performed by these programs. One would almost think that after finding an exact 100% match between the query sequence and a sequence that exists in a database, one could get a perfect prediction. This is most likely an artifact of the algorithms used for predicting structure. This gave me more insight into how these programs work, which is beneficial because most explanations of the intricacies of the algorithms and the programs that implement them go over my head. Basically, it appears as if these two programs merely take an average over all the sequences they find, and this is the output that you end up seeing. There is obviously no perfect prediction, and a researcher should use caution when protein prediction programs are used.

As one moves into the analysis of tertiary structure, things become more and more complicated. I would predict that accuracy decreases as more complexity is introduced into what is being asked for. However, as more and more protein structures are solved, the number of unknown folds decreases dramatically, increasing the power of homology-based modeling not only for the prediction of secondary structure, but also for the prediction of tertiary structure. One has to wonder: as our knowledge of existing proteins grows, will ab initio structure prediction ever catch up to the homology modeling systems? Also, what accounts for the differences between the homology modeling programs? It seems that the biggest difference between PSIPRED and PROF king is that PSIPRED uses a neural network in its analysis of sequences. Is this the main reason that PSIPRED has the highest accuracy, over 80% in both categories (total percent and average percent in Table 4)? PSIPRED has also been shown to receive some of, if not the, highest scores in the CASP meetings, with its highest honors garnered last year (information from the PSIPRED website). Perhaps further development of these neural networks will lead to a continued increase in the accuracy of protein structure prediction, like what we have observed since the inception of theoretical computational molecular biology.

Though the analysis was simple, it was effective in showing a large difference between homology modeling and ab initio modeling. The current accuracy of homology modeling was reinforced, as was the idea that much more development of ab initio prediction is needed. A beneficial avenue may be to extend (and also complicate) this analysis not only to fold recognition programs, but also to the analysis of tertiary structure. At some point in the future, it may be possible to extend this analysis into three dimensions and actually look at how accurate these programs are at placing an individual amino acid in the correct position in 3D space.

References

1. Rost Burkhard and Eyrich Volker A. EVA: Large-Scale Analysis of Secondary Structure Prediction. Proteins: Structure, Function, and Genetics Suppl 2001; 5: 192-199.

2. Marti-Renom Marc A., Stuart Ashley C., Fiser Andras, Sanchez Roberto, Melo Francisco, Sali Andrej. Comparative Protein Structure Modeling of Genes and Genomes. Annu. Rev. Biophys. Biomol. Struct. 2000; 29: 291-325.

3. Sanchez Roberto, Sali A. Advances in Comparative Protein-Structure Modeling. Curr. Opin. Struct. Biol. 7: 206-214.

4. Prevelige P., Jr, Fasman G.D. Chou-Fasman Prediction of Secondary Structure, in Prediction of Protein Structure and the Principles of Protein Conformation, ed. G.D. Fasman (ISBN 0-306-43131-9, Plenum, New York) (1989) 1-91.

Appendix A

Procaspase-7 Chain A
>gi|18158705
SIKTTRDRVPTYQYNMNFEKLGKCIIINNKNFDKVTGMGVRNGTDKDAEALFKCFRSLGFDVIVYNDCSCAKMQDLLKKASEEDHTNAACFACILLSHGEENVIYGKDGVTPIKDLTAHFRGDRCKTLLEKPKLFFIQAARGTELDDGIQADSGPINDTDANPRYKIPVEADFLFAYSTVPGYYSWRSPGRGSWFVQALCSILEEHGKDLEIMQILTRVNDRVARHFESQSDDPHFHEKKQIPCVVSMLTKELYFSQLEHHHHHH

Sperm Whale Myoglobin, Chain A
>gi|18655862
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

S. cerevisiae eIF6
>gi|11513605
MATRTQFENSNEIGVFSKLTNTYCLVAVGGSENFYSAFEAELGDAIPIVHTTIAGTRIIGRMTAGNRRGLLVPTQTTDQELQHLRNSLPDSVKIQRVEERLSALGNVICCNDYVALVHPDIDRETEELISDVLGVEVFRQTISGNILVGSYCSLSNQGGLVHPQTSVQDQEELSSLLQVPLVAGTVNRGSSVVGAGMVVNDYLAVTGLDTTAPELSVIESIFRL

Procathepsin L (human)
>gi|5822035
SLTFDHSLEAQWTKWKAMHNRLYGMNEEGWRRAVWEKNMKMIELHNQEYREGKHSFTMAMNAFGDMTSEEFRQVMNGFQNRKPRKGKVFQEPLFYEAPRSVDWREKGYVTPVKNQGQCGSXWAFSATGALEGQMFRKTGRLISLSEQNLVDCSGPQGNEGCNGGLMDYAFQYVQDNGGLDSEESYPYEATEESCKYNPKYSVANDAGFVDIPKQEKALMKAVATVGPISVAIDAGHESFLFYKEGIYFEPDCSSEDMDHGVLVVGYGFESTESDNNKYWLVKNSWGEEWGMGGYVKMAKDRRNHCGIASAASYPTV

Human Trypsin IV
>gi|20149993
IVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAHCYKTRIQVRLGEHNIKVLEGNEQFINAVKIIRHPKYNRDTLDNDIMLIKLSSPAVINARVSTISLPTAPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSMFCVGFLEGGKDSCQRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTIAANS

Hepatitis C RNA pol w/UTP and Mn
>gi|20663766
SXSYTWTGALITPCAAEESKLPINALSNSLLRHHNXVYATTSRSAGLRQKKVTFDRLQVLDDHYRDVLKEXKAKASTVKAKLLSVEEACKLTPPHSAKSKFGYGAKDVRNLSSKAVNHIHSVWKDLLEDTVTPIDTTIXAKNEVFCVQPEKGGRKPARLIVFPDLGVRVCEKXALYDVVSTLPQVVXGSSYGFQYSPGQRVEFLVNTWKSKKNPXGFSYDTRCFDSTVTENDIRVEESIYQCCDLAPEARQAIKSLTERLYIGGPLTNSKGQNCGYRRCRASGVLTTSCGNTLTCYLKASAACRAAKLQDCTXLVNGDDLVVICESAGTQEDAASLRVFTEAXTRYSAPPGDPPQPEYDLELITSCSSNVSVAHDASGKRVYYLTRDPTTPLARAAWETARHTPVNSWLGNIIXYAPTLWARXILXTHFFSILLAQEQLEKALDCQIYGACYSIEPLDLPQIIERLHGLSAFSLHSYSPGEINRVASCLRKLGVPPLRVWRHRARSVRARLLSQGGRAATCGKYLFNWAVK

Poliovirus 3C proteinase
>gi|20664254
GPGFDYAVAMAKRNIVTATTSKGEFTMLGVHDNVAILPTHASPGESIVIDGKEVEILDAKALEDQAGTNLEITIITLKRNEKFRDIRPHIPTQITETNDGVLIVNTSKYPNMYVPVGAVTEQGYLNLGGRQTARTLMYNFPTRAGQCGGVITCTGKVIGMHVGGNGSHGFAAALKRSYFTQSQ

Streptomyces aureofaciens Ribonuclease Sa Chain A
>gi|17942897
DVSGTVCLSALPPEATDTLNLIASDGPFPYSQDGVVFQNRESVLPTQSYGYYHEYTVITPGARTRGTRRIITGEATQEDYYTGDHYATFSLIDQTC

E. coli D10a RNase Hi
>gi|20150465
MLKQVEIFTAGSALGNPGPGGYGAILRYRGREKTFSAGYTRTTNNRMELMAAIVALEALKEHAEVILSTDSQYVRQGITQWIHNWKKRGWKTADKKPVKNVDLWQRLDAALGQHQIKWEWVKGHAGHPENERADELARAAAMNPTLEDTGYQVEV

Aequorea victoria GFP
>gi|17943298
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFXVQCFSRYPDHMKRHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK

Appendix B: Solved Structures

Poliovirus 3C Proteinase Secondary Structure

1 GPGFDYAVAM AKRNIVTATT SKGEFTMLGV HDNVAILPTH ASPGESIVID
SSHHHHH HHHHEEEEEE TTEEEEEEE EETEEEEEGG G SEEEET

51 GKEVEILDAK ALEDQAGTNL EITIITLKRN EKFRDIRPHI PTQITETNDG
TEEEEESEEE E B TTS B EEEEEE S B GGGB SS BSSE

101 VLIVNTSKYP NMYVPVGAVT EQGYLNLGGR QTARTLMYNF PTRAGQCGGV
EEEE SSSS SEEEE B EE E B EEETTE EE SEEEEE TTTTT E

151 ITCTGKVIGM HVGGNGSHGF AAALKRSYFT QSQ
EEETTEE EE E EEETTEEE EEE GGG

S. cerevisiae eIF6 Secondary Structure

1 MATRTQFENS NEIGVFSKLT NTYCLVAVGG SENFYSAFEA ELGDAIPIVH
EEEE BTTB HHHHEEE SSEEEEETT HHHHHHHHH HHTTTSEEEE

51 TTIAGTRIIG RMTAGNRRGL LVPTQTTDQE LQHLRNSLPD SVKIQRVEER
E BTTBS HH HHEEE SSEE EEETT HHH HHHHHHHS T TSEEEEE

101 LSALGNVICC NDYVALVHPD IDRETEELIS DVLGVEVFRQ TISGNILVGS
SS HHHHEEE SSEEEEETT HHHHHHHH HHHTSEEEEE BTTBS GGG

151 YCSLSNQGGL VHPQTSVQDQ EELSSLLQVP LVAGTVNRGS SVVGAGMVVN
SEEE SSEEE E TT HHHH HHHHHHHTSE EEE BTTTB S HHHHEEE

201 DYLAVTGLDT TAPELSVIES IFRL
SS EEEETT HHHHHHHHH HTTT

Green Fluorescent Protein Solved Structure

1 MSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT
GGGGGSS EEEEEEEEEE EETTEEEEEE EEEEEEGGGT EEEEEEEETT

51 GKLPVPWPTL VTTFXVQCFS RYPDHMKRHD FFKSAMPEGY VQERTIFFKD
SS SS HHHH TTTS GGGS B GGGGGG HHHHTTTT E EEEEEEEETT

101 DGNYKTRAEV KFEGDTLVNR IELKGIDFKE DGNILGHKLE YNYNSHNVYI
S EEEEEEEE EEETTEEEEE EEEEEES T TSTTTTT B S EEEEE

151 MADKQKNGIK VNFKIRHNIE DGSVQLADHY QQNTPIGDGP VLLPDNHYLS
EEEGGGTEEE EEEEEEEEET TS EEEEEEE EEEEESS S SEEEE

201 TQSALSKDPN EKRDHMVLLE FVTAAGITHG MDELYK
EEEEEE TT SSEEEEEE EEEEES

Hepatitis C RNA Polymerase Secondary Structure

1 SXSYTWTGAL ITPCAAEESK LPINALSNSL LRHHNXVYAT TSRSAGLRQK
B EEE S SS B HHHHHH S GGGEEE GGGHHHHHH

51 KVTFDRLQVL DDHYRDVLKE XKAKASTVKA KLLSVEEACK LTPPHSAKSK
HT B HHHHHHHHH HHHHHTT B HHHHHH TTTTTS T

101 FGYGAKDVRN LSSKAVNHIH SVWKDLLEDT VTPIDTTIXA KNEVFCVQPE
TS HHHHHT T HHHHHHHH HHHHHHHH S SS EEEEE EEE BTT

151 KGGRKPARLI VFPDLGVRVC EKXALYDVVS TLPQVVXGSS YGFQYSPGQR
TTBS EE EEE HHHHHH HHHHHTTTTT THHHHHHGGG BGGG HHHH

201 VEFLVNTWKS KKNPXGFSYD TRCFDSTVTE NDIRVEESIY QCCDLAPEAR
HHHHHHHHHH TTTEEEEEEE ETTHHHHTTH HHHHHHHHHH TTSB HHHH

251 QAIKSLTERL YIGGPLTNSK GQNCGYRRCR ASGVLTTSCG NTLTCYLKAS
HHHHHHHHHT TS EEEE TT S B EEE S TTTTTHHHH HHHHHHHHHH

301 AACRAAKLQD CTXLVNGDDL VVICESAGTQ EDAASLRVFT EAXTRYSAPP
HHHHHTT SS EEEEETTEE EEEEE HH HHHHHHHHHH HHHHHTT B

351 GDPPQPEYDL ELITSCSSNV SVAHDASGKR VYYLTRDPTT PLARAAWETA
SS BS G GG BTTEEE EEEE TT E EEEEE HHH HHHHHHHHHH

401 RHTPVNSWLG NIIXYAPTLW ARXILXTHFF SILLAQEQLE KALDCQIYGA
S S HHHH HHHHTTTSHH HHHTHHHHHH HHHHHTTTTT EEEEETTE

451 CYSIEPLDLP QIIERLHGLS AFSLHSYSPG EINRVASCLR KLGVPPLRVW
EEEE GGGHH HHHHHHH GG GGT B HH HHHHHHHHHH HHT HHHH

501 RHRARSVRAR LLSQGGRAAT CGKYLFNWAV K
HHHHHHHHHH HHTTTHHHHH HHHHHTGGG

Sperm Whale Met-Myoglobin Secondary Structure

1 VLSEGEWQLV LHVWAKVEAD VAGHGQDILI RLFKSHPETL EKFDRFKHLK
HHHHHHH HHHHHHHTTS HHHHHHHHHH HHHHHTHHHH TTTTTTTT

51 TEAEMKASED LKKHGVTVLT ALGAILKKKG HHEAELKPLA QSHATKHKIP
SHHHHHHTHH HHHHHHHHHH HHHHHHTTTT TTHHHHTHHH HHHHHTS

101 IKYLEFISEA IIHVLHSRHP GDFGADAQGA MNKALELFRK DIAAKYKELG
HHHHHHHHHH HHHHHHHHTT TTTTHHHHHH HHHHHHHHHH HHHHHHHHHT

151 YQG

Human Procaspase-7 Secondary Structure

1 SIKTTRDRVP TYQYNMNFEK LGKCIIINNK NFDKVTGMGV RNGTDKDAEA
S SSB SSB EEEEEEE GGGT TTHHHHHHH

51 LFKCFRSLGF DVIVYNDCSC AKMQDLLKKA SEEDHTNAAC FACILLSHGE
HHHHHHHHTE EEEEEES H HHHHHHHHHH HHS TTBSE EEEEEEEEE

101 ENVIYGKDGV TPIKDLTAHF RGDRCKTLLE KPKLFFIQAA RGTELDDGIQ
SSEEE SSSE EEHHHHHHTT TTTTTGGGTT SEEEEEEEEE S S

151 ADSGPINDTD ANPRYKIPVE ADFLFAYSTV PGYYSWRSPG RGSWFVQALC
S TTEEEEEE SS HHHHHHH

201 SILEEHGKDL EIMQILTRVN DRVARHFESQ SDDPHFHEKK QIPCVVSMLT
HHHHHHTTTT THHHHHHHHH HHHHHTTTS SSTT EEEE S

251 KELYFSQLEH HHHHH
SB S SS

Procathepsin L Secondary Structure

1 SLTFDHSLEA QWTKWKAMHN RLYGMNEEGW RRAVWEKNMK MIELHNQEYR
GGGHH HHHHHHHHTT TTHHHH HHHHHHHHHH HHHHHHHHHH

51 EGKHSFTMAM NAFGDMTSEE FRQVMNGFQN RKPRKGKVFQ EPLFYEAPRS
TT SEEE TTTT HHH HHHHT B S EE TT S

101 VDWREKGYVT PVKNQGQCGS XWAFSATGAL EGQMFRKTGR LISLSEQNLV
EEGGGGT B SSS HHHHHHHHHH HHHHHHHHS B HHHHH

151 DCSGPQGNEG CNGGLMDYAF QYVQDNGGLD SEESYPYEAT EESCKYNPKY
HHGGGGT G GG B HHHHH HHHHHHT EE ETTTS SS GGG

201 SVANDAGFVD IPKQEKALMK AVATVGPISV AIDAGHESFL FYKEGIYFEP
B B EEE S HHHHHH HHHHH EEE EE SHHHH TB EEEE T

251 DCSSEDMDHG VLVVGYGFES TESDNNKYWL VKNSWGEEWG MGGYVKMAKD
T SS EE EEE EEEEE SS EEEEE EE SB TTST BTTEEEEE S

301 RRNHCGIASA ASYPTV
SSSGGGTTTS EEE

Streptomyces aureofaciens Ribonuclease Sa Secondary Structure

1 DVSGTVCLSA LPPEATDTLN LIASDGPFPY SQDGVVFQNR ESVLPTQSYG
EEETTT TTTTHHHHHH HHHTT SS S SEE T TS SS SSS

51 YYHEYTVITP GARTRGTRRI ITGEATQEDY YTGDHYATFS LIDQTC
EEE T T SSS S E EEESSSS E E SSSSS EEES

E. coli D10a RNase Hi Secondary Structure

1 MLKQVEIFTA GSALGNPGPG GYGAILRYRG REKTFSAGYT RTTNNRMELM
EEEEEE EEESSTTB E EEEEEEEETT EEEEEEEEES SB HHHHHHH

51 AAIVALEALK EHAEVILSTD SQYVRQGITQ WIHNWKKRGW KTADKKPVKN
HHHHHHHT S EEEEEE HHHHHHHHT THHHHHHTTS B TTS B TT

101 VDLWQRLDAA LGQHQIKWEW VKGHAGHPEN ERADELARAA AMNPTLEDTG
HHHHHHHHHH HTT EEEEEE SSTT HHH HHHHHHHHHH HHS B TT

151 YQVEV

Human Trypsin IV Secondary Structure

1 IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH CYKTRIQVRL
B EE TT TTTTEEEEES SSB EEEEE BTTEEEE GG G SS EEEE

51 GEHNIKVLEG NEQFINAVKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
S SBTTS S EEEEESEE EE TT BTTT TBT EEEEE SS SSSS

101 STISLPTAPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA
EE SS TT EEEEEE S B SSS B SB EEEEE EE HHHHHH

151 SYPGKITNSM FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK
HTTTT TTE EEES TT S B TTTTT E EEETTEE EE E B SSSS T

201 NRPGVYTKVY NYVDWIKDTI AANS
T EEEEEGG GGHHHHHHHH HTTT

Appendix C: SeqWeb Results

S. cerevisiae eIF6 SeqWeb Prediction

Pos AA CF-Pred GORPred AI-Ind
1 M . B 0.600
2 A . B 0.600
3 T . B 0.900
4 R . B 0.600
5 T . B 0.900
6 Q . . 0.900
7 F . . 0.900
8 E . . 0.900
9 N . . 0.900
10 S T . 1.300
11 N T . 1.150
12 E . . 0.450
13 I . T -0.050
14 G . T -0.200
15 V B . -0.600
16 F B . -0.600
17 S B . -0.600
18 K B T 0.400
19 L B T 1.000
20 T B T 1.300
21 N B T 1.150
22 T B T -0.050
23 Y B B -0.600
24 C B B -0.600
25 L B B -0.600
26 V B B -0.600
27 A B B -0.600
28 V B B -0.600
29 G t . -0.250
30 G t . 0.650
31 S t . 0.650
32 E t . 1.100
33 N t . 1.100
34 F t . 0.950
35 Y t . 0.500
36 S H . 0.300
37 A H H -0.600
38 F H H 0.300
39 E H H -0.300
40 A H H -0.300
41 E H H 0.300
42 L H H 0.300
43 G H H -0.450
44 D H H -0.300
45 A H H -0.600
46 I H H -0.600
47 P . B -0.600
48 I B B -0.600
49 V B B -0.600
50 H B B -0.600
51 T B B -0.600
52 T B B -0.600
53 I B . -0.300
54 A B . 0.300
55 G B . -0.450
56 T B B -0.600
57 R B B -0.600
58 I B B 0.300
59 I B B -0.300
60 G . B -0.300
61 R . B -0.600
62 M . B -0.300
63 T . . 0.750
64 A t . 1.100
65 G t . 1.100
66 N t T 1.500
67 R . T 1.300
68 R B T 1.150
69 G B T 0.850
70 L B B -0.300
71 L B B -0.600
72 V B B -0.450
73 P . B -0.450
74 T . B 0.000
75 Q . . 0.900
76 T . . 0.900
77 T T . 1.300
78 D T . 1.300
79 Q H B 0.900
80 E H B 0.900
81 L H B 0.750
82 Q H B 0.750
83 H H . 0.750
84 L H . 0.750
85 R t T 1.350
86 N t T 1.500
87 S . . 0.900
88 L . . 0.900
89 P T . 0.850
90 D T . 0.850
91 S h . -0.150
92 V h B 0.750
93 K h B 0.900
94 I h B -0.150
95 Q h B 0.600
96 R h B 0.900
97 V h B 0.900
98 E h B 0.900
99 E h B 0.900
100 R h B 0.600
101 L h T 0.700

102 S h T 0.100103 A h T 0.100104 L h T -0.200105 G . T -0.200106 N . T -0.200107 V B B -0.600108 I B B -0.600109 C B B -0.600110 C B B -0.600111 N B T -0.200112 D . T 0.100113 Y B B -0.600114 V B B -0.600115 A B B -0.600116 L B B -0.600117 V B B -0.600118 H B B -0.600119 P T . 0.250120 D T . 1.300121 I h . 0.900122 D h H 0.900123 R h H 0.900124 E h H 0.900125 T h H 0.900126 E h H 0.750127 E h H 0.750128 L h H 0.750129 I h H -0.300130 S h H -0.600131 D h H -0.600132 V h H -0.600133 L h H -0.600134 G . H -0.600135 V h H -0.600136 E h B -0.600137 V h B 0.300138 F h B 0.450139 R h B 0.450140 Q h B -0.150141 T h . 0.600142 I h . 0.750143 S t . 0.050144 G t T 0.150145 N t T 0.000146 I . B -0.600147 L . B -0.600148 V . B -0.600149 G . B -0.600150 S T B -0.200151 Y T B -0.200152 C t T 0.300153 S t T 0.900154 L t T 1.200155 S T T 1.250156 N T T 1.550157 Q T T 1.250

158 G T T 1.250159 G t T 1.050160 L . . 0.450161 V h . 0.300162 H h . 0.450163 P h . 0.600164 Q h . 0.600165 T h . 0.900166 S h . 0.900167 V h . 0.900168 Q h . 0.900169 D h H 0.900170 Q h H 0.900171 E h H 0.900172 E h H 0.900173 L h H 0.750174 S . H -0.150175 S . H -0.600176 L B H -0.600177 L B H -0.600178 Q B B -0.600179 V B B -0.600180 P B B -0.600181 L B B -0.600182 V B B -0.600183 A B B -0.600184 G B B -0.600185 T B B -0.150186 V B B 0.750187 N t . 1.100188 R T T 1.550189 G T T 1.250190 S t . 0.650191 S t . 0.050192 V . B -0.450193 V . B -0.600194 G . B -0.600195 A t B -0.400196 G t B -0.400197 M B B -0.600198 V B B -0.600199 V B B -0.300200 N B B -0.600201 D B B -0.600202 Y B B -0.600203 L B B -0.300204 A B B -0.600205 V B B -0.600206 T B B -0.600207 G . B -0.450208 L . . -0.150209 D . . 0.450210 T . . 0.600211 T h . 0.900212 A h . 0.900213 P h H 0.450

214 E h H -0.450215 L h H -0.600216 S h H -0.600217 V h H -0.600218 I h H -0.600219 E h H -0.600

220 S h H -0.600221 I h H -0.450222 F h H -0.150223 R h H -0.450224 L h H -0.450
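The SeqWeb listings in this appendix run each record directly into the next, so a string such as 0.6002 is the index 0.600 followed by position 2. A minimal Python sketch for splitting such text back into rows is given here. It assumes every AI-Ind value is written with exactly three decimals, which holds for the listings shown apart from any truncated final entry. The names SeqWebRow and parse_seqweb are hypothetical helpers introduced for illustration and are not part of SeqWeb.

```python
import re
from typing import List, NamedTuple


class SeqWebRow(NamedTuple):
    pos: int    # residue position ("Pos")
    aa: str     # one-letter amino acid code ("AA")
    cf: str     # state code from the "CF-Pred" column
    gor: str    # state code from the "GORPred" column
    ai: float   # value from the "AI-Ind" column


# One record: position, residue letter, two one-character state codes,
# and a number with exactly three decimals. Requiring three decimals is
# what separates "0.600" from the position number fused right after it.
_RECORD = re.compile(r"(\d+) ([A-Z]) (\S) (\S) (-?\d+\.\d{3})")


def parse_seqweb(text: str) -> List[SeqWebRow]:
    """Split run-together SeqWeb records into one row per residue."""
    return [
        SeqWebRow(int(pos), aa, cf, gor, float(ai))
        for pos, aa, cf, gor, ai in _RECORD.findall(text)
    ]
```

Calling parse_seqweb("1 M . B 0.6002 A . B 0.600") yields two rows, for positions 1 and 2. A record whose value has fewer than three decimals is skipped rather than guessed at.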

Whale Myoglobin SeqWeb Prediction

Pos AA CF-Pred GORPred AI-Ind ..

1 V h . -0.4502 L h . -0.4503 S h . 0.4504 E h H 0.4505 G h H 0.9006 E h H 0.9007 W B H 0.6008 Q B H -0.6009 L B H -0.30010 V B H -0.60011 L B H -0.60012 H B H -0.60013 V B H -0.60014 W H H -0.60015 A H H 0.30016 K H H -0.60017 V H H 0.75018 E H H -0.30019 A H H -0.30020 D H H -0.60021 V H H 0.30022 A H H -0.30023 G . H 0.60024 H t H 0.95025 G t . 0.95026 Q t . 0.65027 D t B 0.05028 I B B -0.30029 L B B -0.60030 I B B -0.60031 R B B -0.60032 L B B -0.60033 F B . 0.45034 K . . 0.75035 S . . 0.90036 H . . 0.90037 P T . 1.30038 E T H 1.30039 T H H 0.90040 L H H 0.90041 E H H 0.90042 K H H 0.900

43 F H H 0.90044 D H H 0.90045 R H H 0.90046 F H H 0.75047 K H H 0.90048 H H H 0.75049 L H H 0.90050 K H H 0.90051 T H H 0.90052 E H H 0.90053 A H H 0.90054 E H H 0.75055 M H H 0.90056 K H H 0.90057 A H H 0.90058 S H H 0.90059 E H H 0.90060 D H H 0.90061 L H H 0.90062 K H H 0.90063 K H H 0.75064 H H H 0.75065 G . H 0.60066 V B B -0.60067 T B B -0.60068 V B B -0.60069 L B B -0.60070 T B H -0.60071 A H H -0.60072 L H H -0.60073 G H H -0.60074 A H H -0.60075 I H H -0.60076 L H H 0.45077 K H H 0.60078 K H H 0.90079 K H H 0.90080 G H H 0.90081 H H H 0.75082 H H H 0.75083 E H H 0.75084 A H H 0.75085 E H H 0.75086 L H H 0.45087 K H H -0.150

88 P h H 0.45089 L h H 0.60090 A h H 0.75091 Q h H 0.30092 S h H 0.45093 H h H 0.75094 A h H 0.90095 T h H 0.90096 K h H 0.90097 H h H 0.90098 K h . 0.75099 I h B 0.750100 P . B 0.750101 I . B -0.300102 K . B -0.150103 Y . B -0.300104 L H H -0.600105 E H H -0.300106 F H H -0.300107 I H H -0.600108 S H H -0.600109 E H H -0.600110 A B H -0.600111 I B H -0.600112 I B H -0.600113 H B H -0.600114 V B H -0.600115 L B H -0.300116 H B . 0.600117 S t . 0.950118 R t . 1.100119 H t . 1.100120 P T T 1.700

121 G T T 1.550122 D t T 1.350123 F t . 0.950124 G . . 0.300125 A H H 0.600126 D H H 0.300127 A H H 0.450128 Q H H 0.450129 G H H 0.600130 A H H 0.750131 M H H 0.600132 N H H -0.300133 K H H 0.450134 A H H -0.300135 L H H -0.300136 E H H -0.300137 L H H -0.300138 F H H 0.750139 R H H 0.750140 K H H -0.150141 D H H 0.600142 I H H 0.750143 A H H 0.600144 A H H 0.750145 K H H 0.750146 Y H H 0.900147 K H H 0.900148 E H H 0.900149 L H T 1.150150 G . T 1.150151 Y . T 1.150152 Q . T 0.850153 G . . 0.900

Polio 3C Proteinase SeqWeb Prediction
Pos AA CF-Pred GORPred AI-Ind ..

1 G . . -0.1502 P T . 1.1503 G T . 1.1504 F B . 0.4505 D B H -0.3006 Y B H -0.6007 A B H -0.6008 V B H -0.6009 A . H -0.60010 M . H -0.60011 A . H 0.45012 K . H 0.45013 R . H -0.30014 N . H 0.45015 I . B 0.30016 V . B -0.30017 T . B -0.60018 A . B -0.450

19 T h . 0.60020 T t . 1.10021 S t . 1.10022 K T . 1.30023 G T . 1.30024 E . . 0.75025 F B . -0.30026 T B . -0.60027 M B . -0.60028 L B . -0.60029 G B . -0.30030 V B . 0.30031 H B . -0.30032 D h . 0.30033 N h B -0.60034 V h B -0.60035 A h B -0.60036 I h B -0.600

37 L h B -0.60038 P . B -0.60039 T . . -0.45040 H . . 0.60041 A t . 0.95042 S t . 1.10043 P T . 1.30044 G T T 1.25045 E . T 0.25046 S . H -0.45047 I . H -0.60048 V . H -0.60049 I . H -0.60050 D t H 0.05051 G t H 0.05052 K H H 0.90053 E H H 0.75054 V H H -0.30055 E H H 0.30056 I H H -0.60057 L H H -0.60058 D H H -0.30059 A H H -0.60060 K H H -0.15061 A H H 0.75062 L H H 0.90063 E H H 0.75064 D H H 0.75065 Q H H 0.90066 A H H 0.90067 G . . 0.75068 T t . 0.95069 N t . 0.05070 L . . 0.30071 E . . -0.60072 I B . -0.60073 T B . -0.60074 I B . -0.60075 I B . -0.60076 T B . -0.60077 L B . -0.15078 K h . 0.90079 R h T 1.30080 N h T 1.30081 E h T 1.30082 K h T 1.30083 F h T 1.30084 R h T 1.30085 D h T 1.30086 I h T 1.30087 R h . 0.75088 P . . 0.75089 H . . 0.60090 I . . 0.90091 P . . 0.45092 T . . 0.450

93 Q . . 0.60094 I . . 0.90095 T . . 0.90096 E . . 0.90097 T t T 1.50098 N T T 1.70099 D T T 1.550100 G T T 0.350101 V B B -0.600102 L B B -0.600103 I B B -0.600104 V B B -0.600105 N B B -0.450106 T B T 1.000107 S t T 1.500108 K t T 1.500109 Y t . 1.100110 P T T 1.700111 N T T 1.550112 M B B 0.450113 Y B B -0.300114 V B B -0.600115 P B B -0.600116 V B B -0.600117 G B B -0.600118 A B B -0.600119 V B B -0.300120 T B B 0.450121 E . B 0.600122 Q t B 0.800123 G t B 1.100124 Y . B 0.600125 L . . 0.300126 N . . -0.300127 L t . 0.500128 G t . 1.100129 G t . 0.950130 R t . 1.100131 Q b . 0.900132 T b B 0.900133 A b B 0.900134 R b B 0.450135 T b B -0.300136 L b B 0.450137 M b B 0.300138 Y b B -0.300139 N b . -0.150140 F b . 0.750141 P . T 1.300142 T . T 1.150143 R . T 1.300144 A t T 1.350145 G t T 1.350146 Q T T 1.550147 C T T 0.350148 G t T 0.150

149 G t B -0.400150 V B B -0.600151 I B B -0.600152 T B B -0.600153 C B B -0.600154 T B B -0.450155 G B T -0.050156 K B T -0.050157 V B B -0.600158 I B B -0.300159 G B B -0.600160 M B B -0.600161 H B B -0.600162 V B . 0.300163 G t . 0.650164 G T . 1.150165 N T . 1.150166 G T . 1.150

167 S T . 1.150168 H T . 1.000169 G . . -0.300170 F H H -0.600171 A H H -0.600172 A H H -0.600173 A H H -0.600174 L H H 0.450175 K H H 0.600176 R t T 1.200177 S t T 1.500178 Y B T 1.300179 F B T 1.300180 T B T 1.300181 Q B T 1.300182 S B . 0.900183 Q . . 0.900

Procaspase-7 SeqWeb Prediction
Pos AA CF-Pred GORPred AI-Ind

1 S . . 0.6002 I . T 1.3004 T . T 1.3005 T . T 1.3006 R t T 1.5007 D t T 1.5008 R B T 1.3009 V B . 0.90010 P B . 0.90011 T B . 0.90012 Y B T 1.15013 Q B T 1.15014 Y B T 1.15015 N . . 0.75016 M H . 0.75017 N H H 0.75018 F H H 0.60019 E H H 0.30020 K H H 0.90021 L H H 0.45022 G . H 0.45023 K . H -0.60024 C B B -0.60025 I B B -0.60026 I B B -0.60027 I B B -0.60028 N B B 0.30029 N t T 1.20030 K T T 1.70031 N T T 1.70032 F t T 1.50033 D t T 1.500

34 K . B 0.75035 V . B -0.15036 T . B 0.45037 G . B -0.45038 M . B -0.60039 G . B 0.30040 V . . 0.45041 R T . 0.85042 N T . 1.30043 G t . 1.10044 T H . 0.90045 D H . 0.90046 K H H 0.90047 D H H 0.90048 A H H 0.90049 E H H 0.30050 A b H 0.30051 L b H -0.60052 F b H -0.60053 K b H -0.60054 C b H -0.30055 F b H -0.30056 R b T 0.70057 S B T -0.20058 L B T 0.10059 G B T 0.10060 F B T -0.20061 D B B -0.60062 V B B -0.60063 I B B -0.60064 V B B -0.30065 Y B T -0.200

66 N T T 0.50067 D T T 0.50068 C T T 1.10069 S t T 1.20070 C t H -0.10071 A H H -0.30072 K H H 0.60073 M H H 0.45074 Q H H -0.15075 D H H 0.75076 L H H 0.90077 L H H 0.90078 K H H 0.45079 K H H 0.60080 A H H 0.90081 S H H 0.90082 E H H 0.90083 E H H 0.90084 D H H 0.90085 H H H 0.90086 T T H 1.30087 N T H 1.00088 A B H -0.30089 A B H -0.60090 C B H -0.60091 F B H -0.60092 A B H -0.60093 C B H -0.60094 I B H -0.60095 L B H -0.60096 L B H -0.60097 S t . -0.40098 H t . 0.95099 G . . 0.900100 E . . 0.900101 E . . 0.750102 N B T 0.700103 V B T 1.000104 I B T 1.000105 Y B T 1.000106 G B T 0.850107 K T T 1.250108 D T T 1.550109 G . T 1.300110 V . B 0.450111 T . B 0.450112 P h B 0.450113 I h B -0.150114 K h B 0.450115 D h B -0.150116 L h . 0.450117 T h . 0.300118 A h . 0.600119 H h T 0.700120 F h T 1.000121 R t T 1.500

122 G t T 1.350123 D t T 1.500124 R t T 1.500125 C b T 1.150126 K b H 0.450127 T b H 0.450128 L b H 0.600129 L b H 0.900130 E H H 0.900131 K H H 0.600132 P H H 0.600133 K H H 0.450134 L H H -0.600135 F H H -0.600136 F H H -0.600137 I H H -0.600138 Q H H -0.600139 A H H -0.300140 A H . 0.450141 R t . 1.100142 G t . 0.650143 T h . 0.900144 E h . 0.900145 L h . 0.900146 D h T 0.850147 D h T 1.150148 G h T 0.850149 I h . 0.750150 Q h . 0.750151 A h . 0.450152 D h . 0.600153 S T . 0.850154 G T . 0.850155 P . . 0.750156 I . . 0.750157 N . . 0.900158 D t T 1.500159 T T T 1.700160 D T T 1.700161 A t . 1.100162 N t . 1.100163 P T T 1.700164 R T T 1.700165 Y t T 1.500166 K h B 0.900167 I h B 0.750168 P h B -0.300169 V h B 0.300170 E h H -0.600171 A h H -0.600172 D h H -0.600173 F B H -0.600174 L B H -0.600175 F B H -0.600176 A B H -0.600177 Y B T -0.200

178 S . T -0.200179 T . . -0.150180 V . . 0.450181 P T T 1.250182 G T T 1.250183 Y t T 1.050184 Y t T 1.350185 S . T 1.150186 W . T 1.150187 R . . 0.900188 S . . 0.900189 P T T 1.700190 G T T 1.700191 R T T 1.550192 G T T 1.550193 S t T 1.050194 W h B 0.300195 F h B -0.600196 V h B -0.600197 Q h B -0.600198 A h . -0.600199 L h . -0.600200 C h . -0.600201 S h H -0.600202 I h H -0.600203 L h H 0.300204 E h H 0.300205 E h H 0.900206 H h H 0.900207 G . H 0.900208 K t H 1.100209 D t H 0.950210 L . H 0.300211 E . H 0.600212 I B H -0.600213 M B H -0.600214 Q B H -0.600215 I B H -0.600216 L B H -0.600217 T B . -0.300218 R B . 0.000219 V B . 0.900220 N t T 1.500221 D t T 1.350

222 R h . 0.750223 V h . 0.750224 A h . 0.600225 R h . 0.600226 H h . 0.450227 F h T 1.150228 E h T 1.300229 S h T 1.300230 Q t T 1.500231 S t T 1.500232 D T . 1.300233 D T . 1.300234 P h . 0.900235 H h . 0.750236 F h T 1.150237 H h T 1.150238 E h T 1.300239 K h T 1.300240 K h T 1.300241 Q B T 1.150242 I B . 0.300243 P B B -0.600244 C B B -0.600245 V B B -0.600246 V B B -0.600247 S B B -0.600248 M H B -0.600249 L H H -0.150250 T H H -0.150251 K H H 0.000252 E H H -0.150253 L H H 0.750254 Y . H 0.750255 F h H -0.300256 S h H -0.150257 Q h H 0.600258 L h H 0.750259 E h H 0.750260 H h H 0.750261 H h H 0.750262 H h H 0.900263 H h H 0.900264 H h . 0.900265 H h . 0.900

Human Trypsin IV SeqWeb Prediction
Pos AA CF-Pred GORPred AI-Ind ..

1 I B T -0.0502 V B T -0.0503 G B . -0.4504 G B . -0.4505 Y B . -0.3006 T B . 0.6007 C B T 1.300

8 E . T 1.3009 E . T 1.15010 N . T 1.30011 S . . 0.90012 L . . 0.75013 P . T 0.70014 Y B T 0.700

15 Q B B -0.60016 V B B 0.30017 S B B 0.30018 L B B 0.30019 N t T 0.45020 S T T 1.55021 G T T 1.25022 S t T 1.05023 H t T 0.90024 F t T 0.30025 C t T 0.90026 G T T 0.35027 G T T 0.35028 S t . -0.25029 L . . -0.15030 I . . 0.45031 S . . 0.45032 E . H -0.45033 Q B H -0.60034 W B H 0.30035 V B H -0.30036 V B H -0.60037 S B H -0.60038 A B H -0.60039 A B H -0.60040 H B T 0.70041 C B T 0.70042 Y B T 1.15043 K B T 1.15044 T B T 1.30045 R B B 0.90046 I B B 0.75047 Q B B 0.30048 V B B 0.30049 R B B -0.30050 L B . 0.60051 G . . 0.75052 E H . 0.60053 H H . 0.75054 N H . 0.60055 I H . 0.30056 K H . 0.30057 V H . -0.30058 L H . -0.15059 E H H 0.75060 G . H 0.90061 N . H 0.90062 E . H 0.75063 Q B H 0.90064 F B H 0.60065 I B H -0.60066 N B H -0.30067 A B H -0.60068 V B H -0.60069 K B H -0.60070 I B B -0.600

71 I B B 0.30072 R B B 0.60073 H B B 0.75074 P T T 1.70075 K T T 1.70076 Y t T 1.50077 N . T 1.30078 R T T 1.70079 D T T 1.70080 T . T 1.30081 L . . 0.90082 D t T 1.35083 N t T 1.05084 D t T 0.15085 I B . -0.60086 M B . -0.60087 L B . -0.60088 I B . -0.60089 K B . -0.60090 L B . -0.45091 S . . -0.30092 S . . -0.15093 P . . -0.45094 A B . -0.60095 V B B -0.60096 I B B -0.30097 N B B -0.60098 A B B -0.60099 R B B -0.150100 V B B -0.300101 S B B -0.450102 T B B -0.450103 I B B -0.450104 S . B -0.600105 L . . -0.450106 P . . -0.450107 T . . 0.600108 A . . 0.000109 P . . 0.450110 P . . -0.150111 A . . -0.150112 A . . 0.750113 G . B 0.450114 T t B -0.250115 E t B -0.400116 C . B -0.600117 L . B -0.600118 I . B -0.600119 S . . -0.600120 G t T 0.300121 W t T 1.050122 G . . 0.450123 N b . 0.450124 T b . -0.150125 L b . -0.300126 S b T -0.200

127 F b T -0.200128 G . T 0.100129 A T T 1.100130 D T . 1.150131 Y . . 0.900132 P T . 1.300133 D T H 1.300134 E H H 0.750135 L H H 0.450136 K H H 0.600137 C H H -0.300138 L H H -0.600139 D H H -0.600140 A H H -0.600141 P h H -0.600142 V h H -0.300143 L h H -0.600144 T h H -0.300145 Q h H -0.450146 A h H 0.900147 E h H 0.600148 C h H 0.600149 K h T 0.700150 A h T 1.000151 S h . 0.900152 Y . . 0.900153 P T T 1.400154 G T T 1.700155 K . T 1.300156 I . T 1.150157 T . B 0.450158 N . B -0.150159 S . B -0.600160 M B B -0.600161 F B B -0.600162 C B B -0.600163 V B B -0.600164 G B B -0.600165 F B B -0.600166 L B . -0.600167 E t T 1.050168 G t T 1.500169 G t T 1.500170 K T T 1.550171 D T T 1.550172 S T T 1.700173 C T T 1.700174 Q t T 1.500175 R t T 1.350

176 D t T 1.500177 S T T 1.700178 G T T 1.550179 G t . 0.050180 P . . -0.450181 V . B -0.600182 V . B -0.600183 C t B -0.100184 N T B -0.200185 G T B 0.850186 Q B B 0.750187 L B B 0.450188 Q B B -0.450189 G B B -0.600190 V B B -0.600191 V B B -0.300192 S . T 0.100193 W . T 0.100194 G t T 0.300195 H t T 0.900196 G t T 0.900197 C t T 1.200198 A . T 1.000199 W . T 1.000200 K t T 1.500201 N t T 1.500202 R t . 1.100203 P T T 1.700204 G T T 1.700205 V B T 1.150206 Y B B -0.300207 T B B -0.300208 K B B 0.450209 V B B 0.750210 Y B B 0.300211 N B B 0.600212 Y B B 0.450213 V h B 0.300214 D h B 0.600215 W h B 0.600216 I h B 0.750217 K h B 0.300218 D h B -0.300219 T h B -0.600220 I h . 0.600221 A h . 0.450222 A h . -0.450223 N . . -0.450224 S . . 0.600

Ribonuclease Sa SeqWeb Prediction
Pos AA CF-Pred GORPred AI-Ind ..
1 D . T 0.850
2 V t T 1.050
3 S t T 0.150
4 G t B -0.250
5 T . B -0.450
6 V h B -0.600
7 C h B -0.600
8 L h B -0.600
9 S h B -0.600
10 A h . -0.600
11 L h . -0.150
12 P . . 0.450
13 P t . 0.200
14 E t H 1.100
15 A . H 0.900
16 T . H 0.900
17 D . H 0.900
18 T B H -0.150
19 L B H -0.600
20 N B B -0.600
21 L B B -0.600
22 I B B -0.600
23 A B B -0.300
24 S T T 0.350
25 D T T 0.350
26 G T . 1.000
27 P t . 1.100
28 F t . 0.950
29 P T . 1.150
30 Y T T 1.700
31 S t T 1.500
32 Q T T 1.700
33 D T T 1.250
34 G t T 0.150
35 V B B -0.300
36 V B B -0.300
37 F B B 0.300
38 Q B B 0.900
39 N B . 0.900
40 R t T 1.500
41 E t T 1.500
42 S . T 1.150
43 V B . 0.450
44 L B . 0.450
45 P B . -0.150
46 T B T 0.400
47 Q B T 1.300
48 S B T 1.300
49 Y B T 1.150
50 G . T 1.150
51 Y B T 1.150
52 Y B T 1.150
53 H B B 0.750
54 E B B 0.750
55 Y B B 0.300
56 T B B 0.300
57 V B B -0.300
58 I B B -0.600
59 T B B -0.600
60 P T . -0.050
61 G T . 1.000
62 A . . 0.900
63 R . . 0.900
64 T t . 1.100
65 R t T 1.500
66 G t T 1.500
67 T B T 1.300
68 R B B 0.750
69 R B B 0.450
70 I B B 0.300
71 I B B 0.600
72 T B . -0.150
73 G . . -0.450
74 E H . 0.600
75 A H . 0.900
76 T H . 0.900
77 Q H . 0.900
78 E H T 1.300
79 D H T 1.300
80 Y t T 1.500
81 Y t T 1.500
82 T t T 1.500
83 G t T 1.500
84 D t T 1.500
85 H t . 0.950
86 Y B . 0.750
87 A B B 0.600
88 T B B -0.300
89 F B B -0.600
90 S B B -0.600
91 L B T 0.100
92 I B T 0.100
93 D . T 0.250
94 Q t T 0.150
95 T t T 1.050
96 C . T 1.150

Ribonuclease Hi SeqWeb Prediction
Pos AA CF-Pred GORPred AI-Ind ..

1 M H B 0.6002 L H B -0.1503 K H B 0.4504 Q H B -0.1505 V H B -0.300

6 E H B -0.3007 I H B -0.6008 F H B -0.6009 T H B -0.30010 A H . -0.600

11 G . . -0.45012 S . . -0.45013 A . . -0.15014 L . . -0.15015 G . . -0.15016 N . . -0.15017 P T . 0.25018 G T . 0.85019 P T . 0.25020 G T . 0.25021 G T . -0.05022 Y T B -0.20023 G . B -0.60024 A B B -0.60025 I B B -0.60026 L B B -0.30027 R B B -0.30028 Y B B 0.45029 R t T 1.50030 G t T 1.50031 R h T 1.30032 E h T 1.30033 K h T 1.30034 T h . 0.90035 F h . 0.30036 S h . -0.30037 A h . -0.60038 G . . -0.15039 Y . . -0.15040 T . . 0.00041 R . . 0.00042 T . . 0.00043 T T . 1.30044 N T . 1.30045 N t . 1.10046 R H H 0.60047 M H H 0.45048 E H H 0.30049 L H H 0.30050 M H H -0.60051 A H H -0.60052 A H H -0.60053 I H H -0.60054 V H H -0.60055 A H H -0.60056 L H H -0.60057 E H H -0.30058 A H H 0.60059 L H H 0.75060 K H H 0.60061 E H H 0.75062 H H H 0.75063 A H H 0.60064 E H H -0.30065 V B H -0.60066 I B H -0.300

67 L B H 0.30068 S B . -0.15069 T B T 0.40070 D t T 0.60071 S t T 0.60072 Q B T 1.00073 Y B T 1.00074 V B . 0.00075 R B T 0.25076 Q B T -0.05077 G B T 0.40078 I B . -0.15079 T B B -0.45080 Q B B -0.60081 W B B -0.60082 I B B -0.60083 H B . -0.60084 N B T -0.05085 W . T 0.85086 K . . 0.90087 K t T 1.20088 R T T 1.70089 G T T 1.70090 W h T 1.30091 K h . 0.90092 T h . 0.90093 A h T 1.30094 D h T 1.30095 K h . 0.90096 K h . 0.90097 P h . 0.90098 V h . 0.90099 K h B 0.900100 N h B 0.450101 V H B -0.300102 D H B -0.300103 L H H -0.300104 W H H -0.300105 Q H H 0.300106 R H H -0.150107 L H H -0.150108 D H H 0.300109 A H H 0.300110 A H H -0.300111 L H H -0.300112 G . H -0.600113 Q h H -0.600114 H h H -0.150115 Q h . -0.150116 I h . 0.450117 K h T -0.050118 W h T -0.200119 E h T 0.250120 W h T 0.700121 V h . -0.600122 K h . 0.300

123 G . . -0.600124 H . . -0.300125 A t . 0.500126 G t . 0.650127 H . . 0.600128 P . . 0.900129 E H . 0.900130 N H H 0.900131 E H H 0.900132 R H H 0.900133 A H H 0.900134 D H H 0.900135 E H H 0.900136 L H H 0.750137 A H H 0.600138 R H H 0.300139 A H H -0.300

140 A H H -0.300141 A H H 0.300142 M H H -0.600143 N . . -0.600144 P T . 0.250145 T T . 1.000146 L . . 0.900147 E . T 1.000148 D . T 1.000149 T T T 1.400150 G T . 1.000151 Y B . 0.450152 Q B B -0.150153 V B B -0.150154 E B B -0.450155 V B B 0.45

Appendix D: PROF King results

Trypsin IV Prediction

1 I C 0.5472 V E 0.6653 G E 0.5794 G C 0.5445 Y C 0.6286 T C 0.6587 C C 0.7658 E C 0.8479 E C 0.84110 N C 0.84611 S C 0.85012 L C 0.79413 P C 0.55414 Y E 0.80715 Q E 0.94616 V E 0.96217 S E 0.96018 L E 0.92919 N E 0.73420 S C 0.62521 G C 0.83722 S C 0.76223 H E 0.49324 F E 0.80925 C E 0.81626 G E 0.73327 G E 0.79928 S E 0.87429 L E 0.86030 I E 0.76831 S C 0.61832 E C 0.73533 Q C 0.58134 W E 0.78935 V E 0.91036 V E 0.94037 S E 0.91638 A E 0.81139 A E 0.69340 H E 0.63341 C E 0.48442 Y C 0.68643 K C 0.79444 T C 0.80245 R C 0.69346 I E 0.51247 Q E 0.84748 V E 0.93949 R E 0.949

50 L E 0.93551 G E 0.86852 E E 0.74653 H E 0.74754 N E 0.75155 I E 0.73356 K E 0.73357 V E 0.67458 L C 0.55959 E C 0.81260 G C 0.87561 N C 0.85362 E C 0.71763 Q E 0.59264 F E 0.86565 I E 0.90066 N E 0.90667 A E 0.92168 V E 0.90669 K E 0.90670 I E 0.92171 I E 0.90072 R E 0.76573 H C 0.54974 P C 0.79075 K C 0.88576 Y C 0.88577 N C 0.90078 R C 0.88379 D C 0.86880 T C 0.86781 L C 0.86282 D C 0.84783 N C 0.78084 D C 0.59185 I E 0.83086 M E 0.95187 L E 0.96188 I E 0.95989 K E 0.92090 L E 0.62691 S C 0.69292 S C 0.85693 P C 0.83394 A C 0.75395 V C 0.63796 I C 0.68397 N C 0.72698 A C 0.61899 R E 0.550100 V E 0.752101 S E 0.803102 T E 0.777103 I E 0.671

104 S C 0.524105 L C 0.796106 P C 0.884107 T C 0.889108 A C 0.881109 P C 0.889110 P C 0.888111 A C 0.889112 A C 0.876113 G C 0.826114 T C 0.629115 E E 0.764116 C E 0.927117 L E 0.959118 I E 0.957119 S E 0.902120 G E 0.744121 W E 0.611122 G E 0.575123 N E 0.586124 T E 0.491125 L C 0.685126 S C 0.857127 F C 0.882128 G C 0.881129 A C 0.878130 D C 0.887131 Y C 0.869132 P C 0.733133 D C 0.693134 E C 0.657135 L C 0.459136 K E 0.747137 C E 0.888138 L E 0.912139 D E 0.888140 A E 0.858141 P E 0.858142 V E 0.822143 L E 0.578144 T C 0.722145 Q C 0.633146 A C 0.492147 E C 0.424148 C C 0.432149 K C 0.413150 A H 0.455151 S H 0.403152 Y C 0.456153 P C 0.785154 G C 0.831155 K C 0.788156 I C 0.764157 T C 0.713

158 N C 0.508159 S E 0.723160 M E 0.907161 F E 0.951162 C E 0.927163 V E 0.786164 G E 0.609165 F C 0.492166 L C 0.719167 E C 0.850168 G C 0.866169 G C 0.820170 K C 0.643171 D E 0.617172 S E 0.758173 C E 0.802174 Q E 0.780175 R E 0.558176 D C 0.726177 S C 0.846178 G C 0.816179 G C 0.649180 P E 0.705

181 V E 0.844182 V E 0.887183 C E 0.868184 N E 0.714185 G E 0.751186 Q E 0.885187 L E 0.922188 Q E 0.935189 G E 0.946190 V E 0.940191 V E 0.922192 S E 0.803193 W E 0.521194 G C 0.630195 H C 0.709196 G C 0.702197 C C 0.761198 A C 0.827199 W C 0.855200 K C 0.851201 N C 0.869202 R C 0.861203 P C 0.757

204 G E 0.597205 V E 0.913206 Y E 0.967207 T E 0.965208 K E 0.927209 V E 0.716210 Y C 0.447211 N H 0.480212 Y H 0.679213 V H 0.808214 D H 0.879215 W H 0.892216 I H 0.848217 K H 0.810218 D H 0.796219 T H 0.797220 I H 0.718221 A H 0.473222 A C 0.587223 N C 0.898224 S C 0.958

RNase SA Prediction
1 D C 0.967
2 V C 0.940
3 S C 0.896
4 G C 0.671
5 T E 0.719
6 V E 0.899
7 C E 0.926
8 L E 0.853
9 S E 0.570
10 A C 0.651
11 L C 0.842
12 P C 0.895
13 P C 0.789
14 E C 0.694
15 A C 0.696
16 T C 0.716
17 D C 0.658
18 T E 0.543
19 L E 0.756
20 N E 0.834
21 L E 0.851
22 I E 0.814
23 A E 0.636
24 S C 0.546
25 D C 0.788
26 G C 0.869
27 P C 0.889
28 F C 0.878
29 P C 0.883
30 Y C 0.890
31 S C 0.883
32 Q C 0.872
33 D C 0.857
34 G C 0.674
35 V E 0.735
36 V E 0.908
37 F E 0.921
38 Q E 0.795
39 N C 0.519
40 R C 0.763
41 E C 0.802
42 S C 0.790
43 V C 0.775
44 L C 0.774
45 P C 0.808
46 T C 0.833
47 Q C 0.827
48 S C 0.826
49 Y C 0.806
50 G C 0.746
51 Y C 0.588
52 Y E 0.693
53 H E 0.836
54 E E 0.894
55 Y E 0.938
56 T E 0.938
57 V E 0.929
58 I E 0.884
59 T E 0.738
60 P E 0.583
61 G E 0.478
62 A C 0.571
63 R C 0.619
64 T C 0.583
65 R C 0.540
66 G E 0.493
67 T E 0.668
68 R E 0.811
69 R E 0.855
70 I E 0.853
71 I E 0.854
72 T E 0.744
73 G E 0.600
74 E C 0.538
75 A C 0.688
76 T C 0.748
77 Q C 0.654
78 E C 0.484
79 D E 0.641
80 Y E 0.756
81 Y E 0.721
82 T E 0.488
83 G C 0.760
84 D C 0.766
85 H C 0.547
86 Y E 0.773
87 A E 0.836
88 T E 0.817
89 F E 0.862
90 S E 0.868
91 L E 0.841
92 I E 0.745
93 D E 0.573
94 Q C 0.472
95 T C 0.733
96 C C 0.929
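The PROF King records in this appendix give a position, a residue, a predicted state (H, E, or C), and a three-decimal score. One way to compare such a prediction with the eight-state assignments of Appendix B is sketched below in Python. It rests on two assumptions that this appendix does not confirm: that the eight-state codes are reduced as H, G, I to H; E, B to E; everything else to C, and that accuracy is scored as Q3, the fraction of residues whose three-state label matches. The names parse_prof_states, reduce_to_three_states, and q3 are hypothetical helpers, not part of any prediction server.

```python
import re

# One PROF King record: position, residue letter (X allowed), predicted
# state, and a score with exactly three decimals.
_PROF = re.compile(r"(\d+) ([A-Z]) ([HEC]) (\d\.\d{3})")

# Assumed eight-state to three-state reduction (not stated in the report).
_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def parse_prof_states(text: str) -> str:
    """Return the predicted states from run-together PROF records."""
    return "".join(state for _pos, _aa, state, _score in _PROF.findall(text))


def reduce_to_three_states(states: str) -> str:
    """Map eight-state codes to H/E/C; blanks and other codes become C."""
    return "".join(_TO_THREE.get(s, "C") for s in states)


def q3(predicted: str, observed: str) -> float:
    """Fraction of positions where the three-state labels agree."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)
```

The observed string must keep one character per residue, including blanks for unassigned positions. The secondary structure rows as printed in this document have lost that spacing, so they would need to be regenerated from the original assignment files before scoring.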

RNase Hi Prediction
1 M C 0.9732 L C 0.9033 K C 0.5984 Q E 0.8415 V E 0.9136 E E 0.9367 I E 0.9308 F E 0.8599 T E 0.65810 A C 0.60111 G C 0.73912 S C 0.77913 A C 0.82414 L C 0.88915 G C 0.91616 N C 0.91917 P C 0.92118 G C 0.90519 P C 0.87020 G C 0.84821 G C 0.75622 Y E 0.48823 G E 0.81624 A E 0.92725 I E 0.94926 L E 0.93027 R E 0.84828 Y E 0.61729 R C 0.67630 G C 0.86231 R C 0.86832 E C 0.70033 K C 0.47234 T E 0.64335 F E 0.62836 S C 0.46837 A C 0.63238 G C 0.67839 Y C 0.70840 T C 0.79641 R C 0.80542 T C 0.79543 T C 0.71644 N C 0.73045 N C 0.67546 R H 0.48447 M H 0.63348 E H 0.75649 L H 0.79650 M H 0.79151 A H 0.83452 A H 0.838

53 I H 0.84254 V H 0.84455 A H 0.82456 L H 0.80157 E H 0.78358 A H 0.72359 L H 0.62060 K H 0.53461 E H 0.44162 H C 0.41463 A C 0.41264 E E 0.48365 V E 0.62266 I E 0.68067 L E 0.68768 S E 0.56769 T C 0.67170 D C 0.82171 S H 0.59072 Q H 0.85673 Y H 0.88974 V H 0.94175 R H 0.94276 Q H 0.92077 G H 0.89778 I H 0.87979 T H 0.82480 Q H 0.70281 W H 0.59882 I H 0.61683 H H 0.66984 N H 0.57285 W H 0.49286 K C 0.50987 K C 0.62888 R C 0.77989 G C 0.81290 W C 0.68991 K C 0.61992 T C 0.67493 A C 0.78594 D C 0.88195 K C 0.91696 K C 0.91397 P C 0.88798 V C 0.81699 K C 0.753100 N C 0.779101 V H 0.706102 D H 0.873103 L H 0.923104 W H 0.946

105 Q H 0.959106 R H 0.969107 L H 0.966108 D H 0.956109 A H 0.942110 A H 0.917111 L H 0.819112 G H 0.634113 Q C 0.688114 H C 0.884115 Q C 0.776116 I E 0.563117 K E 0.880118 W E 0.947119 E E 0.939120 W E 0.914121 V E 0.744122 K C 0.585123 G C 0.807124 H C 0.912125 A C 0.923126 G C 0.921127 H C 0.931128 P C 0.915129 E C 0.857130 N H 0.540131 E H 0.800132 R H 0.880133 A H 0.942134 D H 0.954135 E H 0.960136 L H 0.958137 A H 0.945138 R H 0.928139 A H 0.870140 A H 0.704141 A C 0.518142 M C 0.628143 N C 0.819144 P C 0.807145 T C 0.804146 L C 0.622147 E C 0.629148 D C 0.756149 T C 0.851150 G C 0.877151 Y C 0.826152 Q C 0.753153 V C 0.649154 E C 0.658155 V C 0.882

Hep C RNA pol Prediction
1 S C 0.9662 X C 0.7443 S E 0.6214 Y E 0.8005 T E 0.7876 W E 0.6787 T C 0.4718 G C 0.5489 A C 0.59510 L C 0.63311 I C 0.69512 T C 0.77413 P C 0.71714 C C 0.71815 A C 0.61216 A C 0.61117 E C 0.69718 E C 0.79919 S C 0.89020 K C 0.91221 L C 0.90722 P C 0.86823 I C 0.74524 N C 0.74125 A C 0.76426 L C 0.62727 S C 0.57028 N H 0.50429 S H 0.68530 L H 0.74231 L H 0.68032 R H 0.56733 H C 0.61334 H C 0.83135 N C 0.85936 X E 0.51137 V E 0.88338 Y E 0.93939 A E 0.92240 T E 0.75341 T C 0.58242 S C 0.72543 R C 0.60644 S H 0.58045 A H 0.63646 G H 0.53547 L H 0.41548 R C 0.41149 Q C 0.41250 K C 0.43551 K E 0.53452 V E 0.59953 T E 0.73954 F E 0.69755 D E 0.680

56 R E 0.62557 L C 0.48858 Q C 0.61659 V C 0.73460 L C 0.81361 D C 0.78862 D H 0.74963 H H 0.80964 Y H 0.91065 R H 0.94266 D H 0.94967 V H 0.95368 L H 0.94869 K H 0.95270 E H 0.95271 X H 0.95172 K H 0.93973 A H 0.92674 K H 0.92375 A H 0.91776 S H 0.92977 T H 0.91678 V H 0.87179 K H 0.79480 A H 0.63481 K C 0.52582 L C 0.55583 L C 0.74884 S C 0.72585 V H 0.76186 E H 0.70387 E H 0.78088 A H 0.81389 C H 0.65690 K C 0.52091 L C 0.82492 T C 0.91093 P C 0.92394 P C 0.83195 H C 0.65896 S C 0.58797 A C 0.51698 K C 0.50099 S C 0.477100 K H 0.473101 F C 0.512102 G C 0.692103 Y C 0.739104 G C 0.786105 A C 0.576106 K C 0.396107 D C 0.422108 V H 0.386109 R C 0.412110 N C 0.458

111 L C 0.552112 S C 0.721113 S C 0.640114 K H 0.615115 A H 0.805116 V H 0.884117 N H 0.915118 H H 0.941119 I H 0.950120 H H 0.953121 S H 0.947122 V H 0.943123 W H 0.947124 K H 0.944125 D H 0.928126 L H 0.900127 L H 0.779128 E H 0.532129 D C 0.762130 T C 0.911131 V C 0.938132 T C 0.926133 P C 0.885134 I C 0.748135 D C 0.521136 T E 0.733137 T E 0.875138 I E 0.876139 X E 0.730140 A E 0.627141 K E 0.508142 N E 0.515143 E E 0.648144 V E 0.787145 F E 0.795146 C E 0.738147 V E 0.518148 Q C 0.688149 P C 0.836150 E C 0.877151 K C 0.885152 G C 0.907153 G C 0.921154 R C 0.917155 K C 0.907156 P C 0.862157 A C 0.749158 R E 0.689159 L E 0.876160 I E 0.889161 V E 0.795162 F C 0.568163 P C 0.823164 D C 0.875165 L C 0.863

166 G C 0.875167 V C 0.592168 R C 0.344169 V H 0.511170 C H 0.465171 E H 0.598172 K H 0.758173 X H 0.846174 A H 0.834175 L H 0.728176 Y H 0.603177 D C 0.502178 V H 0.561179 V H 0.551180 S C 0.456181 T C 0.559182 L C 0.609183 P C 0.434184 Q E 0.536185 V E 0.756186 V E 0.807187 X E 0.644188 G C 0.524189 S C 0.534190 S C 0.460191 Y E 0.582192 G E 0.632193 F E 0.662194 Q E 0.692195 Y C 0.583196 S C 0.815197 P H 0.557198 G H 0.761199 Q H 0.845200 R H 0.924201 V H 0.952202 E H 0.959203 F H 0.956204 L H 0.957205 V H 0.961206 N H 0.957207 T H 0.938208 W H 0.880209 K H 0.731210 S H 0.528211 K C 0.783212 K C 0.901213 N C 0.929214 P C 0.882215 X C 0.722216 G C 0.540217 F E 0.610218 S E 0.630219 Y E 0.524220 D C 0.596221 T C 0.613222 R C 0.554

223 C C 0.444224 F C 0.604225 D C 0.680226 S C 0.555227 T C 0.558228 V C 0.519229 T C 0.597230 E H 0.435231 N H 0.492232 D H 0.511233 I H 0.645234 R H 0.746235 V H 0.891236 E H 0.925237 E H 0.933238 S H 0.942239 I H 0.933240 Y H 0.885241 Q H 0.701242 C H 0.562243 C C 0.640244 D C 0.857245 L C 0.900246 A C 0.920247 P H 0.665248 E H 0.846249 A H 0.924250 R H 0.954251 Q H 0.957252 A H 0.960253 I H 0.953254 K H 0.954255 S H 0.945256 L H 0.932257 T H 0.897258 E H 0.846259 R H 0.815260 L H 0.727261 Y H 0.521262 I C 0.442263 G C 0.793264 G C 0.840265 P C 0.759266 L C 0.698267 T C 0.767268 N C 0.889269 S C 0.915270 K C 0.913271 G C 0.882272 Q C 0.757273 N C 0.516274 C E 0.572275 G E 0.623276 Y E 0.637277 R E 0.659278 R E 0.714279 C E 0.725

280 R E 0.662281 A C 0.509282 S C 0.665283 G C 0.521284 V E 0.647285 L E 0.700286 T E 0.677287 T E 0.559288 S C 0.576289 C C 0.742290 G C 0.693291 N H 0.487292 T H 0.785293 L H 0.874294 T H 0.906295 C H 0.926296 Y H 0.938297 L H 0.937298 K H 0.953299 A H 0.962300 S H 0.959301 A H 0.957302 A H 0.943303 C H 0.919304 R H 0.863305 A H 0.773306 A H 0.509307 K C 0.754308 L C 0.849309 Q C 0.848310 D C 0.779311 C C 0.640312 T E 0.511313 X E 0.674314 L E 0.761315 V E 0.656316 N C 0.580317 G C 0.857318 D C 0.894319 D C 0.616320 L E 0.840321 V E 0.928322 V E 0.936323 I E 0.929324 C E 0.873325 E E 0.578326 S C 0.754327 A C 0.868328 G C 0.902329 T C 0.876330 Q C 0.581331 E H 0.824332 D H 0.879333 A H 0.939334 A H 0.951335 S H 0.959336 L H 0.966


337 R H 0.965338 V H 0.961339 F H 0.954340 T H 0.949341 E H 0.938342 A H 0.892343 X H 0.757344 T H 0.629345 R H 0.535346 Y C 0.615347 S C 0.842348 A C 0.907349 P C 0.905350 P C 0.921351 G C 0.925352 D C 0.930353 P C 0.930354 P C 0.924355 Q C 0.901356 P C 0.770357 E C 0.574358 Y C 0.473359 D C 0.514360 L C 0.483361 E E 0.478362 L E 0.489363 I E 0.482364 T E 0.572365 S E 0.557366 C C 0.466367 S C 0.629368 S C 0.725369 N C 0.695370 V E 0.483371 S E 0.611372 V E 0.735373 A E 0.806374 H E 0.690375 D C 0.550376 A C 0.802377 S C 0.888378 G C 0.864379 K C 0.593380 R E 0.853381 V E 0.931382 Y E 0.944383 Y E 0.926384 L E 0.878385 T E 0.626386 R C 0.673387 D C 0.837388 P C 0.892389 T C 0.899390 T C 0.733391 P H 0.568392 L H 0.606393 A H 0.643

394 R H 0.758395 A H 0.729396 A H 0.718397 W H 0.775398 E H 0.700399 T H 0.602400 A C 0.538401 R C 0.768402 H C 0.837403 T C 0.886404 P C 0.820405 V H 0.498406 N H 0.693407 S H 0.734408 W H 0.844409 L H 0.851410 G H 0.799411 N H 0.715412 I H 0.772413 I H 0.713414 X H 0.557415 Y C 0.674416 A C 0.856417 P C 0.772418 T C 0.668419 L H 0.590420 W H 0.473421 A H 0.607422 R H 0.622423 X H 0.493424 I H 0.479425 L C 0.388426 X C 0.598427 T H 0.703428 H H 0.836429 F H 0.867430 F H 0.925431 S H 0.928432 I H 0.913433 L H 0.834434 L H 0.680435 A H 0.496436 Q C 0.557437 E C 0.672438 Q C 0.720439 L C 0.737440 E C 0.676441 K C 0.556442 A C 0.594443 L C 0.520444 D C 0.454445 C E 0.620446 Q E 0.682447 I E 0.651448 Y E 0.468449 G C 0.582450 A C 0.556

451 C E 0.569452 Y E 0.571453 S C 0.623454 I C 0.786455 E C 0.856456 P C 0.709457 L C 0.662458 D C 0.754459 L C 0.744460 P H 0.848461 Q H 0.908462 I H 0.924463 I H 0.934464 E H 0.922465 R H 0.813466 L H 0.560467 H C 0.830468 G C 0.934469 L C 0.912470 S C 0.549471 A E 0.783472 F E 0.892473 S E 0.896474 L E 0.804475 H E 0.497476 S C 0.726477 Y C 0.863478 S C 0.894479 P C 0.818480 G H 0.528481 E H 0.626482 I H 0.837483 N H 0.919484 R H 0.942485 V H 0.954486 A H 0.957487 S H 0.953488 C H 0.941489 L H 0.929490 R H 0.852491 K H 0.665492 L C 0.713493 G C 0.913494 V C 0.933495 P C 0.919496 P C 0.707497 L H 0.676498 R H 0.794499 V H 0.892500 W H 0.937501 R H 0.946502 H H 0.948503 R H 0.936504 A H 0.934505 R H 0.954506 S H 0.962507 V H 0.960


508 R H 0.957 509 A H 0.957 510 R H 0.957 511 L H 0.937 512 L H 0.905 513 S H 0.770 514 Q C 0.585 515 G C 0.894

516 G C 0.920 517 R H 0.547 518 A H 0.699 519 A H 0.726 520 T H 0.811 521 C H 0.778 522 G H 0.677 523 K H 0.617

524 Y H 0.669 525 L H 0.661 526 F H 0.531 527 N H 0.509 528 W H 0.523 529 A H 0.415 530 V C 0.615 531 K C 0.943

Procathepsin L Prediction

1 S C 0.967 2 L C 0.942 3 T C 0.893 4 F C 0.767 5 D H 0.618 6 H H 0.801 7 S H 0.868 8 L H 0.926 9 E H 0.946 10 A H 0.957 11 Q H 0.957 12 W H 0.959 13 T H 0.962 14 K H 0.956 15 W H 0.940 16 K H 0.932 17 A H 0.899 18 M H 0.770 19 H C 0.529 20 N C 0.836 21 R C 0.869 22 L C 0.802 23 Y C 0.770 24 G C 0.830 25 M C 0.706 26 N H 0.510 27 E H 0.683 28 E H 0.820 29 G H 0.891 30 W H 0.906 31 R H 0.914 32 R H 0.911 33 A H 0.916 34 V H 0.916 35 W H 0.912 36 E H 0.914 37 K H 0.889 38 N H 0.900 39 M H 0.913 40 K H 0.934 41 M H 0.942 42 I H 0.930 43 E H 0.914 44 L H 0.860 45 H H 0.704

46 N C 0.51047 Q C 0.71048 E C 0.76949 Y C 0.80750 R C 0.84151 E C 0.80252 G C 0.81653 K C 0.78354 H C 0.67555 S E 0.52256 F E 0.75757 T E 0.83858 M E 0.77659 A E 0.62260 M E 0.42861 N C 0.48762 A C 0.60363 F C 0.56364 G C 0.68265 D C 0.75966 M C 0.71567 T C 0.82668 S H 0.83069 E H 0.89470 E H 0.92171 F H 0.94472 R H 0.93273 Q H 0.90474 V H 0.80375 M H 0.64376 N C 0.63677 G C 0.82378 F C 0.85879 Q C 0.81980 N C 0.78681 R C 0.83082 K C 0.87883 P C 0.87884 R C 0.86185 K C 0.84086 G C 0.82587 K C 0.73488 V C 0.65189 F C 0.62090 Q C 0.635

91 E C 0.72392 P C 0.81593 L C 0.82294 F C 0.82095 Y C 0.87596 E C 0.89997 A C 0.90898 P C 0.85699 R C 0.737100 S C 0.622101 V C 0.613102 D C 0.711103 W C 0.607104 R C 0.557105 E C 0.570106 K C 0.694107 G C 0.671108 Y C 0.561109 V C 0.608110 T C 0.612111 P C 0.644112 V C 0.584113 K C 0.575114 N C 0.773115 Q C 0.884116 G C 0.874117 Q C 0.829118 C C 0.818119 G C 0.792120 S C 0.697121 X C 0.419122 W H 0.546123 A H 0.573124 F H 0.767125 S H 0.754126 A H 0.799127 T H 0.861128 G H 0.853129 A H 0.868130 L H 0.882131 E H 0.864132 G H 0.858133 Q H 0.870134 M H 0.848135 F H 0.732


136 R H 0.614137 K H 0.472138 T C 0.703139 G C 0.873140 R C 0.778141 L C 0.484142 I E 0.613143 S E 0.637144 L E 0.473145 S C 0.598146 E C 0.483147 Q C 0.438148 N C 0.415149 L E 0.493150 V E 0.574151 D E 0.564152 C C 0.568153 S C 0.739154 G C 0.826155 P C 0.862156 Q C 0.877157 G C 0.895158 N C 0.907159 E C 0.903160 G C 0.910161 C C 0.888162 N C 0.851163 G C 0.851164 G C 0.868165 L C 0.835166 M H 0.521167 D H 0.624168 Y H 0.833169 A H 0.910170 F H 0.932171 Q H 0.949172 Y H 0.943173 V H 0.928174 Q H 0.876175 D H 0.714176 N C 0.581177 G C 0.848178 G C 0.906179 L C 0.849180 D C 0.770181 S C 0.791182 E C 0.847183 E C 0.867184 S C 0.874185 Y C 0.882186 P C 0.877187 Y C 0.880188 E C 0.875189 A C 0.867190 T C 0.880191 E C 0.886192 E C 0.893

193 S C 0.855194 C C 0.818195 K C 0.792196 Y C 0.782197 N C 0.821198 P C 0.865199 K C 0.867200 Y C 0.802201 S C 0.544202 V E 0.773203 A E 0.865204 N E 0.865205 D E 0.771206 A E 0.640207 G E 0.685208 F E 0.786209 V E 0.834210 D E 0.722211 I C 0.559212 P C 0.777213 K C 0.856214 Q C 0.816215 E H 0.866216 K H 0.910217 A H 0.939218 L H 0.956219 M H 0.962220 K H 0.964221 A H 0.945222 V H 0.925223 A H 0.823224 T H 0.590225 V C 0.873226 G C 0.928227 P C 0.830228 I E 0.758229 S E 0.923230 V E 0.961231 A E 0.955232 I E 0.927233 D E 0.739234 A C 0.501235 G C 0.718236 H C 0.641237 E H 0.580238 S H 0.620239 F H 0.596240 L H 0.589241 F H 0.455242 Y H 0.360243 K C 0.443244 E C 0.625245 G C 0.612246 I E 0.560247 Y E 0.667248 F E 0.580249 E C 0.569

250 P C 0.716251 D C 0.774252 C C 0.805253 S C 0.839254 S C 0.850255 E C 0.830256 D C 0.837257 M C 0.834258 D C 0.799259 H E 0.563260 G E 0.859261 V E 0.931262 L E 0.953263 V E 0.948264 V E 0.916265 G E 0.838266 Y E 0.658267 G C 0.511268 F C 0.591269 E C 0.721270 S C 0.811271 T C 0.865272 E C 0.888273 S C 0.889274 D C 0.888275 N C 0.888276 N C 0.797277 K E 0.513278 Y E 0.865279 W E 0.940280 L E 0.942281 V E 0.922282 K E 0.801283 N C 0.503284 S C 0.717285 W C 0.795286 G C 0.821287 E C 0.814288 E C 0.824289 W C 0.824290 G C 0.823291 M C 0.738292 G C 0.678293 G C 0.571294 Y E 0.762295 V E 0.897296 K E 0.923297 M E 0.913298 A E 0.773299 K C 0.504300 D C 0.767301 R C 0.875302 R C 0.882303 N C 0.852304 H C 0.732305 C C 0.597306 G C 0.519


307 I E 0.482 308 A E 0.580 309 S E 0.579 310 A C 0.449 311 A C 0.617 312 S C 0.642 313 Y C 0.660

314 P C 0.653 315 T C 0.727 316 V C 0.891

Polio 3C proteinase Prediction

1 G C 0.964 2 P C 0.915 3 G C 0.740 4 F H 0.706 5 D H 0.736 6 Y H 0.825 7 A H 0.882 8 V H 0.893 9 A H 0.868 10 M H 0.842 11 A H 0.713 12 K C 0.548 13 R C 0.842 14 N C 0.876 15 I C 0.544 16 V E 0.860 17 T E 0.952 18 A E 0.962 19 T E 0.942 20 T E 0.813 21 S E 0.510 22 K C 0.808 23 G C 0.793 24 E E 0.577 25 F E 0.876 26 T E 0.934 27 M E 0.957 28 L E 0.937 29 G E 0.852 30 V E 0.795 31 H E 0.586 32 D C 0.622 33 N E 0.628 34 V E 0.842 35 A E 0.891 36 I E 0.876 37 L E 0.668 38 P C 0.570 39 T C 0.736 40 H C 0.791 41 A C 0.840 42 S C 0.879 43 P C 0.896 44 G C 0.873 45 E C 0.736 46 S E 0.717 47 I E 0.907

48 V E 0.93849 I E 0.90650 D E 0.66851 G C 0.58552 K C 0.62253 E C 0.47954 V E 0.54955 E E 0.64956 I E 0.63957 L E 0.48758 D C 0.41959 A E 0.45160 K E 0.53161 A E 0.59762 L E 0.62463 E E 0.57464 D E 0.50365 Q C 0.44566 A C 0.77767 G C 0.86968 T C 0.84069 N C 0.74670 L E 0.55771 E E 0.84672 I E 0.94373 T E 0.95874 I E 0.95875 I E 0.95176 T E 0.92777 L E 0.76078 K C 0.57279 R C 0.76980 N C 0.81581 E C 0.78182 K C 0.71183 F C 0.62484 R C 0.47785 D C 0.46686 I C 0.47187 R C 0.53488 P C 0.61889 H C 0.69390 I C 0.63791 P C 0.54892 T C 0.51993 Q C 0.58994 I C 0.684

95 T C 0.78396 E C 0.83797 T C 0.86498 N C 0.87899 D C 0.808100 G E 0.498101 V E 0.874102 L E 0.947103 I E 0.959104 V E 0.945105 N E 0.863106 T E 0.653107 S C 0.588108 K C 0.855109 Y C 0.891110 P C 0.840111 N C 0.597112 M E 0.697113 Y E 0.830114 V E 0.819115 P E 0.701116 V E 0.501117 G C 0.554118 A C 0.565119 V C 0.473120 T E 0.466121 E E 0.473122 Q E 0.505123 G E 0.559124 Y E 0.632125 L E 0.675126 N E 0.620127 L C 0.595128 G C 0.768129 G C 0.830130 R C 0.824131 Q C 0.797132 T C 0.679133 A C 0.563134 R E 0.561135 T E 0.770136 L E 0.814137 M E 0.752138 Y E 0.609139 N C 0.473140 F C 0.658141 P C 0.663


142 T C 0.637 143 R C 0.617 144 A C 0.692 145 G C 0.734 146 Q C 0.727 147 C C 0.704 148 G C 0.663 149 G E 0.701 150 V E 0.910 151 I E 0.943 152 T E 0.922 153 C E 0.816 154 T C 0.597 155 G C 0.796

156 K E 0.536 157 V E 0.887 158 I E 0.938 159 G E 0.964 160 M E 0.960 161 H E 0.929 162 V E 0.720 163 G C 0.675 164 G C 0.832 165 N C 0.871 166 G C 0.876 167 S C 0.773 168 H H 0.548 169 G H 0.734

170 F H 0.865 171 A H 0.916 172 A H 0.930 173 A H 0.922 174 L H 0.898 175 K H 0.881 176 R H 0.838 177 S H 0.752 178 Y H 0.627 179 F H 0.500 180 T H 0.476 181 Q C 0.522 182 S C 0.765 183 Q C 0.950

Whale Myoglobin

1 V C 0.961 2 L C 0.932 3 S C 0.857 4 E H 0.794 5 G H 0.851 6 E H 0.919 7 W H 0.954 8 Q H 0.957 9 L H 0.963 10 V H 0.965 11 L H 0.969 12 H H 0.968 13 V H 0.967 14 W H 0.956 15 A H 0.929 16 K H 0.858 17 V H 0.672 18 E C 0.543 19 A C 0.801 20 D C 0.878 21 V C 0.655 22 A H 0.569 23 G H 0.578 24 H C 0.604 25 G H 0.826 26 Q H 0.953 27 D H 0.966 28 I H 0.970 29 L H 0.974 30 I H 0.978 31 R H 0.976 32 L H 0.966 33 F H 0.933 34 K H 0.750 35 S C 0.523 36 H C 0.836 37 P C 0.811 38 E C 0.567 39 T H 0.763

40 L H 0.83341 E H 0.79942 K H 0.86243 F H 0.82844 D H 0.81445 R H 0.65046 F C 0.53347 K C 0.71448 H C 0.83849 L C 0.88350 K C 0.87751 T C 0.77152 E H 0.61853 A H 0.61954 E H 0.53255 M C 0.49656 K C 0.63657 A C 0.85058 S C 0.88059 E H 0.66060 D H 0.64461 L H 0.64262 K H 0.82763 K H 0.78064 H H 0.63965 G H 0.74466 V H 0.87467 T H 0.91968 V H 0.93869 L H 0.95870 T H 0.95971 A H 0.96072 L H 0.96273 G H 0.96174 A H 0.96475 I H 0.96176 L H 0.93377 K H 0.87878 K H 0.727

79 K H 0.63680 G H 0.49881 H C 0.53982 H H 0.87683 E H 0.93084 A H 0.93785 E H 0.96586 L H 0.96787 K H 0.96588 P H 0.96189 L H 0.95490 A H 0.93991 Q H 0.93692 S H 0.89793 H H 0.75894 A H 0.58195 T C 0.65396 K C 0.69597 H C 0.82398 K C 0.87099 I C 0.849100 P C 0.855101 I H 0.730102 K H 0.728103 Y H 0.837104 L H 0.896105 E H 0.909106 F H 0.912107 I H 0.871108 S H 0.864109 E H 0.830110 A H 0.905111 I H 0.933112 I H 0.956113 H H 0.965114 V H 0.960115 L H 0.954116 H H 0.929117 S H 0.797


118 R H 0.528 119 H C 0.840 120 P C 0.913 121 G C 0.888 122 D C 0.835 123 F C 0.888 124 G C 0.908 125 A H 0.881 126 D H 0.932 127 A H 0.957 128 Q H 0.970 129 G H 0.967 130 A H 0.963

131 M H 0.966 132 N H 0.966 133 K H 0.965 134 A H 0.959 135 L H 0.959 136 E H 0.958 137 L H 0.963 138 F H 0.961 139 R H 0.964 140 K H 0.960 141 D H 0.958 142 I H 0.955 143 A H 0.952

144 A H 0.951 145 K H 0.913 146 Y H 0.877 147 K H 0.901 148 E H 0.821 149 L H 0.571 150 G C 0.659 151 Y C 0.834 152 Q C 0.914 153 G C 0.968

GFP Prediction

1 M C 0.964 2 S C 0.930 3 K C 0.898 4 G C 0.808 5 E C 0.558 6 E C 0.415 7 L E 0.432 8 F E 0.539 9 T E 0.554 10 G E 0.582 11 V E 0.769 12 V E 0.875 13 P E 0.903 14 I E 0.948 15 L E 0.954 16 V E 0.949 17 E E 0.921 18 L E 0.836 19 D E 0.643 20 G E 0.514 21 D C 0.496 22 V C 0.540 23 N C 0.698 24 G C 0.659 25 H E 0.612 26 K E 0.863 27 F E 0.943 28 S E 0.959 29 V E 0.947 30 S E 0.869 31 G E 0.696 32 E E 0.566 33 G C 0.609 34 E C 0.780 35 G C 0.870 36 D C 0.858 37 A C 0.812 38 T C 0.732 39 Y C 0.615 40 G C 0.618

41 K E 0.54242 L E 0.77343 T E 0.92044 L E 0.95645 K E 0.95946 F E 0.95647 I E 0.95148 C E 0.90449 T E 0.62450 T C 0.70551 G C 0.86852 K C 0.88953 L C 0.85454 P C 0.84855 V C 0.86256 P C 0.83357 W C 0.68758 P C 0.52259 T C 0.40560 L E 0.49361 V E 0.50562 T E 0.48663 T E 0.51164 F E 0.59265 X E 0.59366 V E 0.59567 Q E 0.65868 C E 0.66469 F E 0.64370 S E 0.55171 R C 0.50672 Y C 0.76273 P C 0.82774 D C 0.74075 H C 0.70576 M C 0.71177 K C 0.73878 R C 0.61679 H C 0.57780 D C 0.499

81 F H 0.49082 F H 0.50183 K H 0.46384 S C 0.41985 A C 0.63986 M C 0.84987 P C 0.87488 E C 0.85489 G C 0.83290 Y C 0.72391 V E 0.52292 Q E 0.74293 E E 0.78694 R E 0.83395 T E 0.92696 I E 0.93697 F E 0.92898 F E 0.86599 K E 0.645100 D C 0.837101 D C 0.887102 G C 0.716103 N E 0.698104 Y E 0.870105 K E 0.922106 T E 0.918107 R E 0.893108 A E 0.825109 E E 0.886110 V E 0.927111 K E 0.917112 F E 0.852113 E E 0.637114 G C 0.783115 D C 0.728116 T E 0.795117 L E 0.920118 V E 0.956119 N E 0.971120 R E 0.962


121 I E 0.942122 E E 0.923123 L E 0.832124 K E 0.592125 G C 0.577126 I C 0.705127 D C 0.821128 F C 0.878129 K C 0.878130 E C 0.855131 D C 0.854132 G C 0.840133 N C 0.738134 I C 0.533135 L E 0.507136 G E 0.520137 H E 0.559138 K E 0.627139 L E 0.606140 E E 0.666141 Y E 0.688142 N E 0.505143 Y C 0.653144 N C 0.765145 S C 0.687146 H E 0.521147 N E 0.780148 V E 0.920149 Y E 0.947150 I E 0.911151 M E 0.709152 A C 0.494153 D C 0.734154 K C 0.770155 Q C 0.807156 K C 0.809157 N C 0.811158 G C 0.687159 I E 0.541

160 K E 0.842161 V E 0.913162 N E 0.931163 F E 0.929164 K E 0.928165 I E 0.920166 R E 0.899167 H E 0.786168 N E 0.541169 I C 0.675170 E C 0.750171 D C 0.796172 G C 0.818173 S C 0.700174 V E 0.610175 Q E 0.778176 L E 0.795177 A E 0.740178 D E 0.628179 H E 0.494180 Y E 0.476181 Q E 0.446182 Q C 0.516183 N C 0.750184 T C 0.845185 P C 0.829186 I C 0.826187 G C 0.854188 D C 0.876189 G C 0.877190 P C 0.791191 V C 0.598192 L C 0.529193 L C 0.630194 P C 0.751195 D C 0.801196 N C 0.724197 H C 0.538198 Y E 0.628

199 L E 0.676200 S E 0.669201 T E 0.665202 Q E 0.688203 S E 0.693204 A E 0.625205 L E 0.468206 S C 0.602207 K C 0.762208 D C 0.870209 P C 0.881210 N C 0.857211 E C 0.825212 K C 0.785213 R C 0.709214 D C 0.496215 H E 0.636216 M E 0.776217 V E 0.773218 L E 0.550219 L H 0.494220 E H 0.612221 F H 0.666222 V H 0.747223 T H 0.795224 A H 0.697225 A C 0.532226 G C 0.789227 I C 0.794228 T C 0.768229 H C 0.847230 G C 0.878231 M C 0.771232 D C 0.508233 E C 0.477234 L C 0.494235 Y C 0.562236 K C 0.930

Yeast eIF6 Prediction

1 M C 0.914 2 A C 0.403 3 T E 0.510 4 R E 0.588 5 T E 0.617 6 Q E 0.629 7 F E 0.559 8 E C 0.628 9 N C 0.834 10 S C 0.837 11 N C 0.704 12 E E 0.695 13 I E 0.891 14 G E 0.943

15 V E 0.959 16 F E 0.957 17 S E 0.920 18 K E 0.832 19 L E 0.603 20 T C 0.776 21 N C 0.766 22 T E 0.672 23 Y E 0.895 24 C E 0.946 25 L E 0.960 26 V E 0.945 27 A E 0.775 28 V C 0.592

29 G C 0.818 30 G C 0.882 31 S C 0.887 32 E C 0.721 33 N C 0.626 34 F H 0.649 35 Y H 0.779 36 S H 0.782 37 A H 0.827 38 F H 0.796 39 E H 0.816 40 A H 0.816 41 E H 0.744 42 L C 0.461


43 G C 0.85544 D C 0.85945 A C 0.80746 I C 0.64547 P E 0.64548 I E 0.87949 V E 0.93150 H E 0.92851 T E 0.91452 T E 0.85453 I E 0.72954 A C 0.66355 G C 0.73956 T E 0.60957 R E 0.86758 I E 0.92959 I E 0.94860 G E 0.94761 R E 0.94362 M E 0.92063 T E 0.83864 A E 0.54465 G C 0.82166 N C 0.88767 R C 0.83768 R C 0.61069 G E 0.60070 L E 0.76171 L E 0.78972 V E 0.62173 P C 0.65874 T C 0.76575 Q C 0.83876 T C 0.88577 T C 0.79678 D H 0.77679 Q H 0.82880 E H 0.85381 L H 0.89582 Q H 0.90683 H H 0.86884 L H 0.82185 R H 0.75986 N H 0.53687 S C 0.55788 L C 0.86189 P C 0.90390 D C 0.89491 S C 0.78292 V E 0.69093 K E 0.92394 I E 0.95695 Q E 0.95596 R E 0.88797 V E 0.63198 E C 0.45599 E C 0.472

100 R H 0.539101 L H 0.567102 S H 0.522103 A C 0.491104 L C 0.804105 G C 0.863106 N C 0.657107 V E 0.835108 I E 0.940109 C E 0.936110 C E 0.828111 N C 0.602112 D C 0.863113 Y C 0.767114 V E 0.736115 A E 0.889116 L E 0.914117 V E 0.830118 H C 0.498119 P C 0.749120 D C 0.871121 I C 0.882122 D C 0.872123 R H 0.728124 E H 0.807125 T H 0.832126 E H 0.872127 E H 0.879128 L H 0.874129 I H 0.824130 S H 0.801131 D H 0.554132 V C 0.645133 L C 0.736134 G C 0.638135 V E 0.841136 E E 0.935137 V E 0.943138 F E 0.917139 R E 0.736140 Q E 0.563141 T E 0.586142 I E 0.639143 S E 0.624144 G E 0.696145 N E 0.822146 I E 0.894147 L E 0.900148 V E 0.876149 G E 0.843150 S E 0.848151 Y E 0.911152 C E 0.938153 S E 0.934154 L E 0.898155 S E 0.686156 N C 0.740

157 Q C 0.892158 G C 0.855159 G E 0.526160 L E 0.789161 V E 0.821162 H E 0.523163 P C 0.772164 Q C 0.823165 T C 0.842166 S C 0.893167 V H 0.796168 Q H 0.818169 D H 0.829170 Q H 0.887171 E H 0.897172 E H 0.915173 L H 0.849174 S H 0.831175 S H 0.755176 L H 0.473177 L C 0.575178 Q C 0.631179 V E 0.620180 P E 0.839181 L E 0.907182 V E 0.891183 A E 0.716184 G C 0.486185 T E 0.556186 V E 0.674187 N E 0.591188 R C 0.595189 G C 0.722190 S C 0.531191 S E 0.701192 V E 0.848193 V E 0.835194 G E 0.664195 A E 0.569196 G E 0.634197 M E 0.750198 V E 0.846199 V E 0.825200 N C 0.547201 D C 0.805202 Y C 0.699203 L E 0.647204 A E 0.826205 V E 0.890206 T E 0.803207 G E 0.520208 L C 0.677209 D C 0.830210 T C 0.876211 T C 0.898212 A C 0.855213 P C 0.612


214 E H 0.446 215 L H 0.569 216 S H 0.538 217 V H 0.538

218 I H 0.499 219 E E 0.528 220 S E 0.460 221 I E 0.470

222 F E 0.452 223 R C 0.698 224 L C 0.938

Procaspase-7 Prediction

S C 0.964 I C 0.931 K C 0.889 T C 0.866 T C 0.875 R C 0.873 D C 0.894 R C 0.865 V C 0.820 P C 0.634 T H 0.447 Y H 0.435 Q E 0.451 Y E 0.481 N H 0.372 M C 0.584 N C 0.761 F C 0.816 E C 0.841 K C 0.855 L C 0.798 G E 0.551 K E 0.869 C E 0.947 I E 0.952 I E 0.940 I E 0.864 N E 0.582 N C 0.679 K C 0.746 N C 0.682 F C 0.646 D C 0.678 K C 0.639 V C 0.645 T C 0.784 G C 0.861 M C 0.883 G C 0.825 V C 0.737 R C 0.746 N C 0.824 G C 0.851 T C 0.836 D C 0.770 K C 0.725 D C 0.705 A H 0.859 E H 0.924

A H 0.938L H 0.957F H 0.964K H 0.967C H 0.951F H 0.938R H 0.897S H 0.726L C 0.688G C 0.861F C 0.745D E 0.702V E 0.914I E 0.927V E 0.829Y C 0.486N C 0.725D C 0.860C C 0.879S C 0.866C H 0.862A H 0.909K H 0.941M H 0.958Q H 0.964D H 0.966L H 0.959L H 0.950K H 0.936K H 0.907A H 0.822S H 0.620E C 0.511E H 0.513D C 0.642H C 0.828T C 0.881N C 0.897A C 0.885A C 0.806C E 0.607F E 0.882A E 0.940C E 0.950I E 0.938L E 0.909L E 0.818S E 0.541H C 0.757

G C 0.870E C 0.873E C 0.752N C 0.492V E 0.663I E 0.800Y E 0.740G C 0.495K C 0.809D C 0.855G C 0.832V C 0.774T C 0.671P C 0.694I H 0.633K H 0.763D H 0.793L H 0.883T H 0.888A H 0.840H H 0.716F C 0.487R C 0.792G C 0.844D C 0.826R C 0.800C C 0.784K C 0.513T C 0.482L C 0.514L C 0.641E C 0.819K C 0.897P C 0.890K C 0.590L E 0.888F E 0.944F E 0.954I E 0.944Q E 0.894A E 0.704A C 0.539R C 0.764G C 0.849T C 0.857E C 0.882L C 0.881D C 0.870D C 0.827


G C 0.690I C 0.510Q E 0.570A E 0.501D C 0.658S C 0.839G C 0.896P C 0.898I C 0.906N C 0.916D C 0.889T C 0.862D C 0.834A C 0.853N C 0.871P C 0.841R C 0.758Y C 0.690K C 0.669I C 0.667P C 0.716V C 0.759E C 0.764A C 0.712D C 0.537F E 0.800L E 0.903F E 0.916A E 0.902Y E 0.840S E 0.663T C 0.589V C 0.813P C 0.823G C 0.639Y E 0.614Y E 0.663S E 0.550W C 0.471R C 0.548S C 0.699P C 0.819G C 0.882R C 0.886G C 0.753S E 0.565W E 0.792F E 0.744V H 0.526Q H 0.862A H 0.919L H 0.940C H 0.952S H 0.958I H 0.952L H 0.925E H 0.880

E H 0.756H C 0.514G C 0.775K C 0.873D C 0.813L C 0.691E C 0.649I H 0.859M H 0.924Q H 0.947I H 0.955L H 0.958T H 0.957R H 0.937V H 0.867N H 0.721D H 0.672R H 0.676V H 0.617A H 0.515R C 0.395H C 0.518F C 0.584E C 0.673S C 0.787Q C 0.862S C 0.885D C 0.898D C 0.884P C 0.874H C 0.799F C 0.581H C 0.563E C 0.589K C 0.610K C 0.669Q C 0.784I C 0.854P C 0.813C C 0.596V H 0.496V H 0.520S H 0.553M H 0.489L H 0.460T H 0.410K H 0.462E H 0.520L H 0.490Y H 0.453F H 0.378S C 0.607Q C 0.664L C 0.673E C 0.708H C 0.783H C 0.811

H C 0.801 H C 0.833 H C 0.891 H C 0.95
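Each record in the listings of this appendix gives a residue position (absent in the Procaspase-7 listing), the one-letter amino acid, the predicted state (H for helix, E for strand, C for coil), and a confidence value. The following is a minimal sketch of how such records could be read back and summarized. It is not the procedure used to produce the listings. The names `parse_predictions` and `state_fractions` are hypothetical, and the pattern assumes every confidence is printed with exactly three decimals, which is what lets it split records even when they run together.

```python
import re
from collections import Counter

# Assumed layout: optional position, residue letter, state (H/E/C),
# confidence with exactly three decimals (an assumption, not a spec).
RECORD = re.compile(r"(\d+)?\s*([A-Z])\s+([HEC])\s+(0\.\d{3})")

def parse_predictions(text):
    """Return (position, residue, state, confidence) tuples found in text."""
    records = []
    for n, m in enumerate(RECORD.finditer(text), start=1):
        # Fall back to a running count when the position is not printed.
        pos = int(m.group(1)) if m.group(1) else n
        records.append((pos, m.group(2), m.group(3), float(m.group(4))))
    return records

def state_fractions(records):
    """Fraction of residues predicted in each of the three states."""
    counts = Counter(state for _, _, state, _ in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: counts[s] / total for s in "HEC"}

# First five records of the whale myoglobin listing.
sample = "1 V C 0.961 2 L C 0.932 3 S C 0.857 4 E H 0.794 5 G H 0.851"
recs = parse_predictions(sample)
print(recs[3])  # (4, 'E', 'H', 0.794)
print(state_fractions(recs))
```

Comparing the parsed states against an observed assignment, residue by residue, would give the per-protein accuracy that the body of the paper discusses.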


Appendix E: PSIPRED Predictions

[Figure panels: S. cerevisiae eIF6, GFP, Whale Myoglobin, Poliovirus 3C proteinase]


[Figure panels: Procaspase-7, Hepatitis C RNA Polymerase, Procathepsin L]


[Figure panels: RNase Hi, RNase Sa, Human Trypsin IV]