Functional Annotation of Proteins via the CAFA Challenge
Lee Tien, Duncan Renfrow-Symon, Shilpa Nadimpalli, Mengfei Cao
COMP150PBT | Fall 2010
What’s the problem?
1. Huge bottleneck: determining a protein’s function given only its sequence
2. Incomplete, inaccurate, or inconsistent annotations are difficult to work with, and their errors can propagate to other annotations
3. No good way to measure the accuracy of an annotation predictor
What is the CAFA Challenge?
What are Gene Ontology (GO) terms?
• GO = a controlled vocabulary for describing the attributes of genes and gene products
• Covers three domains:
▫ Cellular component
▫ Molecular function
▫ Biological process
• Hierarchical:
▫ Broad/general terms (e.g. “catalytic activity”)
▫ Specific terms (e.g. “leukotriene-C4 synthase activity”)
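The hierarchy matters for annotation: in GO, assigning a protein a specific term also implies every broader ancestor term. A minimal Python sketch of that propagation, assuming a hand-written toy hierarchy (`PARENTS`, `ancestors`, and `propagate` are hypothetical names, not from the project; the real GO graph has several intermediate terms between the two examples, omitted here):

```python
# Simplified is_a hierarchy, child -> list of parents. The real GO graph has
# intermediate terms between these, which are omitted here for brevity.
PARENTS = {
    "leukotriene-C4 synthase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents=PARENTS):
    """Return the term plus every ancestor reachable via is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # GO is a DAG, not a tree, so a term may have several parents.
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand a set of predicted terms so that broader terms are implied too."""
    implied = set()
    for term in annotations:
        implied |= ancestors(term, parents)
    return implied
```

For example, `propagate({"leukotriene-C4 synthase activity"})` returns all three terms in the toy hierarchy.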
Outline of Our Approach
• Input: CAFA targets (FASTA sequences)
• Methods: BLAST, Pfam, SMURF?, BetaWrapPro?, other secondary structure predictor?
• Output: GO ids for each CAFA target
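The pipeline starts from FASTA-formatted target sequences. A minimal Python parser sketch, assuming the target identifier is the first whitespace-separated token of each `>` header line (the exact CAFA header format is not given in these slides; `read_fasta` is a hypothetical helper, and Biopython’s `Bio.SeqIO.parse(handle, "fasta")` is a well-tested alternative):

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {identifier: sequence} dict."""
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Assumption: the identifier is the first token after ">".
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Usage: `read_fasta(open("targets.fasta"))` (hypothetical file name) yields one entry per target.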
Pfam: Protein Family Database
• Collection of protein families, each represented by:
▫ Multiple sequence alignments
▫ Hidden Markov Models
• Two sections of Pfam:
▫ Pfam-A: high-quality, manually curated
▫ Pfam-B: large, automatically generated
[Figures: sample multiple sequence alignment; sample hidden Markov model]
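To illustrate how a family alignment becomes a scoring model, here is a simplified Python sketch that turns an ungapped multiple sequence alignment into a position-specific scoring matrix (PSSM) and scores a query against it. This is a deliberately swapped-in simplification: Pfam’s profile HMMs also model insertions and deletions, and HMMER scores sequences differently. The function names, the uniform background, and the pseudocount of 1 are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Per-column log-odds scores (bits) from an ungapped, equal-length MSA.

    Assumes a uniform background distribution over the 20 amino acids.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa)
        # Pseudocounts keep unseen residues from getting probability zero.
        total = len(msa) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, seq):
    """Sum of per-column log-odds for an ungapped sequence of the same length."""
    return sum(column[aa] for column, aa in zip(profile, seq))
```

A sequence resembling the alignment scores positive, while an unrelated one scores negative.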
BLAST: Basic Local Alignment Search Tool
• Goal: find homologous (i.e. derived from a common ancestor) sequences in a database
• Various BLAST programs:
▫ blastp = query: protein, database: protein
▫ blastn = query: nucleotide, database: nucleotide
▫ blastx = query: translated nucleotide, database: protein
▫ tblastn = query: protein, database: translated nucleotide
▫ tblastx = query: translated nucleotide, database: translated nucleotide
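The slides do not say how BLAST hits were filtered before being stored with their e-values. A minimal Python sketch, assuming BLAST+ was run with tabular output (`-outfmt 6`, whose 12 default columns end with e-value and bit score) and using a hypothetical e-value cutoff of 1e-5:

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6), in order.
FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, max_evalue=1e-5):
    """Yield (query, subject, evalue) for hits at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(FIELDS, row))
        evalue = float(record["evalue"])
        if evalue <= max_evalue:
            yield record["query"], record["subject"], evalue


# Toy input with placeholder IDs: one strong hit, one weak hit.
example = io.StringIO(
    "T1\tsubjA\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-40\t250\n"
    "T1\tsubjB\t22.0\t80\t60\t3\t10\t90\t5\t85\t0.5\t20\n"
)
print(list(significant_hits(example)))  # [('T1', 'subjA', 1e-40)]
```

Lower e-values mean a match is less likely to occur by chance, so the cutoff trades sensitivity for precision.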
SMURF: Structural Motifs Using Random Fields
• Determines whether a protein sequence contains one of the following super-secondary structures:
▫ 6-bladed propeller
▫ 7-bladed propeller
▫ 8-bladed propeller
▫ Double blades (i.e. 6-6, 6-7, 6-8…)
• Developed at Tufts!
• Some propeller functions:
▫ Often WD40 repeat: protein-protein interaction
▫ Signaling, transcription, cell cycle
[Figures: “Smurf!”; 7-bladed propeller]
Final Database Structure
• INPUT
▫ cafa_targets: cafa_id, uniprot_id, gi_access_id
• RESULTS
▫ blast_results: cafa_id, pdb_id, refseq_id, e_value_score
▫ pfam_results: cafa_id, pfam_id
▫ smurf_results: cafa_id, template_id, p_value_score
• MAPPING
▫ pdb_id, go_id
▫ refseq_id, uniprot_id
▫ uniprot_id, go_id
▫ pfam_id, go_id
▫ template_id, go_id
• OUTPUT
▫ go_results: cafa_id, go_id, source, confidence
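The mapping tables are what turn raw method hits into GO annotations. A minimal Python/SQLite sketch of one such join, from Pfam hits to `go_results`. The table name `pfam_go` is hypothetical (the slide leaves mapping tables unnamed), and `confidence` is left NULL because the slides do not say how it was computed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pfam_results (cafa_id TEXT, pfam_id TEXT);
    -- Hypothetical name for the slide's unnamed pfam_id/go_id mapping table.
    CREATE TABLE pfam_go (pfam_id TEXT, go_id TEXT);
    CREATE TABLE go_results (cafa_id TEXT, go_id TEXT,
                             source TEXT, confidence REAL);
""")


def map_pfam_hits(conn):
    """Translate each target's Pfam family hits into GO annotations."""
    conn.execute("""
        INSERT INTO go_results (cafa_id, go_id, source, confidence)
        SELECT DISTINCT r.cafa_id, m.go_id, 'pfam', NULL
        FROM pfam_results AS r
        JOIN pfam_go AS m ON r.pfam_id = m.pfam_id
    """)
    conn.commit()


# Toy rows echoing the example result: PF00642 linked to zinc ion binding
# (GO:0008270) and nucleic acid binding (GO:0003676).
conn.execute("INSERT INTO pfam_results VALUES ('T38114', 'PF00642')")
conn.executemany("INSERT INTO pfam_go VALUES ('PF00642', ?)",
                 [("GO:0008270",), ("GO:0003676",)])
map_pfam_hits(conn)
print(conn.execute("SELECT * FROM go_results ORDER BY go_id").fetchall())
# [('T38114', 'GO:0003676', 'pfam', None), ('T38114', 'GO:0008270', 'pfam', None)]
```

The BLAST and SMURF paths would follow the same pattern through their own mapping tables (BLAST via pdb_id, or refseq_id to uniprot_id to go_id).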
Final Results Statistics
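[Figure: Distribution of sequence hits by method (Venn diagram)]
▫ BLAST only: 789
▫ Pfam only: 1,356
▫ SMURF only: 69
▫ BLAST + Pfam only: 3,445
▫ BLAST + SMURF only: 12
▫ Pfam + SMURF only: 4
▫ All three methods: 19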
Of 8,904 unknown sequences:
• 4,265 had at least one hit in PDB BLAST
• 4,824 had at least one hit in Pfam
• 104 had at least one hit in SMURF
In total, 5,694 unique sequences had at least one hit, giving 63.9% coverage.
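The per-method totals, the union, and the coverage percentage can be cross-checked against the seven counts in the distribution figure. This assumes those counts are disjoint Venn regions. The assignment below is inferred, not labeled on the slide; it is the only one consistent with the reported totals. A short Python sketch:

```python
# Venn-region counts keyed by the set of methods that hit each sequence.
# Region assignment is inferred from the reported per-method totals.
REGIONS = {
    frozenset({"BLAST"}): 789,
    frozenset({"Pfam"}): 1356,
    frozenset({"SMURF"}): 69,
    frozenset({"BLAST", "Pfam"}): 3445,
    frozenset({"BLAST", "SMURF"}): 12,
    frozenset({"Pfam", "SMURF"}): 4,
    frozenset({"BLAST", "Pfam", "SMURF"}): 19,
}
TOTAL_TARGETS = 8904


def hits_per_method(regions):
    """Sequences hit by each method: sum every region containing it."""
    totals = {}
    for methods, n in regions.items():
        for method in methods:
            totals[method] = totals.get(method, 0) + n
    return totals


def coverage(regions, total):
    """Regions are disjoint, so their sum is the size of the union."""
    covered = sum(regions.values())
    return covered, covered / total


covered, fraction = coverage(REGIONS, TOTAL_TARGETS)
print(f"{covered:,} of {TOTAL_TARGETS:,} targets ({fraction:.1%})")
# 5,694 of 8,904 targets (63.9%)
```

Summing the disjoint regions directly avoids the double counting you would get by adding the three per-method totals (4,265 + 4,824 + 104).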
Example Result
T38114
MDLDMNGGNKRVFQRLGGGSNRPTTDSNQKVCFHWRAGRCNRYPCPYLHRELPGPGSGPVAASSNKRVADESGFAGPSHR
RGPGFSGTANNWGRFGGNRTVTKTEKLCKFWVDGNCPYGDKCRYLHCWSKGDSFSLLTQLDGHQKVVTGIALPSGSDKLY
TASKDETVRIWDCASGQCTGVLNLGGEVGCIISEGPWLLVGMPNLVKAWNIQNNADLSLNGPVGQVYSLVVGTDLLFAGT
QDGSILVWRYNSTTSCFDPAASLLGHTLAVVSLYVGANRLYSGAMDNSIKVWSLDNLQCIQTLTEHTSVVMSLICWDQFL
LSCSLDNTVKIWAATEGGNLEVTYTHKEEYGVLALCGVHDAEAKPVLLCSCNDNSLHLYDLPSFTERGKILAKQEIRSIQ
IGPGGIFFTGDGSGQVKVWKWSTESTPILS
•BLAST: matches with PDB structures 2OVP, 3MKS, 2CNX, 1P22, 1NEX, 3N0E
▫Transcription, mitosis, methylation, protein binding
•Pfam: match to family PF00642
▫Zinc ion binding, nucleic acid binding
•SMURF: match to 7-bladed β-propeller template
▫WD domain (protein binding)
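The three methods suggest different, partly overlapping functions for T38114. The go_results table records a source and a confidence for each prediction. One simple way to fill them, offered purely as a hypothetical illustration (the slides do not describe a scoring rule), is to treat the fraction of methods that agree on a term as its confidence:

```python
from collections import defaultdict


def merge_predictions(predictions):
    """Combine per-method GO predictions for one target.

    predictions maps method name -> set of GO terms. Returns
    term -> (sorted list of supporting methods, fraction of methods agreeing).
    """
    support = defaultdict(set)
    for method, terms in predictions.items():
        for term in terms:
            support[term].add(method)
    n_methods = len(predictions)
    return {
        term: (sorted(methods), len(methods) / n_methods)
        for term, methods in support.items()
    }


# Term names from the T38114 example (GO IDs omitted for readability).
example = {
    "BLAST": {"transcription", "mitosis", "methylation", "protein binding"},
    "Pfam": {"zinc ion binding", "nucleic acid binding"},
    "SMURF": {"protein binding"},
}
merged = merge_predictions(example)
print(merged["protein binding"])  # (['BLAST', 'SMURF'], 0.6666666666666666)
```

Under this rule, “protein binding” ranks highest because both BLAST and SMURF support it.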
Possible Future Directions
• Improving functional annotation for β-propellers identified by SMURF
▫ Analyze a training set of propeller proteins with known function to build a probabilistic model of protein function based on propeller type
• Addition of other structural prediction tools for motifs with known function
▫ G-protein-coupled receptors, membrane-bound proteins
• Expansion of the BLAST search to include the full NCBI nr (non-redundant) database
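The proposed probabilistic model could start as simply as estimating P(function | propeller type) from labeled training proteins. A minimal Python sketch, assuming add-one (Laplace) smoothing. The training pairs and the function name are hypothetical placeholders, not project data:

```python
from collections import Counter, defaultdict


def train_propeller_model(pairs, alpha=1.0):
    """Estimate P(GO term | propeller type) with add-alpha smoothing.

    pairs: iterable of (propeller_type, go_term) from training proteins.
    """
    counts = defaultdict(Counter)
    vocabulary = set()
    for ptype, term in pairs:
        counts[ptype][term] += 1
        vocabulary.add(term)
    model = {}
    for ptype, term_counts in counts.items():
        # Smoothing gives unseen terms a small nonzero probability.
        total = sum(term_counts.values()) + alpha * len(vocabulary)
        model[ptype] = {t: (term_counts[t] + alpha) / total for t in vocabulary}
    return model


# Hypothetical training labels, for illustration only.
training = [
    ("7-bladed", "protein binding"),
    ("7-bladed", "protein binding"),
    ("7-bladed", "transcription"),
    ("6-bladed", "transcription"),
]
model = train_propeller_model(training)
print(model["7-bladed"]["protein binding"])  # 0.6
```

Given a new SMURF hit of a known propeller type, the highest-probability terms would become candidate annotations.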
Questions?