Large-scale Prediction of Yeast Gene Function
Introduction to Bio-Informatics 236523, Winter 2010-2011
Roi Adadi roia@cs
Naama Kraus nkraus@cs
Main Question
Predict the function of hypothetical proteins that are inferred from genome sequencing.
Annotate proteins at one of three possible levels:
◦ Function
◦ Biological process
◦ Cellular localization
Process
Cluster the gene expression profiles using EPCLUST.
Two possible directions:
Direction 1
◦ Choose some "nice" cluster (e.g. a tight cluster)
◦ Identify a common function F using GO
◦ Search for hypothetical proteins in the cluster
◦ Predict their function as F
◦ Validate the prediction using other methods:
  ◦ Use BLAST to search for homologous proteins: do they have function F?
  ◦ Use MEME/Pfam to identify a common motif/domain: does it relate to F?
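
The "identify a common function F" and "predict their function as F" steps of Direction 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what EPCLUST or any GO tool does internally: the function names, the `annotations` mapping (gene to set of GO terms) and the `alpha` cutoff are all hypothetical, the cluster is assumed to be a subset of the background gene list, and a plain hypergeometric test without multiple-testing correction stands in for whatever enrichment statistic is actually used.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def most_enriched_term(cluster, annotations, background):
    """Return (term, p_value) for the term most over-represented in the cluster.

    Only annotated genes are counted; `cluster` must be a subset of `background`.
    Returns None if no gene in the cluster is annotated.
    """
    annotated_bg = [g for g in background if annotations.get(g)]
    annotated_cl = [g for g in cluster if annotations.get(g)]
    N, n = len(annotated_bg), len(annotated_cl)
    best = None
    terms = {t for g in annotated_cl for t in annotations[g]}
    for term in sorted(terms):
        K = sum(term in annotations[g] for g in annotated_bg)
        k = sum(term in annotations[g] for g in annotated_cl)
        p = hypergeom_tail(N, K, n, k)
        if best is None or p < best[1]:
            best = (term, p)
    return best


def predict_hypothetical(cluster, annotations, background, alpha=0.05):
    """Assign the cluster's most enriched term F to its unannotated genes."""
    best = most_enriched_term(cluster, annotations, background)
    if best is None or best[1] > alpha:
        return {}  # no sufficiently common function in this cluster
    term, _ = best
    return {g: term for g in cluster if not annotations.get(g)}
```

Any prediction produced this way is only a hypothesis; it still needs the BLAST and MEME/Pfam validation listed above.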
Process – cont’d
Direction 2
◦ Decide on some function of interest and search for a cluster where this function is common:
  ◦ Identify a cluster with a significant localization function
  ◦ Look for a significant motif/domain in the mRNA UTRs of the sequences in the cluster using MEME/Pfam
  ◦ Search for the motif/domain in other proteins: do they localize to the same location?
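
The last step of Direction 2 (search for the motif elsewhere and check localization) can be sketched as follows. This is a minimal illustration, assuming the motif has already been reduced to an IUPAC consensus string; MEME itself reports motifs as position-specific matrices, so a consensus scan is a simplification. The function names and the `utrs` and `localization` dictionaries are hypothetical inputs, not part of any tool's API.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def genes_with_motif(utrs, consensus):
    """Return the sorted names of genes whose UTR sequence contains the motif."""
    pattern = consensus_to_regex(consensus)
    return sorted(
        gene for gene, seq in utrs.items()
        if pattern.search(seq.upper().replace("U", "T"))
    )


def localization_agreement(hits, localization, expected):
    """Fraction of motif-carrying genes with known localization equal to `expected`.

    Returns None when none of the hits has a known localization.
    """
    known = [g for g in hits if g in localization]
    if not known:
        return None
    return sum(localization[g] == expected for g in known) / len(known)
```

A high agreement fraction among genes outside the original cluster supports the link between the motif and the localization; a low one suggests the motif is not specific to it.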