Operon Prediction

Cao Fan

Uploaded by fausto on 24 February 2016


Operon

• A functioning unit of genomic material containing a cluster of genes under the control of a single regulatory signal or promoter

• Found primarily in prokaryotes, but also in some eukaryotes

Approaches – wet lab

• Demonstrate co-transcription of the candidate gene cluster via RT-PCR of whole-cell RNA

• Reverse transcribe a specific RNA into cDNA using a gene-specific primer

• Amplify the cDNA via PCR using primers designed from genes within the gene cluster

• Successful PCR amplification indicates that the genes are co-transcribed and therefore members of an operon

Maritza Guacucano, Gloria Levican, David S. Holmes and Eugenia Jedlicki. An RT-PCR artifact in the characterization of bacterial operons. Electronic Journal of Biotechnology 3(3). http://www.ejbiotechnology.info/content/vol3/issue3/full/5/index.html

Approaches – dry lab

Features used:

• Intergenic distance (IG)

• Conserved gene clusters (CG)

• Functional relations (FR)

• Experimental evidence (EE)

• Sequence-based features (SF)

• Phylogenetic profiles (PP)

Intergenic distance

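• Adjacent genes in the same operon tend to be separated by shorter intergenic distances than adjacent genes in different operons: IG(contiguous genes, same operon) < IG(contiguous genes, different operons)

• The most widely used parameter for operon prediction

• The best single predictor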
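The intergenic-distance rule can be sketched in Python as below. This is a minimal illustration, not a published predictor: the `Gene` record, the 1-based inclusive coordinates and the 50 bp default cut-off are hypothetical choices, and in practice a cut-off (or a full distance distribution) would be estimated for each genome.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate, 1-based
    end: int     # rightmost coordinate, 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(left: Gene, right: Gene) -> int:
    """Number of bases between two adjacent genes; negative if they overlap."""
    return right.start - left.end - 1


def predict_same_operon(genes: list[Gene], threshold: int = 50) -> list[tuple[str, str, bool]]:
    """Call each adjacent pair same-operon if both genes share a strand and IG <= threshold.

    The 50 bp default is a hypothetical cut-off used only for illustration.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    calls = []
    for left, right in zip(ordered, ordered[1:]):
        same_strand = left.strand == right.strand
        close = intergenic_distance(left, right) <= threshold
        calls.append((left.name, right.name, same_strand and close))
    return calls


if __name__ == "__main__":
    genes = [
        Gene("geneA", 1, 900, "+"),
        Gene("geneB", 911, 1500, "+"),
        Gene("geneC", 1800, 2400, "+"),
        Gene("geneD", 2420, 3000, "-"),
    ]
    for call in predict_same_operon(genes):
        print(call)
```

On this example, geneA/geneB (10 bp apart) are called a same-operon pair, geneB/geneC (299 bp apart) are not, and geneC/geneD are rejected because they lie on opposite strands.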

Conserved gene clusters

• Genes in an operon tend to remain clustered together across phylogenetically related organisms

• The order of genes in an operon may not be conserved

• Sequence comparison between non-redundant genomes is usually performed to identify conserved clusters
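Since gene order inside an operon may change, a conservation check can ignore order. The Python sketch below assumes each genome has already been reduced to a list of homolog-family labels in genomic order (for example, after the sequence comparison step); the labels, the `window` parameter and the function names are hypothetical.

```python
def near_each_other(genome: list[str], fam_a: str, fam_b: str, window: int = 1) -> bool:
    """True if fam_a and fam_b lie within `window` genes of each other, in either order."""
    pos_a = [i for i, fam in enumerate(genome) if fam == fam_a]
    pos_b = [i for i, fam in enumerate(genome) if fam == fam_b]
    return any(0 < abs(i - j) <= window for i in pos_a for j in pos_b)


def conservation_count(fam_a: str, fam_b: str, other_genomes: list[list[str]], window: int = 1) -> int:
    """Number of other genomes in which the two families occur close together."""
    return sum(near_each_other(g, fam_a, fam_b, window) for g in other_genomes)


if __name__ == "__main__":
    others = [
        ["famB", "famA", "famX"],  # adjacent, order reversed
        ["famA", "famY", "famB"],  # separated by one gene
        ["famZ", "famW"],          # neither family present
    ]
    print(conservation_count("famA", "famB", others))            # 1
    print(conservation_count("famA", "famB", others, window=2))  # 2
```

Widening `window` tolerates small insertions between cluster members at the cost of more chance co-occurrences.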

Functional relations

• Genes in the same operon tend to encode functionally related proteins

• E.g. subunits of the same protein complex, or enzymes of a single metabolic pathway

Functional relations

Functional classifications:

• Riley’s functional annotation

• Metabolic pathways

• Clusters of Orthologous Groups of proteins (COG)

• Gene Ontology (GO)
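One simple way to turn such classifications into a pairwise feature is the fraction of shared annotation terms. The sketch below uses the Jaccard index over the GO-term sets of neighbouring genes; it is a stand-in chosen for illustration, not the specific similarity measure of any method in these slides, and the term identifiers are made up.

```python
def jaccard(terms_a: set[str], terms_b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when neither gene has any annotation."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def score_adjacent_pairs(annotations: dict[str, set[str]], order: list[str]) -> list[tuple[str, str, float]]:
    """Jaccard score for each pair of neighbouring genes; unannotated genes count as empty sets."""
    return [(a, b, jaccard(annotations.get(a, set()), annotations.get(b, set())))
            for a, b in zip(order, order[1:])]


if __name__ == "__main__":
    annotations = {  # made-up term sets
        "g1": {"term1", "term2", "term3"},
        "g2": {"term2", "term3", "term4"},
        "g3": {"term9"},
    }
    print(score_adjacent_pairs(annotations, ["g1", "g2", "g3"]))
    # [('g1', 'g2', 0.5), ('g2', 'g3', 0.0)]
```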

Sequence-based features

• Overrepresented sequence motifs and other sequence elements, such as promoters and terminators, are used

• The length ratio of adjacent genes is also used; this ratio has been shown to be genome-specific
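The length ratio of an adjacent gene pair can be computed directly from gene coordinates. The sketch below takes the log2 of the ratio so that swapping the two genes only flips the sign; the log scale, the 1-based inclusive coordinates and the function names are assumptions of this example, since the slide does not say how the ratio is normalized.

```python
import math


def gene_length(start: int, end: int) -> int:
    """Length in bp for 1-based inclusive coordinates."""
    return end - start + 1


def length_log_ratio(len_first: int, len_second: int) -> float:
    """log2(len_first / len_second) for two adjacent genes."""
    if len_first <= 0 or len_second <= 0:
        raise ValueError("gene lengths must be positive")
    return math.log2(len_first / len_second)


if __name__ == "__main__":
    first = gene_length(101, 1300)    # 1200 bp
    second = gene_length(1351, 1650)  # 300 bp
    print(length_log_ratio(first, second))  # 2.0
```

Because the ratio is reported to be genome-specific, such values would be compared against the distribution observed in the same genome rather than a universal cut-off.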

Phylogenetic profiles

• Phylogenetic profiles capture the general tendency of a set of genes to be present or absent together across related organisms

• PP has been shown to be genome-specific
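A phylogenetic profile can be represented as a presence/absence vector over a fixed set of genomes, and two genes compared by how often their profiles agree. The sketch below uses the fraction of matching positions (one minus the normalized Hamming distance); this is one simple choice among several possible similarity measures, and the genome and family names are made up.

```python
def phylogenetic_profile(family: str, genomes: dict[str, set[str]]) -> tuple[int, ...]:
    """Presence (1) or absence (0) of a gene family in each genome, in sorted genome order."""
    return tuple(int(family in genomes[name]) for name in sorted(genomes))


def profile_agreement(p: tuple[int, ...], q: tuple[int, ...]) -> float:
    """Fraction of genomes in which two profiles agree (1 - normalized Hamming distance)."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(p, q)) / len(p)


if __name__ == "__main__":
    genomes = {
        "genome1": {"famA", "famB"},
        "genome2": {"famA", "famB", "famC"},
        "genome3": {"famC"},
        "genome4": {"famA"},
    }
    pa = phylogenetic_profile("famA", genomes)  # (1, 1, 0, 1)
    pb = phylogenetic_profile("famB", genomes)  # (1, 1, 0, 0)
    print(profile_agreement(pa, pb))            # 0.75
```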

Features

[Figure: comparison of operon predictions based on different feature sets: IG only; IG, SF, EE; CG only; SF]

Rutger W.W. Brouwer, Oscar P. Kuipers and Sacha A.F.T. van Hijum. The relative value of operon predictions. Briefings in Bioinformatics, 2008.

Using both genome-specific and general genomic information

• Phuongan Dam, Victor Olman, Kyle Harris, Zhengchang Su and Ying Xu

• Features used:
  – Intergenic distance
  – Neighborhood conservation
  – Phylogenetic distance
  – Short DNA motifs
  – Similarity score between GO terms
  – Length ratio

Prediction of operons in microbial genomes

• by Maria D. Ermolaeva, Owen White and Steven L. Salzberg

• Features:
  – Conserved gene clusters

• Scoring method:
  – Log-likelihood scores

Prediction of operons in microbial genomes

• Gene pair: two adjacent genes separated by ≤200 bp

• Conserved gene pair: two adjacent genes (A,B) for which a homologous gene pair (A’,B’) can be found in another genome.

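• Additional condition: Similarity(A,B) < Similarity(B,B’) and Similarity(A,B) < Similarity(A,A’), i.e. A and B must each be more similar to their counterpart in the other genome than to each other, which guards against pairs of similar genes (such as duplicates) matching each other

• BLASTP is used to find homologs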
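The conserved-pair test can be sketched as follows, assuming pairwise BLASTP scores (e.g. bit scores) have already been computed and stored in a dictionary keyed by unordered gene-ID pairs. The `min_score` homology cut-off, the gene IDs and the scores are hypothetical; a candidate pair (A', B') from another genome counts only if A' and B' are homologs of A and B and the similarity condition holds.

```python
def similarity(scores: dict[frozenset[str], float], x: str, y: str) -> float:
    """Symmetric score lookup; gene pairs with no recorded BLASTP hit score 0."""
    return scores.get(frozenset((x, y)), 0.0)


def is_conserved(pair: tuple[str, str],
                 other_pairs: list[tuple[str, str]],
                 scores: dict[frozenset[str], float],
                 min_score: float = 50.0) -> bool:
    """True if some adjacent pair (A', B') in another genome satisfies the conserved-pair rules.

    Each entry of other_pairs is (A', B'), with A' the candidate homolog of A.
    min_score is a hypothetical cut-off for calling two genes homologs.
    """
    a, b = pair
    s_ab = similarity(scores, a, b)
    for a2, b2 in other_pairs:
        s_aa2 = similarity(scores, a, a2)
        s_bb2 = similarity(scores, b, b2)
        homologous = s_aa2 >= min_score and s_bb2 >= min_score
        if homologous and s_ab < s_aa2 and s_ab < s_bb2:
            return True
    return False


if __name__ == "__main__":
    scores = {
        frozenset(("A", "A2")): 310.0,
        frozenset(("B", "B2")): 240.0,
    }
    print(is_conserved(("A", "B"), [("A2", "B2")], scores))  # True
    scores[frozenset(("A", "B"))] = 400.0  # A and B now more similar to each other
    print(is_conserved(("A", "B"), [("A2", "B2")], scores))  # False
```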

Prediction of operons in microbial genomes

• S pair: the genes in the pair are on the same strand

• D pair: the genes in the pair are on different strands

• SO pair: an S pair whose genes belong to the same operon

• SN pair: an S pair whose genes belong to different operons

• Directon: a maximal set of adjacent genes located on the same DNA strand
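These definitions translate directly into counts over an ordered gene list. The sketch below assumes a linear chromosome (the wrap-around pair of a circular genome is ignored) and represents genes as hypothetical `(start, end, strand)` tuples sorted by start, with 1-based inclusive coordinates; the 200 bp default comes from the gene-pair definition.

```python
def directons(strands: list[str]) -> list[list[int]]:
    """Group gene indices into maximal runs of adjacent genes on the same strand."""
    runs: list[list[int]] = []
    for i, strand in enumerate(strands):
        if runs and strands[runs[-1][-1]] == strand:
            runs[-1].append(i)
        else:
            runs.append([i])
    return runs


def count_pairs(genes: list[tuple[int, int, str]], max_gap: int = 200) -> tuple[int, int]:
    """Return (number of S pairs, number of D pairs) among adjacent genes <= max_gap bp apart."""
    s_pairs = d_pairs = 0
    for (_, end1, strand1), (start2, _, strand2) in zip(genes, genes[1:]):
        if start2 - end1 - 1 <= max_gap:
            if strand1 == strand2:
                s_pairs += 1
            else:
                d_pairs += 1
    return s_pairs, d_pairs


if __name__ == "__main__":
    genes = [(1, 900, "+"), (951, 1800, "+"), (1900, 2500, "-"),
             (3000, 3600, "-"), (3650, 4000, "+")]
    print(directons([g[2] for g in genes]))  # [[0, 1], [2, 3], [4]]
    print(count_pairs(genes))                # (1, 2)
```

In the example, the third and fourth genes share a strand but are 499 bp apart, so they form neither an S pair nor a D pair.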

Prediction of operons in microbial genomes

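• Probability of a conserved S pair being an SO pair:

$$P = 1 - P(SN \mid \text{conserved}, S) - P_{\text{chance}}$$

• Because D pairs always belong to different operons, their conservation rate is used to estimate P(conserved | SN); by Bayes' rule:

$$P(SN \mid \text{conserved}, S) = \frac{P(\text{conserved} \mid SN)\,P(SN \mid S)}{P(\text{conserved} \mid S)} \approx \frac{N(\text{conserved } D)/N(D)}{N(\text{conserved } S)/N(S)}\,P(SN \mid S)$$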
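With counts in hand, the probability can be evaluated as in the sketch below. It applies Bayes' rule, P(SN | conserved, S) = P(conserved | SN) P(SN | S) / P(conserved | S), and assumes that pairs from different operons are conserved at the same rate whether they lie on the same or on opposite strands, so the conservation rate of D pairs stands in for P(conserved | SN). The function names and example counts are invented for illustration.

```python
def p_sn_given_conserved_s(n_s: int, n_conserved_s: int,
                           n_d: int, n_conserved_d: int,
                           p_sn_given_s: float) -> float:
    """Bayes estimate of P(SN | conserved, S), using D pairs as a proxy for SN pairs."""
    p_conserved_given_sn = n_conserved_d / n_d  # assumed equal to the D-pair conservation rate
    p_conserved_given_s = n_conserved_s / n_s
    return p_conserved_given_sn * p_sn_given_s / p_conserved_given_s


def p_same_operon(p_sn_cons_s: float, p_chance: float) -> float:
    """P = 1 - P(SN | conserved, S) - P_chance, clipped to [0, 1] as a safeguard."""
    return min(1.0, max(0.0, 1.0 - p_sn_cons_s - p_chance))


if __name__ == "__main__":
    # Made-up counts: 1000 S pairs (200 conserved), 400 D pairs (8 conserved).
    p_sn = p_sn_given_conserved_s(1000, 200, 400, 8, p_sn_given_s=0.4)
    print(round(p_sn, 4))                       # 0.04
    print(round(p_same_operon(p_sn, 0.01), 4))  # 0.95
```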

Prediction of operons in microbial genomes

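Calculate P(SN|S):

• Assumption: the orientation of operons is random, so a directon contains two operons on average: N(operons) = 2N(directons)

• Adjacent genes that do not form a pair (more than 200 bp apart) and D pairs are taken to lie in different operons. On a circular genome the number of adjacent gene positions equals N(genes), and N(pairs) = N(S pairs) + N(D pairs), so:

  N(SN pairs) = N(operons) – N(adjacent non-pairs) – N(D pairs)
  = 2N(directons) – (N(genes) – N(pairs)) – N(D pairs)
  = 2N(directons) + N(S pairs) – N(genes)

• P(SN|S) = N(SN pairs) / N(S pairs)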
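The counting argument can be checked numerically. The sketch below computes N(SN pairs) both from the step-by-step expression and from the closed form, assuming a circular chromosome so that the number of adjacent gene positions equals N(genes); all counts in the example are invented.

```python
def n_sn_pairs(n_directons: int, n_genes: int, n_s_pairs: int, n_d_pairs: int) -> int:
    """N(SN pairs) = N(operons) - N(adjacent non-pairs) - N(D pairs)."""
    n_operons = 2 * n_directons                      # random-orientation assumption
    n_non_pairs = n_genes - (n_s_pairs + n_d_pairs)  # adjacent genes more than 200 bp apart
    return n_operons - n_non_pairs - n_d_pairs


def p_sn_given_s(n_directons: int, n_genes: int, n_s_pairs: int, n_d_pairs: int) -> float:
    """P(SN | S) = N(SN pairs) / N(S pairs)."""
    n_sn = n_sn_pairs(n_directons, n_genes, n_s_pairs, n_d_pairs)
    # The closed form 2N(directons) + N(S pairs) - N(genes) gives the same number.
    assert n_sn == 2 * n_directons + n_s_pairs - n_genes
    return n_sn / n_s_pairs


if __name__ == "__main__":
    print(p_sn_given_s(n_directons=1500, n_genes=4000, n_s_pairs=2000, n_d_pairs=300))  # 0.5
```

N(D pairs) cancels out of the final expression, which is why the closed form needs only the directon, S-pair and gene counts.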

Prediction of operons in microbial genomes

Calculating P_chance:

$$P_{\text{chance}} = \left(\frac{0.1\,G}{N(\text{conserved } S)}\right)^{h}$$

where G is the number of genomes searched and h is the number of genomes in which homologs of a given gene are found.
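Assuming h acts as an exponent on the ratio 0.1·G / N(conserved S), P_chance could be computed as below. Both that reading of the formula and the example values of G, N(conserved S) and h are assumptions made for illustration.

```python
def p_chance(n_genomes_searched: int, n_conserved_s: int, h: int) -> float:
    """P_chance = (0.1 * G / N(conserved S)) ** h, with h read as an exponent (assumption)."""
    if n_conserved_s <= 0:
        raise ValueError("need at least one conserved S pair")
    return (0.1 * n_genomes_searched / n_conserved_s) ** h


if __name__ == "__main__":
    # Hypothetical values: G = 30 genomes, 300 conserved S pairs, homologs found in h = 2 genomes.
    print(p_chance(n_genomes_searched=30, n_conserved_s=300, h=2))  # approximately 1e-4
```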

Prediction of operons in microbial genomes

Result: 7699 gene pairs in 34 bacterial genomes predicted to belong to the same operon with probability ≥ 0.98

Sensitivity: 30%–50%

OperonDB

• Gene pair: two co-linear genes, possibly separated by other genes with the same orientation

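• Modified probability estimate that integrates intergenic distances:

$$P = 1 - P(SN \mid \text{conserved}, S)^{*} - P_{\text{chance}}$$

where P(SN | conserved, S)* is the distance-adjusted version of P(SN | conserved, S), built from P(l|S) and P(l|D), the probabilities that a given S or D pair, respectively, has intergenic distance l.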
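P(l|S) and P(l|D) can be estimated empirically from the intergenic distances of all S and D pairs in a genome. The sketch below bins distances into fixed-width bins; the 20 bp bin width and the example distances are assumptions, and it does not attempt to reproduce how OperonDB combines the two distributions into the modified estimate.

```python
from collections import Counter


def distance_distribution(distances: list[int], bin_size: int = 20) -> dict[int, float]:
    """Empirical P(l | pair class), keyed by the lower bound of each bin (bp)."""
    if not distances:
        raise ValueError("need at least one distance")
    counts = Counter((d // bin_size) * bin_size for d in distances)
    return {lower: n / len(distances) for lower, n in sorted(counts.items())}


def prob_of_distance(dist: dict[int, float], distance: int, bin_size: int = 20) -> float:
    """Look up P(l | class) for one distance; bins never observed get probability 0."""
    return dist.get((distance // bin_size) * bin_size, 0.0)


if __name__ == "__main__":
    s_distances = [-4, 10, 15, 35, 120]  # made-up S-pair distances (negative = overlap)
    d_distances = [150, 165, 170, 210]   # made-up D-pair distances
    p_l_s = distance_distribution(s_distances)
    p_l_d = distance_distribution(d_distances)
    print(p_l_s)                         # {-20: 0.2, 0: 0.4, 20: 0.2, 120: 0.2}
    print(prob_of_distance(p_l_d, 175))  # 0.5
```

Unobserved bins return 0 here; a real estimator would likely add smoothing so that rare distances do not get zero probability.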

OperonDB

Result:

• Sensitivity > 60%

• Maximum accuracy: 80%

Relation to UROP