genome analysis2

Post on 16-Jul-2015

Category:

Education

Gene identification

Open Reading frame

Six ORFs of dsDNA

Complication with Introns

Genome annotation

•Annotation : Obtaining biological information from unprocessed sequence data

•Structural annotation : Identification of genes and other important sequence elements

•Functional annotation : The determination of the functional roles of genes in the organism

Genome annotation

•Raw genomic sequence can be annotated by:

i. Comparison with databases of previously cloned genes and ESTs

ii. Gene prediction based on consensus features such as promoters, splice sites, polyadenylation sites and ORFs

Gene identification

Gene finding in eukaryotes is difficult

Genome             Genes (share of genome)
Bacterial genome   80-85%
Yeast              70%
Fruit fly          25%
Human genome       3-5%

In the human genome:
- Typical exon = 150 bp
- Intron = several kb
- Complete gene = hundreds of kb

ORF prediction

•Three reading frames are possible from each strand of DNA, so a "six-frame translation" covers both strands
- The result is 6 potential protein sequences
- The longest frame uninterrupted by a stop codon is usually taken to be the correct one

•Finding the end of an ORF is easier than finding its beginning. The beginning can be found using:
- Start codon
- Kozak sequence (CCGCCAUGG) flanking the start codon
- CpG islands
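The six-frame translation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular program: it assumes the standard genetic code, requires an ATG (M) start, and only reports ORFs closed by a stop codon. The function names (`six_frame_translations`, `longest_orf`) are hypothetical.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def longest_orf(dna):
    """Return (frame, protein) for the longest M...stop stretch in any frame."""
    best = ("", "")
    for frame, protein in six_frame_translations(dna).items():
        # Drop the last piece: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))
```

On real eukaryotic sequence this naive scan is confounded by introns, which is why the additional signals listed above are needed.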

Software programs for gene identification

•Advantage : Speed – annotation can be carried out concurrently with sequencing itself.

•Disadvantage : Accuracy

•Two strategies are used:
- Homology searching
- ab initio prediction

ab initio prediction

Based on type of algorithm:

GRAIL – Based on neural networks
- Predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, repetitive elements

GeneFinder – Rule-based system

GENSCAN, GENIE, HMMGene, GeneMarkHMM, FGENEH – Hidden Markov model

GENSCAN

ab initio prediction

1. Feature-dependent methods

Features of eukaryotic genes recognized are:
- Control signals such as the TATA box, cap site, Kozak consensus and polyadenylation sites

HEXON and MZEF are gene prediction programs that predict only a single feature, the exon.

2. A few programs depend on differences in base composition
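As a rough illustration of a base-composition measure, the sketch below computes GC content in sliding windows along a sequence. It is only a toy: the window and step sizes are arbitrary assumptions, and the programs referred to above use richer statistics than plain GC content.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=100, step=50):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Small windows are used here only so the toy sequence produces output.
for start, gc in gc_windows("AAAATTTTGGGGCCCCATATGCGC", window=8, step=8):
    print(start, round(gc, 2))
```

Windows whose composition differs sharply from the genome-wide average are candidates for closer inspection, not confirmed genes.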

ab initio prediction

Accuracy problem – Algorithms are not 100% accurate

Errors include:
- Incorrect calling of exon boundaries
- Missed exons
- Failure to detect entire genes

Solution: Running different programs on a single genome
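One simple way to combine the output of several programs is to keep only the regions that a minimum number of them agree on. The sketch below assumes each program's exon calls are given as half-open `(start, end)` intervals that do not overlap within a single program; the function name and the voting rule are illustrative assumptions, not a published procedure.

```python
def consensus_regions(predictions, min_votes=2):
    """Return regions covered by at least `min_votes` programs.

    predictions: one list per program of half-open (start, end) intervals.
    """
    events = []
    for exons in predictions:
        for start, end in exons:
            events.append((start, 1))   # an interval opens
            events.append((end, -1))    # an interval closes
    events.sort()  # at equal positions, closes sort before opens

    regions, depth, region_start = [], 0, None
    for pos, delta in events:
        new_depth = depth + delta
        if depth < min_votes <= new_depth:
            region_start = pos            # support just reached the threshold
        elif new_depth < min_votes <= depth:
            if pos > region_start:
                regions.append((region_start, pos))
            region_start = None           # support just dropped below it
        depth = new_depth
    return regions


program_a = [(100, 250), (400, 500)]
program_b = [(120, 260)]
program_c = [(90, 240), (410, 480)]
print(consensus_regions([program_a, program_b, program_c], min_votes=2))
```

Raising `min_votes` trades missed exons for fewer false calls, so the threshold is a judgment call.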

Homology searching

•Finding genes in long sequences by looking for matches with sequences that are known to be transcribed, e.g. a cDNA, EST or gene

Programs used are BLAST (Basic Local Alignment Search Tool) based: BLASTN, BLASTX, BLASTP etc.
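The idea of locating transcribed regions by matching can be illustrated with exact word (k-mer) seeds shared between a genomic sequence and a transcript. This is not BLAST and does not call any BLAST program: it only shows the word-matching step, with no extension, scoring or statistics. The function name and the word size `k` are assumptions made for the example.

```python
def seed_hits(genomic, transcript, k=8):
    """Return (genomic_pos, transcript_pos) for every shared exact k-mer."""
    genomic, transcript = genomic.upper(), transcript.upper()

    # Index every k-mer of the transcript by its start position.
    index = {}
    for i in range(len(transcript) - k + 1):
        index.setdefault(transcript[i:i + k], []).append(i)

    # Scan the genomic sequence and look each k-mer up in the index.
    hits = []
    for g in range(len(genomic) - k + 1):
        for t in index.get(genomic[g:g + k], []):
            hits.append((g, t))
    return hits


genomic = "TTTTACGTACGGATTTT"
est = "ACGTACGGA"
print(seed_hits(genomic, est, k=6))
```

Hits that share the same offset (genomic position minus transcript position) lie on one diagonal and suggest a contiguous match, such as an exon.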

Homology searching or ab initio?

•Algorithms that take similarity data into account are better at gene prediction – Reese et al. (2000), Fortna et al. (2001)

The latest gene prediction algorithms combine similarity data with ab initio methods. Examples: Grail/Exp, GenieEST, GenomeScan

tRNAScanSE : For tRNA identification

Advanced gene finding programs

GLIMMER

•Gene Locator and Interpolated Markov ModelER

•For finding genes in microbial DNA
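To give a feel for Markov-model scoring, the sketch below trains a fixed-order Markov chain on example coding sequences and scores a candidate by its log-odds against a background model. It is a deliberate simplification and not GLIMMER's algorithm: an interpolated Markov model combines several context lengths, whereas this toy uses a single order. All names, the order and the pseudocount are assumptions.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Return prob(context, base) estimated from the training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, {})
        # Pseudocounts keep unseen transitions from having zero probability.
        total = sum(ctx.values()) + 4 * pseudocount
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq, coding, background, order=2):
    """Sum of log(P_coding / P_background) over every position with context."""
    seq = seq.upper()
    return sum(math.log(coding(seq[i - order:i], seq[i])
                        / background(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))


coding = train_markov(["ATGATGATGATG"])   # toy "coding" training set
background = train_markov([])             # no data: uniform over A, C, G, T
print(log_odds("ATGATG", coding, background) > 0)
print(log_odds("ATCATC", coding, background) < 0)
```

A positive score means the candidate looks more like the training genes than like the background; real gene finders train on many known genes from the target organism.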

GeneMark

GenScan
