Upload: rosaline-mclaughlin

Post on 02-Jan-2016

Monday, October 18, 1:43:47 PM

Outline for today

I. Gene Prediction: What are genes? Where are genes? Why do we care about a definition?
• Prokaryotic vs. eukaryotic gene models
• Introns/exons
• Splicing
• Alternative splicing
• Genes-in-genes, genes-ad-genes
• Multi-subunit proteins

II. Gene identification
• Homology-based gene prediction
  - Similarity searches (e.g. BLAST, BLAT)
  - Genome browsers
  - RNA evidence (ESTs)
• Ab initio gene prediction
  - Gene prediction programs: prokaryotes, eukaryotes
  - Promoter prediction
  - PolyA-signal prediction
  - Splice site, start/stop-codon predictions

Lec 06

Slide 132

Alternative splicing

Alternative splicing can be either constitutive or regulated:
• Constitutive alternative splicing: more than one product is always made from a pre-mRNA.
• Regulated alternative splicing: different forms of mRNA are produced at different times, under different conditions, or in different cell or tissue types.

Alternative splicing is regulated by activators and repressors:
• The regulating sequences are exonic or intronic splicing enhancers (ESE or ISE) or silencers (ESS or ISS). The former enhance splicing and the latter repress it.

Proteins that regulate splicing act by binding to these specific sites.

Slide 133

Source: Mo Chen & James L. Manley (2009), Nature Reviews Molecular Cell Biology 10, 741-754.

Slide 134

Alternative splicing

Alternative splicing can generate tens of thousands of mRNAs from a single primary transcript. Alternative splicing generates segments of mRNA variability that can insert or remove amino acids, shift the reading frame, or introduce a termination codon.

On average, a human gene contains 8 exons.

Up to 59% of human genes generate multiple mRNAs by alternative splicing and ∼80% of alternative splicing results in changes in the encoded protein.

A large fraction of alternative splicing undergoes cell specific regulation in which splicing pathways are modulated according to cell type, developmental stage, gender, or in response to external stimuli.

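[Figure: alternative splicing of a five-exon pre-mRNA (exons 1-5). Heart muscle mRNA contains exons 1, 2, 3 and 5; uterine muscle mRNA contains exons 1, 3, 4 and 5.]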
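The heart/uterine example can be sketched in a few lines of Python. This is a minimal illustration, not a splicing model: the exon lengths below are hypothetical values chosen only to show the point made on this slide, that skipping an exon whose length is not a multiple of 3 shifts the downstream reading frame, while skipping an exon whose length is a multiple of 3 removes amino acids in frame.

```python
# Hypothetical exon lengths (nt) for a five-exon pre-mRNA; illustrative values only.
EXON_LENGTHS = {1: 120, 2: 75, 3: 90, 4: 100, 5: 150}

def isoform(skipped):
    """Mature mRNA exon order after skipping the given exons (exon order is preserved)."""
    return [e for e in sorted(EXON_LENGTHS) if e not in skipped]

def frame_effect(skipped):
    """Describe how removing the skipped exons affects the downstream reading frame."""
    removed = sum(EXON_LENGTHS[e] for e in skipped)
    shift = removed % 3
    return "in-frame deletion" if shift == 0 else f"frameshift ({shift} nt out of frame)"

heart = isoform({4})     # exons 1, 2, 3, 5
uterine = isoform({2})   # exons 1, 3, 4, 5
print(heart, frame_effect({4}))
print(uterine, frame_effect({2}))
```

With these made-up lengths, skipping exon 4 (100 nt) shifts the frame, whereas skipping exon 2 (75 nt) does not.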

Alternative splicing is the process where one gene produces more than one type of mRNA.

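[Figure: one gene (DNA) gives rise to two mRNA isoforms whose relative amounts depend on environment and development. Cell type 1: 80% / 20%; cell type 2: 10% / 90%; cell type 3: first isoform absent / 100%.]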

Alternative splicing

Slide 135

The phenotype is determined by the proteome and transcriptome. Selection acts on the phenotype and is blind to the genotype. Therefore, two species or individuals that produce different forms of a protein will be selected differently, even if the gene's DNA sequence is identical.
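Isoform proportions like those in the cell-type example are usually estimated from counts, for instance sequencing reads assigned to each isoform. The sketch below assumes such counts are already available (the numbers are hypothetical, chosen to reproduce the slide's percentages) and simply normalizes them per cell type; real isoform quantification must also handle reads that are compatible with several isoforms.

```python
def isoform_fractions(counts):
    """Convert per-isoform read counts into each isoform's fraction of the gene's output."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads observed for this gene")
    return {name: n / total for name, n in counts.items()}

# Hypothetical read counts chosen to reproduce the percentages on the slide.
observed = {
    "cell type 1": {"isoform A": 80, "isoform B": 20},
    "cell type 2": {"isoform A": 10, "isoform B": 90},
    "cell type 3": {"isoform A": 0, "isoform B": 100},
}

for cell, counts in observed.items():
    fractions = isoform_fractions(counts)
    print(cell, {name: f"{f:.0%}" for name, f in fractions.items()})
```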

Alternative splicing can generate mRNAs encoding proteins with different, even opposite functions.

Alternative splicing

Therefore, understanding the mechanism of RNA splicing in normal cells and how it is regulated in different tissues and at different stages of development of an organism is essential in order to develop strategies to correct aberrant splicing in human pathologies.

Slide 136

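[Figure: alternative splicing of the Fas apoptosis receptor. The Fas pre-mRNA region shown contains exons 5, 6 and 7, separated by the introns labelled Intron 1 and Intron 2. Including exon 6 (exons 5-6-7) produces membrane-associated Fas, which binds Fas ligand and promotes apoptosis (+); skipping exon 6 (exons 5-7) produces soluble Fas, which inhibits apoptosis (-).]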

Pathologies resulting from aberrant splicing can be grouped into two major categories:

• Mutations that affect a specific messenger RNA and disturb its normal splicing pattern. Examples:
  - β-Thalassemia
  - Duchenne Muscular Dystrophy
  - Cystic Fibrosis
  - Frasier Syndrome
  - Frontotemporal Dementia and Parkinsonism

• Mutations that affect proteins involved in splicing. Examples:
  - Spinal Muscular Atrophy
  - Retinitis Pigmentosa
  - Myotonic Dystrophy

Slide 137

Splice variant detection

[Figure: three splice variants of the same gene: exons 1-2A-3, exons 1-3, and exons 1-2B-3.]

PCR method: simple and sensitive, and accurate enough when a standard curve is used; however, only internal changes (between the primers) are detectable, and the method cannot be scaled up.

Capture probe: very sensitive and accurate, but probe design is complicated and the method is expensive.

Microarray method: can be scaled up to an entire genome (high throughput), so any type of splice variant is detectable; however, it is not very accurate, and it is complex and expensive.

Slide 138
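The limitation of the PCR method can be seen by computing expected product sizes. In this sketch the primers are assumed to lie in constitutive exons 1 and 3, and all lengths are hypothetical: variants that differ between the primers give different band sizes, while a change outside the primer pair would not alter the product at all.

```python
# Hypothetical lengths (nt); primers are assumed to sit in constitutive exons 1 and 3.
FLANK_EXON1 = 60   # forward primer 5' end to the end of exon 1
FLANK_EXON3 = 70   # start of exon 3 to the reverse primer 5' end
INTERNAL_EXONS = {"2A": 80, "2B": 110}

VARIANTS = {
    "1-2A-3": ["2A"],
    "1-3": [],
    "1-2B-3": ["2B"],
}

def amplicon_size(internal):
    """Expected RT-PCR product length for a variant retaining the given internal exons."""
    return FLANK_EXON1 + sum(INTERNAL_EXONS[e] for e in internal) + FLANK_EXON3

sizes = {name: amplicon_size(exons) for name, exons in VARIANTS.items()}
print(sizes)
```

If exons 2A and 2B happened to have the same length, their products would co-migrate and a size-based read-out could not tell them apart.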

Outline for today

I. Gene Prediction: What are genes? Where are genes? Why do we care about a definition?
• Prokaryotic vs. eukaryotic gene models
• Introns/exons
• Splicing
• Alternative splicing
• Genes-in-genes, genes-ad-genes
• Multi-subunit proteins

II. Gene identification
• Homology-based gene prediction
  - Similarity searches (e.g. BLAST, BLAT)
  - Genome browsers
  - RNA evidence (ESTs)
• Ab initio gene prediction
  - Gene prediction programs: prokaryotes, eukaryotes
  - Promoter prediction
  - PolyA-signal prediction
  - Splice site, start/stop-codon predictions

Slide 139

Bidirectional and partially overlapping genes

Not very common in the human genome.

Provide the possibility of common regulation of a gene pair.

Partially overlapping genes are usually encoded on opposite DNA strands.

Found in gene-dense regions, such as the HLA class III complex on 6p21.3.

Could represent a sense-antisense pair in which one gene encodes a protein-coding mRNA and the other is non-coding.

Slide 140
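Overlapping and bidirectional pairs can be detected from gene coordinates alone. A minimal sketch, assuming 1-based inclusive coordinates and a hypothetical 1 kb cut-off for calling two divergent (head-to-head) genes a bidirectional pair; the gene names and positions are made up for illustration.

```python
from typing import List, NamedTuple, Tuple

class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"

def overlap(a: Gene, b: Gene) -> int:
    """Number of shared bases between two intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)

def classify_pairs(genes: List[Gene], max_gap: int = 1000) -> List[Tuple[str, str, str]]:
    """Flag overlapping pairs and divergent (head-to-head) pairs separated by < max_gap bp."""
    found = []
    ordered = sorted(genes, key=lambda g: g.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end + max_gap:
                break  # later genes start even further downstream
            if overlap(a, b) > 0:
                kind = "same strand" if a.strand == b.strand else "opposite strands"
                found.append((a.name, b.name, f"overlapping, {kind}"))
            elif a.strand == "-" and b.strand == "+":
                # 5' ends face each other: candidate shared (bidirectional) promoter region
                found.append((a.name, b.name, "bidirectional (head-to-head)"))
    return found

example = [
    Gene("G1", 100, 900, "-"),
    Gene("G2", 1200, 3000, "+"),
    Gene("G3", 2800, 4000, "-"),
]
for pair in classify_pairs(example):
    print(pair)
```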

Slide 141

Nested intronic genes

Genes within genes

Neurofibromatosis gene (NF1)

• OMGP: oligodendrocyte myelin glycoprotein.

• EVI2A and EVI2B: homologues of ecotropic viral integration sites in mouse.

• Two overlapping genes encoded by the same strand of mtDNA (a unique example).

• Two independent AUG start codons located in different reading frames relative to each other; the stop codon of the second gene is formed from TA plus an A supplied by the poly(A) tail.
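Nesting of the NF1 kind can be tested from coordinates alone: a gene is nested if it lies entirely inside one of the host gene's introns. A minimal sketch, assuming exon coordinates for the host are known; the coordinates below are hypothetical, not those of NF1, and strand is ignored because a nested gene may lie on either strand.

```python
from typing import List, Optional, Tuple

def intron_containing(host_exons: List[Tuple[int, int]], start: int, end: int) -> Optional[int]:
    """Return the 1-based number of the host intron that fully contains [start, end], else None.

    Exons are (start, end) pairs, 1-based inclusive, in genomic coordinates.
    """
    exons = sorted(host_exons)
    for k in range(len(exons) - 1):
        intron_start = exons[k][1] + 1
        intron_end = exons[k + 1][0] - 1
        if intron_start <= start and end <= intron_end:
            return k + 1
    return None

# Hypothetical host gene with three exons, and two candidate genes.
host = [(1, 200), (5000, 5200), (9000, 9300)]
print(intron_containing(host, 6000, 7500))  # lies inside intron 2
print(intron_containing(host, 150, 400))    # overlaps exon 1, so not nested
```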

When we BLAST a sequence, is that comparative genomics?

• Entire genomes are compared with other entire genomes.

• Information from many genomes is used to learn more about individual genes.

What are some questions that comparative genomics can address?

• How has the organism evolved?

• What differentiates species?

• Which genes are required for organisms to survive in a certain environment?

• Which non-coding regions are important?

Comparative Genomics

Gene prediction

Slide 142

Gene prediction through comparative genomics

Highly similar (conserved) regions between two genomes are likely to be functionally important; otherwise they would have diverged.

If genomes are too closely related, all regions are similar, not just genes.

If genomes are too far apart, corresponding regions may be too dissimilar to be found.

Different questions require different comparisons

Slide 143
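The idea that conserved regions stand out can be illustrated with a sliding-window identity scan over an existing pairwise alignment. This is a simplification under stated assumptions: the alignment is taken as given, the two sequences are made up, and the 10-column window and 80% threshold are arbitrary choices that would in practice be tuned to the evolutionary distance between the genomes, as this slide notes.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.8):
    """Return (offset, identity) for alignment windows at or above min_identity.

    aln_a and aln_b are rows of a pairwise alignment (equal length, '-' for gaps);
    a gap never counts as a match.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    hits = []
    for i in range(len(aln_a) - window + 1):
        pairs = zip(aln_a[i:i + window], aln_b[i:i + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((i, identity))
    return hits

# Hypothetical aligned rows from two genomes.
a = "ATGGCCAAGTTTCCAG"
b = "ATGGCCAAGTATGC-G"
print(conserved_windows(a, b))
```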

Prokaryotic gene prediction

NCBI ORF finder http://www.ncbi.nlm.nih.gov/gorf/gorf.html
• ORF Finder identifies all possible ORFs in a DNA sequence by locating the standard and alternative start and stop codons.

• The deduced amino acid sequences can then be used to BLAST against GenBank. ORF Finder is also packaged in the sequence submission software Sequin.

• Using NCBI ORF Finder, 90 ORFs were identified in Contig3 (28,715 bp).

• On its own, this method is still not a proper way to identify genes!

Slide 144
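ORF scanning itself is simple, which is why it over-calls: every ATG-to-stop stretch above a length cutoff is reported whether or not it is a real gene (compare the 90 ORFs above with the counts from the gene prediction programs on later slides). The sketch below is a minimal six-frame scan using the standard genetic code and ATG starts only; it is not NCBI's implementation, and the minimum length cutoff is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=100):
    """Scan all six reading frames for ATG...stop ORFs of at least min_codons codons.

    Returns (strand, frame, start, end) tuples; coordinates are 0-based, end-exclusive,
    and refer to the scanned strand (the reverse complement for '-' hits).
    Only ATG is accepted as a start; in-frame downstream ATGs are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs

print(find_orfs("ATGAAATAG", min_codons=1))
```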

Gene calling anomalies
• Short genes: a gene is called 'short' when it has been significantly truncated at the 5' end. Such genes are significantly shorter than their homologs in other species. This truncation often causes the loss of important functional domains, so the gene is predicted, in theory, to have lost its function.

Prokaryotic gene prediction

Slide 145

Prokaryotic gene prediction

Gene calling anomalies
• Long genes: a gene is called 'long' when it has been extended at the 5' end. Such genes are significantly longer than their homologs in other species. A long gene can create overlaps with neighbouring features, with the result that neighbouring genes are called short or features in the flanking intergenic regions are missed.

Slide 146
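Short and long calls are usually spotted by comparing each predicted protein with its homologs. A minimal sketch, assuming homolog lengths (in amino acids) have already been collected from BLAST hits; the 0.8 and 1.2 ratio cut-offs and the lengths are hypothetical, and a fuller check would compare where the alignments begin, since both anomalies arise at the 5' end.

```python
from statistics import median

def length_call(protein_len, homolog_lens, short_ratio=0.8, long_ratio=1.2):
    """Classify a gene call as 'short', 'long', or 'typical' relative to its homologs."""
    if not homolog_lens:
        raise ValueError("no homologs: length cannot be judged against other species")
    ratio = protein_len / median(homolog_lens)
    if ratio < short_ratio:
        return "short"
    if ratio > long_ratio:
        return "long"
    return "typical"

homologs = [305, 310, 298]  # hypothetical homolog lengths (aa)
for length in (190, 300, 420):
    print(length, length_call(length, homologs))
```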

Prokaryotic gene prediction

Gene calling anomalies
• Unique gene: a gene is called 'unique' when it has no known homologs in other species. For such genes, BLAST comparisons at the amino acid level with genes in other organisms return no hits. Often, such a gene call is itself an anomaly that in turn causes other anomalies, e.g. neighbouring genes called short.

DdesDRAFT_0263 is a unique gene. If DdesDRAFT_0264 were detected as a short gene, DdesDRAFT_0263 would actually be responsible for this short call.

Slide 147

• Dubious (uncertain) gene: a gene called as unique that is too short to be a functional gene is classified as 'dubious'. In practice, very few (1-10) dubious genes are found in the gene calls. When present, both unique and dubious genes are included when searching intergenic regions for missed genes.
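The unique/dubious rules can be written down directly. A minimal sketch, assuming each gene call already carries its protein length and the number of BLAST hits against other organisms; the locus names and the 60-amino-acid cut-off for "too short to be functional" are hypothetical.

```python
from typing import Dict, List, NamedTuple

class GeneCall(NamedTuple):
    locus: str
    protein_len: int   # amino acids
    blast_hits: int    # hits against genes in other organisms

def novelty_class(call: GeneCall, min_len: int = 60) -> str:
    """'supported' if homologs exist; otherwise 'unique', or 'dubious' if also too short."""
    if call.blast_hits > 0:
        return "supported"
    return "dubious" if call.protein_len < min_len else "unique"

def calls_for_intergenic_search(calls: List[GeneCall], min_len: int = 60) -> Dict[str, str]:
    """Unique and dubious calls are both carried into the search for missed genes."""
    return {c.locus: novelty_class(c, min_len) for c in calls if c.blast_hits == 0}

calls = [
    GeneCall("gene_A", 310, 42),
    GeneCall("gene_B", 250, 0),
    GeneCall("gene_C", 38, 0),
]
print(calls_for_intergenic_search(calls))
```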

Prokaryotic gene prediction

Slide 148

Gene calling anomalies
• Split genes interrupted by frame-shifts and stop codons: a reported split gene could be a good gene that is interrupted by frame-shifts or stop codons. Such a gene is called as a series of consecutive smaller genes, all of which share many BLAST hits.

Split genes DdesDRAFT_1032 and DdesDRAFT_1033 interrupted by a frame-shift.
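Because the fragments of a split gene share many BLAST hits, one simple screen is to compare the hit sets of consecutive calls on the same strand. A minimal sketch: the Jaccard-similarity cut-off of 0.5, the locus names, and the hit identifiers are hypothetical, and a flagged pair would still need to be checked against the sequence for the actual frame-shift or stop codon.

```python
from typing import List, Set, Tuple

Call = Tuple[str, str, Set[str]]  # (locus, strand, IDs of BLAST subject hits)

def split_gene_candidates(calls: List[Call], min_jaccard: float = 0.5):
    """Flag consecutive same-strand calls whose BLAST hit sets overlap strongly.

    calls must be in genome order.
    """
    flagged = []
    for (locus1, strand1, hits1), (locus2, strand2, hits2) in zip(calls, calls[1:]):
        if strand1 != strand2 or not hits1 or not hits2:
            continue
        jaccard = len(hits1 & hits2) / len(hits1 | hits2)
        if jaccard >= min_jaccard:
            flagged.append((locus1, locus2, round(jaccard, 2)))
    return flagged

calls = [
    ("gene_1", "+", {"hitA", "hitB", "hitC", "hitD"}),
    ("gene_2", "+", {"hitA", "hitB", "hitC", "hitE"}),
    ("gene_3", "-", {"hitX"}),
]
print(split_gene_candidates(calls))
```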

Prokaryotic gene prediction

Slide 149

Gene calling anomalies
• Missed genes: gene prediction programs often miss genes.

No genes had been predicted in the region between DdesDRAFT_0231 and DdesDRAFT_0232.

However, an alignment of this region indicates the presence of a perfectly good gene.
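Candidate regions for missed genes can be found by listing stretches of the contig that no gene call covers. A minimal sketch, assuming gene calls are given as 1-based inclusive (start, end) pairs regardless of strand; the 300-bp minimum gap and the coordinates are hypothetical. Each reported gap would then be aligned against other genomes, as in the example on this slide.

```python
from typing import List, Tuple

def large_gaps(gene_calls: List[Tuple[int, int]], contig_len: int, min_gap: int = 300):
    """Return uncovered (start, end) stretches of at least min_gap bp.

    gene_calls are 1-based inclusive (start, end) pairs on either strand.
    """
    gaps = []
    pos = 1  # first position not yet covered by any call seen so far
    for start, end in sorted(gene_calls):
        if start - pos >= min_gap:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if contig_len - pos + 1 >= min_gap:
        gaps.append((pos, contig_len))
    return gaps

calls = [(1, 1000), (1100, 2000), (2600, 3500)]  # hypothetical gene calls
print(large_gaps(calls, contig_len=4000))
```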

Free gene prediction software

Slide 150

Using the GeneMark gene prediction software, 14 genes were predicted in Contig3 (28,715 bp).

GeneMark: Georgia Institute of Technology, Atlanta, Georgia, USA. http://exon.biology.gatech.edu

Free gene prediction software

Slide 151

Softberry: (FGENESB) Bacterial Operon and Gene Prediction. http://linux1.softberry.com/berry.phtml

Using the Softberry gene prediction software, 6 genes were predicted in Contig3 (28,715 bp).

EasyGene: gene finding in prokaryotes (1.2b Server). http://www.cbs.dtu.dk/services/EasyGene/

Free gene prediction software

Slide 152

Using the EasyGene 1.2b Server, 6 genes were predicted in Contig3 (28,715 bp).

Glimmer: NCBI Microbial Genome Annotation Tools. http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi

Free gene prediction software

Slide 153

Using Glimmer, 81 genes were predicted in Contig3 (28,715 bp).
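The tools disagree widely on the same contig (14, 6, 6 and 81 genes, against 90 raw ORFs). One common way to compare two sets of prokaryotic gene calls is to match them by stop codon on the same strand, because the stop codon is fixed once the reading frame is chosen, whereas start-codon choice is where predictors most often differ. A minimal sketch with hypothetical coordinates (1-based, inclusive):

```python
from typing import Dict, Iterable, Tuple

Call = Tuple[int, int, str]  # (start, end, strand), 1-based inclusive

def stop_key(call: Call) -> Tuple[str, int]:
    """Identify a call by strand and stop-codon end: 'end' on +, 'start' on -."""
    start, end, strand = call
    return (strand, end if strand == "+" else start)

def compare_calls(calls_a: Iterable[Call], calls_b: Iterable[Call]) -> Dict[str, int]:
    """Count genes shared by two prediction sets, and genes unique to each."""
    keys_a = {stop_key(c) for c in calls_a}
    keys_b = {stop_key(c) for c in calls_b}
    return {
        "shared": len(keys_a & keys_b),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
    }

tool_a = [(100, 400, "+"), (600, 900, "-"), (1200, 1500, "+")]
tool_b = [(130, 400, "+"), (600, 870, "-"), (2000, 2300, "+")]
print(compare_calls(tool_a, tool_b))
```

Here the two disagreements over start codons (100 vs 130 on the + strand, 900 vs 870 on the - strand) still count as the same gene.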