Genome Organization & Protein Synthesis and Processing in Plants

Post on 19-Dec-2015


Page 1: Genome Organization & Protein Synthesis and Processing in Plants

Genome Organization & Protein Synthesis and Processing in Plants

Page 2: Genome Organization & Protein Synthesis and Processing in Plants

Viral genomes

Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular

Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)

Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome

Replicative form of viral genomes:
• All ssRNA viruses produce dsRNA molecules
• Many linear DNA molecules become circular

Molecular weight and contour length:
• Duplex length per base pair = 3.4 Å
• Molecular weight per base pair = ~660 Da
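The two constants above are enough to estimate the physical size of any duplex genome. A minimal sketch, applied to the lambda genome quoted on this slide; the function names are illustrative and not part of the original slides:

```python
# Constants taken from the slide
ANGSTROM_PER_BP = 3.4    # duplex length per base pair
DALTON_PER_BP = 660.0    # approximate molecular weight per base pair


def contour_length_um(n_bp: int) -> float:
    """Contour length of a duplex of n_bp base pairs, in micrometres."""
    return n_bp * ANGSTROM_PER_BP * 1e-4   # 1 Å = 1e-4 µm


def molecular_weight_da(n_bp: int) -> float:
    """Approximate molecular weight of a duplex of n_bp base pairs, in daltons."""
    return n_bp * DALTON_PER_BP


LAMBDA_BP = 48_502   # lambda genome size from the slide
print(f"contour length: {contour_length_um(LAMBDA_BP):.1f} µm")
print(f"molecular weight: {molecular_weight_da(LAMBDA_BP):.2e} Da")
```

For lambda this gives roughly 16.5 µm and 3.2 x 10^7 Da.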

Page 3: Genome Organization & Protein Synthesis and Processing in Plants

Procaryotic genomes

• Generally 1 circular chromosome (dsDNA)
• Usually without introns
• Relatively high gene density (~2500 genes per mm of E. coli DNA)
• Contour length of E. coli genome: 1.7 mm
• Often indigenous plasmids are present

Page 4: Genome Organization & Protein Synthesis and Processing in Plants

Plasmids

Extrachromosomal circular DNAs
• Found in bacteria, yeast and other fungi
• Size varies from ~3,000 bp to 100,000 bp
• Replicate autonomously (origin of replication)
• May contain resistance genes
• May be transferred from one bacterium to another
• May be transferred across kingdoms
• Multicopy plasmids (up to ~400 plasmids per cell)
• Low-copy plasmids (1–2 copies per cell)
• Plasmids may be incompatible with each other
• Are used as vectors that can carry a foreign gene of interest (e.g. insulin)

[Figure: plasmid map with labels β-lactamase, ori, foreign gene]

Page 5: Genome Organization & Protein Synthesis and Processing in Plants

Eukaryotic genome

• Moderately repetitive
  – Functional (protein coding, tRNA coding)
  – Unknown function
• SINEs (short interspersed elements)
  – 200-300 bp
  – 100,000 copies
• LINEs (long interspersed elements)
  – 1-5 kb
  – 10-10,000 copies

Page 6: Genome Organization & Protein Synthesis and Processing in Plants

Eukaryotic genome

• Highly repetitive
  – Minisatellites
    • Repeats of 14-500 bp
    • 1-5 kb long
    • Scattered throughout genome
  – Microsatellites
    • Repeats up to 13 bp
    • 100s of kb long, 10^6 copies
    • Around centromere
  – Telomeres
    • Short repeats (6 bp)
    • 250-1,000 at ends of chromosomes

Page 7: Genome Organization & Protein Synthesis and Processing in Plants

Eucaryotic genomes
• Located on several chromosomes
• Relatively low gene density (50 genes per mm of DNA in humans)
• Contour length of DNA from a single human cell = 2 meters
• Approximately 10^11 cells: total length 2 x 10^11 m (2 x 10^8 km)
• For comparison, distance between sun and earth: 1.5 x 10^8 km
• Human chromosomes vary in length over a 25-fold range
• Also carry organelle genomes
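The total-length comparison is a short unit conversion: the per-cell length times the cell count gives metres, which then convert to kilometres. A minimal sketch using only the per-cell length, the cell count, and the sun-earth distance quoted on this slide (the variable names are illustrative):

```python
DNA_PER_CELL_M = 2.0     # contour length of DNA per human cell, from the slide
CELL_COUNT = 1e11        # cell count quoted on the slide
SUN_EARTH_KM = 1.5e8     # sun-earth distance quoted on the slide

total_m = DNA_PER_CELL_M * CELL_COUNT    # total length in metres
total_km = total_m / 1000.0              # metres to kilometres
ratio = total_km / SUN_EARTH_KM          # how many sun-earth distances

print(f"total DNA length: {total_km:.1e} km")
print(f"sun-earth distances covered: {ratio:.2f}")
```

With these inputs the total is 2 x 10^8 km, a little more than one sun-earth distance. A larger assumed cell count scales the result proportionally.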

Page 8: Genome Organization & Protein Synthesis and Processing in Plants

Mitochondrial genome (mtDNA)

• Multiple identical circular chromosomes
• Size ~15 kb in animals
• Size ~200 kb to 2,500 kb in plants
• Over 95% of mitochondrial proteins are encoded in the nuclear genome
• Often A+T-rich genomes
• mtDNA is replicated before or during mitosis

Page 9: Genome Organization & Protein Synthesis and Processing in Plants

Chloroplast genome (cpDNA)

• Multiple circular molecules
• Size ranges from 120 kb to 160 kb
• Similar to mtDNA
• Many chloroplast proteins are encoded in the nucleus (separate signal sequence)

Page 10: Genome Organization & Protein Synthesis and Processing in Plants

“Cellular” genomes

Viruses: viral genome (inside the capsid)
Procaryotes: bacterial chromosome, plasmids
Eucaryotes: chromosomes (nuclear genome, inside the nucleus), mitochondrial genome, chloroplast genome

Genome: all of an organism’s genes plus intergenic DNA
Intergenic DNA = DNA between genes

Page 11: Genome Organization & Protein Synthesis and Processing in Plants

Estimated genome sizes

[Figure: genome size ranges on a logarithmic axis from 1e1 to 1e12 nucleotides for viruses (1024), bacteria (>100), fungi, mitochondria (~100), plants, and mammals]

Size in nucleotides. Number in ( ) = completely sequenced genomes

Page 12: Genome Organization & Protein Synthesis and Processing in Plants

Size of genomes

Organism             Genome size (bp)
Epstein-Barr virus   0.172 x 10^6
E. coli              4.6 x 10^6
S. cerevisiae        12.1 x 10^6
C. elegans           95.5 x 10^6
A. thaliana          117 x 10^6
D. melanogaster      180 x 10^6
H. sapiens           3200 x 10^6

Page 13: Genome Organization & Protein Synthesis and Processing in Plants

Chromosome organization

Eucaryotic chromosome

[Figure: chromosome with p-arm and q-arm, the centromere between them, and a telomere at each end]

Centromere:
• DNA sequence that serves as an attachment site for proteins during mitosis
• In yeast these sequences (~130 nt) are very A+T rich
• In higher eucaryotes centromeres are much longer and contain “satellite DNA”

Telomeres:
• At the ends of chromosomes; help stabilize the chromosome
• In yeast telomeres are ~100 bp long (imperfect repeats)
• Repeats are added by a specific telomerase

Telomere repeat structure:
5’ – (TxGy)n
3’ – (AxCy)n
x and y = 1 - 4; n = 20 to 100 (1500 in mammals)
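The (TxGy)n notation can be expanded mechanically. A minimal sketch that builds both strands for given x, y and n; the function name and the example values are illustrative choices, not taken from the slides:

```python
def telomere_strands(x: int, y: int, n: int) -> tuple[str, str]:
    """Expand (TxGy)n and its base-paired partner (AxCy)n.

    The first string reads 5'->3'; the second is written 3'->5',
    so the two strings pair position by position.
    """
    if not (1 <= x <= 4 and 1 <= y <= 4):
        raise ValueError("x and y are expected to lie between 1 and 4")
    top = ("T" * x + "G" * y) * n
    bottom = ("A" * x + "C" * y) * n
    return top, bottom


top, bottom = telomere_strands(x=2, y=4, n=3)
print("5'-" + top + "-3'")
print("3'-" + bottom + "-5'")
```

The slide calls yeast telomeres imperfect repeats, which means x and y vary from unit to unit in a real telomere; this sketch only covers the perfect-repeat case.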

Page 14: Genome Organization & Protein Synthesis and Processing in Plants

Gene classification

[Figure: simplified chromosome showing coding genes and non-coding genes separated by intergenic regions]

Coding genes → messenger RNA → proteins (structural proteins, enzymes)
Non-coding genes → structural RNA (transfer RNA, ribosomal RNA, other RNA)

Page 15: Genome Organization & Protein Synthesis and Processing in Plants

What is a gene?

• Definitions

1. Classical definition: portion of DNA that determines a single character (phenotype)

2. One gene – one enzyme (Beadle & Tatum 1941): “Every gene encodes the information for one enzyme”

3. One gene – one protein: “One gene contains information for one protein” (structural proteins included); later refined to one gene – one polypeptide

4. Current definition: A piece of DNA (or in some cases RNA) that contains the primary sequence to produce a functional biological gene product (RNA, protein).

Page 16: Genome Organization & Protein Synthesis and Processing in Plants

Coding region

Nucleotides (open reading frame) encoding the amino acid sequence of a protein

The molecular definition of gene includes more than just the coding region

Page 17: Genome Organization & Protein Synthesis and Processing in Plants

Noncoding regions

• Regulatory regions
  – RNA polymerase binding site
  – Transcription factor binding sites

• Introns

• Polyadenylation [poly(A)] sites

Page 18: Genome Organization & Protein Synthesis and Processing in Plants

Gene

Molecular definition:

Entire nucleic acid sequence necessary for the synthesis of a functional polypeptide (protein chain) or functional RNA

Page 19: Genome Organization & Protein Synthesis and Processing in Plants

Anatomy of a gene

• ORF. From start (ATG) to stop (TGA, TAA, TAG).

• Upstream region with binding site (e.g. TATA box).

• Poly(A) ‘tail’.

• Splice sites. Introns are bounded by GT (5’ end) and AG (3’ end) splice signals.
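The ORF definition above (ATG through the first in-frame stop codon) translates directly into a scan over the three reading frames of one strand. A minimal sketch; the function name, the `min_codons` parameter and the test sequence are hypothetical, and a real gene finder would also scan the reverse strand and account for introns:

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF on the given strand.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # close it at the first stop
    return orfs


for start, end, seq in find_orfs("CCATGGCTTGATAAATGAAATAG"):
    print(start, end, seq)
```

On this toy sequence the scan reports two short ORFs in the same frame, ATGGCTTGA and ATGAAATAG.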

Page 20: Genome Organization & Protein Synthesis and Processing in Plants

Bacterial genes

• Most do not have introns

• Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions

Polycistronic mRNA encodes several proteins

Page 21: Genome Organization & Protein Synthesis and Processing in Plants

What would be the effect of a mutation in the control region (a) compared to a mutation in a structural gene (b)?

Bacterial operon

Page 22: Genome Organization & Protein Synthesis and Processing in Plants

Eucaryotic genes

Hemoglobin beta subunit gene

Exon 1: 90 bp
Intron A: 131 bp
Exon 2: 222 bp
Intron B: 851 bp
Exon 3: 126 bp

Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.

Exons: sequences within a gene that encode protein sequences.

Splicing: removal of introns from the pre-mRNA (primary transcript) to give the mature mRNA.
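Splicing can be pictured as cutting out the intron intervals and joining what is left. A minimal sketch using the exon and intron lengths listed for the hemoglobin beta subunit gene; the placeholder transcript (E for exon bases, i for intron bases) and the function name are illustrative, since the real sequence is not given in the slides:

```python
# Segment lengths in bp, in 5'->3' order, as listed on the slide
SEGMENTS = [
    ("exon", 90),
    ("intron", 131),
    ("exon", 222),
    ("intron", 851),
    ("exon", 126),
]


def splice(pre_mrna: str, segments: list[tuple[str, int]]) -> str:
    """Keep exon intervals and drop intron intervals, preserving order."""
    kept, pos = [], 0
    for kind, length in segments:
        if kind == "exon":
            kept.append(pre_mrna[pos:pos + length])
        pos += length
    return "".join(kept)


# Placeholder transcript: 'E' marks exon bases, 'i' marks intron bases
pre_mrna = "".join(("E" if kind == "exon" else "i") * n for kind, n in SEGMENTS)
mature = splice(pre_mrna, SEGMENTS)

print(len(pre_mrna), "nt before splicing")
print(len(mature), "nt after splicing")
```

The listed segments add up to 1,420 nt before splicing and 438 nt after, which corresponds to 146 codons if all of the listed exon sequence is coding.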

Page 23: Genome Organization & Protein Synthesis and Processing in Plants

Regulatory mechanisms

• ‘organize expression of genes’ (function calls)

• Promoter region (binding site), usually near coding region

• Binding can block (inhibit) expression

• Computational challenges
  – Identify binding sites
  – Correlate sequence to expression

Page 24: Genome Organization & Protein Synthesis and Processing in Plants

Eukaryotic genes

• Most have introns

• Produce monocistronic mRNA: only one encoded protein

• Large

Page 25: Genome Organization & Protein Synthesis and Processing in Plants

Alternative splicing

• Splicing is the removal of introns

• mRNA from some genes can be spliced into two or more different mRNAs
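One common way a single gene yields several mRNAs is by including or skipping individual exons. A minimal sketch that enumerates those combinations; the exon sequences, the function name and the notion of an "optional" exon set are hypothetical, and other splicing modes (alternative splice sites, intron retention) are not modelled:

```python
from itertools import combinations


def isoforms(exons: list[str], optional: set[int]) -> list[str]:
    """All mRNAs obtained by keeping or skipping each optional exon.

    exons are joined in their original order; optional holds the
    indices of exons that may be skipped.
    """
    results = []
    candidates = sorted(optional)
    for r in range(len(candidates) + 1):
        for skipped in combinations(candidates, r):
            results.append(
                "".join(e for i, e in enumerate(exons) if i not in skipped)
            )
    return results


exons = ["AUGGCU", "GAACCC", "UUUUAA"]   # toy exon sequences
for mrna in isoforms(exons, optional={1}):
    print(mrna)
```

With one optional exon the sketch returns two mRNAs, one with all three exons and one with the middle exon skipped.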

Page 26: Genome Organization & Protein Synthesis and Processing in Plants

“Nonfunctional” DNA

• Higher eukaryotes have a lot of noncoding DNA

• Some has no known structural or regulatory function (no genes)

[Figure: scale label 80 kb]

Page 27: Genome Organization & Protein Synthesis and Processing in Plants

Types of eukaryotic DNA

Page 28: Genome Organization & Protein Synthesis and Processing in Plants

Duplicated genes

• Encode closely related (homologous) proteins

• Clustered together in genome

• Formed by duplication of an ancestral gene followed by mutation

Five functional genes and two pseudogenes

Page 29: Genome Organization & Protein Synthesis and Processing in Plants

Pseudogenes

• Nonfunctional copies of genes

• Formed by duplication of ancestral gene, or reverse transcription (and integration)

• Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences

Page 30: Genome Organization & Protein Synthesis and Processing in Plants

Repetitive DNA

• Moderately repeated DNA
  – Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
  – Large duplicated gene families
  – Mobile DNA

• Simple-sequence DNA
  – Tandemly repeated short sequences
  – Found in centromeres and telomeres (and others)
  – Used in DNA fingerprinting to identify individuals

Page 31: Genome Organization & Protein Synthesis and Processing in Plants

Types of DNA repeats

Tandem repeats (e.g. satellite DNA)

5’-CATGTGCTGAAGGCTATGTGCTGCGACG-3’
3’-GTACACGACTTCCGATACACGACGCTGC-5’

Inverted repeats (e.g. in transposons)
• Form stem-loop structures

5’-CATGTGCTGAAGGCTCAGCACATCGACG-3’
3’-GTACACGACTTCCGAGTCGTGTAGCTGC-5’

Palindromes = adjacent inverted repeats (e.g. restriction sites)
• Form hairpin structures

[Figure: stem-loop structure (stem, loop) and hairpin]

Perfect repeats vs degenerate repeats
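The difference between the two example sequences can be checked with a reverse-complement function: a direct repeat contains the same unit twice on one strand, while an inverted repeat contains the unit and its reverse complement. A minimal sketch; the 8-nt unit was picked out of the slide's sequences by inspection, and the helper names are illustrative:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


tandem = "CATGTGCTGAAGGCTATGTGCTGCGACG"     # first example from the slide
inverted = "CATGTGCTGAAGGCTCAGCACATCGACG"   # second example from the slide
unit = "ATGTGCTG"                           # repeat unit shared by both

print(tandem.count(unit))                       # copies of the unit on the top strand
print(reverse_complement(unit) in inverted)     # unit also present in inverted form
print(is_palindrome("GAATTC"))                  # EcoRI recognition site
```

The same `reverse_complement` call reproduces the lower strand of each example when its output is read right to left.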

Page 32: Genome Organization & Protein Synthesis and Processing in Plants

Repetitive sequences

[Figure: caesium chloride density gradient separating satellite DNA from chromosomal DNA]

Repeats in the mouse genome

Type                    No. of repeats   Size               Percent of genome
Highly repetitive       > 1 million      < 10 bp            10 %
Moderately repetitive   > 1000           ~150 to ~300 bp    20 %

Page 33: Genome Organization & Protein Synthesis and Processing in Plants

DNA repeats and forensics

[Figure: gel lanes M, F and Suspect for Alu insertions in X-Y homologous regions (AluSTXa, AluSTYa); band labels 878 bp, 556 bp, 528 bp and 199 bp]

Gender determination
1) Standard technique: PCR amplification of the amelogenin locus (Males = XY => 103 + 109 bp)
2) AluSTXa: Alu insertion on X
3) AluSTYa: Alu insertion on Y

Page 34: Genome Organization & Protein Synthesis and Processing in Plants

Mobile DNA

• Move within genomes

• Make up most of the moderately repeated DNA found throughout higher eukaryotic genomes
  – L1 LINE is ~5% of human DNA (~50,000 copies)
  – Alu is ~5% of human DNA (>500,000 copies)

• Some encode enzymes that catalyze movement

Page 35: Genome Organization & Protein Synthesis and Processing in Plants

Transposition

• Movement of mobile DNA

• Involves copying of mobile DNA element and insertion into new site in genome

Page 36: Genome Organization & Protein Synthesis and Processing in Plants

Why?

• Molecular parasite: “selfish DNA”

• Probably have significant effect on evolution by facilitating gene duplication, which provides the fuel for evolution, and exon shuffling

Page 37: Genome Organization & Protein Synthesis and Processing in Plants

RNA or DNA intermediate

• Transposon moves using DNA intermediate

• Retrotransposon moves using RNA intermediate

Page 38: Genome Organization & Protein Synthesis and Processing in Plants

Types of mobile DNA elements

Page 39: Genome Organization & Protein Synthesis and Processing in Plants

LTR (long terminal repeat)

• Flank viral retrotransposons and retroviruses

• Contain regulatory sequences: transcription start site and poly(A) site

Page 40: Genome Organization & Protein Synthesis and Processing in Plants
Page 41: Genome Organization & Protein Synthesis and Processing in Plants

LINEs and SINEs

• Non-viral retrotransposons
  – RNA intermediate
  – Lack LTR

• LINEs (long interspersed elements)
  – ~6000 to 7000 base pairs
  – L1 LINE (~5% of human DNA)
  – Encode enzymes that catalyze movement

• SINEs (short interspersed elements)
  – ~300 base pairs
  – Alu (~5% of human DNA)
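The genome fractions quoted for these elements can be cross-checked against their copy numbers. A minimal sketch, assuming the human genome size of 3200 x 10^6 bp from the genome size table and the copy numbers given on the mobile DNA slide (>500,000 Alu copies, ~50,000 L1 copies); the function names are illustrative:

```python
HUMAN_GENOME_BP = 3200e6   # H. sapiens genome size from the genome size table


def genome_fraction(copies: int, element_bp: float,
                    genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Fraction of the genome occupied by `copies` elements of `element_bp` each."""
    return copies * element_bp / genome_bp


def implied_mean_length(fraction: float, copies: int,
                        genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Mean element length implied by a genome fraction and a copy number."""
    return fraction * genome_bp / copies


alu_fraction = genome_fraction(copies=500_000, element_bp=300)
l1_mean_bp = implied_mean_length(fraction=0.05, copies=50_000)

print(f"Alu: {alu_fraction:.1%} of the genome")
print(f"L1: implied mean copy length of about {l1_mean_bp:.0f} bp")
```

The Alu figure comes out at about 4.7%, in line with the quoted ~5%. For L1 the quoted fraction and copy number imply a mean length of about 3,200 bp, shorter than the 6,000 to 7,000 bp listed, which would be consistent with many copies being incomplete.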

Page 42: Genome Organization & Protein Synthesis and Processing in Plants

Proteins

• Most protein sequences (today) are inferred (from nucleic acid sequence)
• What’s wrong with this?
• Proteins (and nucleic acids) are modified
• ‘Mature’ RNA
• Computational challenges
  – Identify (possible) aspects of molecular life cycle
  – Identify protein-protein and protein-nucleic acid interactions

Page 43: Genome Organization & Protein Synthesis and Processing in Plants

Genetic variation

• Variable number tandem repeats (minisatellites). 10-100 bp. Forensic applications.

• Short tandem repeat polymorphisms (microsatellites). 2-5 bp, 10-30 consecutive copies.

• Single nucleotide polymorphisms
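Tandem-repeat markers distinguish individuals by the number of consecutive copies of a motif at a locus. A minimal sketch that counts the longest run of a motif; the motif, flanking bases and allele sizes are made-up illustrations, not data from the slides:

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of consecutive copies of `motif` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


# Two hypothetical alleles of one microsatellite locus (4 bp motif)
allele_a = "TTC" + "GATA" * 12 + "CCG"
allele_b = "TTC" + "GATA" * 15 + "CCG"

print(longest_tandem_run(allele_a, "GATA"))
print(longest_tandem_run(allele_b, "GATA"))
```

The two alleles differ only in repeat count (12 versus 15 copies here), which is the kind of length difference a PCR product size would reveal.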

Page 44: Genome Organization & Protein Synthesis and Processing in Plants

Single nucleotide polymorphisms

• 1/2000 bp.

• Types
  – Silent
  – Truncating
  – Shifting

• Significance: much of individual variation.

• Challenge: correlation to disease
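The effect of a single-base change inside a coding region can be read off the standard genetic code. A minimal sketch that labels a substitution as silent or truncating; the third label, "amino acid change", is a name chosen here for everything else and is not claimed to match the slide's "Shifting" category, which the slides do not define:

```python
# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon: str, pos: int, new_base: str) -> str:
    """Label the effect of replacing codon[pos] with new_base."""
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"               # same amino acid
    if after == "*":
        return "truncating"           # new stop codon
    return "amino acid change"        # different amino acid


print(classify_snp("GAA", 2, "G"))    # GAA -> GAG
print(classify_snp("TGG", 2, "A"))    # TGG -> TGA
print(classify_snp("GAG", 1, "T"))    # GAG -> GTG
```

The three calls cover one example of each label: Glu to Glu, Trp to stop, and Glu to Val.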

Page 45: Genome Organization & Protein Synthesis and Processing in Plants

E. coli genome

• 4.6 x 10^6 bp. One chromosome. Published 1997.

• 4,285 protein-coding genes

• 122 structural RNA genes

• Repeats. Regulatory elements. Transposons.

• Lateral transfers.
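The genome size and gene count on this slide give a quick estimate of gene spacing. A minimal sketch using only those two numbers (variable names are illustrative):

```python
GENOME_BP = 4.6e6           # genome size from this slide
PROTEIN_GENES = 4285        # protein-coding genes from this slide

bp_per_gene = GENOME_BP / PROTEIN_GENES              # average spacing
genes_per_kb = PROTEIN_GENES / (GENOME_BP / 1000.0)  # density per kilobase

print(f"about {bp_per_gene:.0f} bp per protein-coding gene")
print(f"about {genes_per_kb:.2f} protein-coding genes per kb")
```

That is roughly one protein-coding gene per 1.1 kb, before counting the structural RNA genes.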

Page 46: Genome Organization & Protein Synthesis and Processing in Plants

E. coli protein functions

Function              Genes   Percent
Regulatory            45      1.05
Cell structure        182     4.24
Transposons, etc.     87      2.03
Transport & binding   281     6.55
Putative transport    146     3.40
Replication, repair   115     2.68
Transcription         55      1.28
Translation           182     4.24
Enzymes               251     5.85
Unknown               1632    38.06
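The percentages in this table can be sanity-checked against the gene counts. A minimal sketch, assuming the total of 4,285 protein-coding genes quoted on the preceding slide is the denominator; the 0.05 tolerance is an arbitrary choice to absorb rounding:

```python
TOTAL_GENES = 4285   # protein-coding gene count from the preceding slide

# (function, genes, percent) rows as listed in the table
ROWS = [
    ("Regulatory", 45, 1.05),
    ("Cell structure", 182, 4.24),
    ("Transposons, etc.", 87, 2.03),
    ("Transport & binding", 281, 6.55),
    ("Putative transport", 146, 3.40),
    ("Replication, repair", 115, 2.68),
    ("Transcription", 55, 1.28),
    ("Translation", 182, 4.24),
    ("Enzymes", 251, 5.85),
    ("Unknown", 1632, 38.06),
]

# Largest gap between a listed percentage and count / total
max_diff = max(abs(100.0 * n / TOTAL_GENES - pct) for _, n, pct in ROWS)
listed_genes = sum(n for _, n, _ in ROWS)
listed_percent = round(sum(pct for _, _, pct in ROWS), 2)

print(f"largest deviation: {max_diff:.3f} percentage points")
print(f"rows cover {listed_genes} genes ({listed_percent}% of the total)")
```

Every row agrees with its count to within 0.05 percentage points. The rows shown account for 2,976 genes, about 69% of the total, so the transcript appears to list only part of the full classification.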