Genome evolution: a sequence-centric approach. Lecture 7: Brief evolutionary history of everything
Post on 21-Dec-2015
Probabilistic models
Inference
Parameter estimation
Genome structure
Mutations
Population
Inferring Selection
(Probability, Calculus/Matrix theory, some graph theory, some statistics)
Simple Tree Models, HMMs and variants, PhyloHMM, DBN, Context-aware MM, Factor Graphs
DP, Sampling, Variational apx., LBP
EM, Generalized EM (optimize free energy)
Diversity: Brief description of the tree of life
Genome structure: Size, Key features, Mobile elements
Genome information: Proteins/RNA genes, regulatory elements
Today: A lot of terminology, basic overview
[Figure labels: RNA-based genomes; ribosome, proteins; genetic code; DNA-based genomes; membranes; diversity! Two items are marked "?".]
3.4 – 3.8 BYA – fossils??
3.2 BYA – good fossils
3 BYA – methanogenesis
2.8 BYA – photosynthesis
....
1.7-1.5 BYA – eukaryotes
..
0.55 BYA – Cambrian explosion
0.44 BYA – jawed vertebrates
0.4 BYA – land plants
0.14 BYA – flowering plants
0.10 BYA – mammals
Curated set of universal proteins
Eliminating Lateral transfer
Multiple alignment and removal of bad domains
Maximum likelihood inference, with 4 classes of rate and a fixed matrix
Bootstrap validation
Ciccarelli et al 2005
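The bootstrap step in the pipeline above can be illustrated with a minimal sketch. It resamples alignment columns with replacement and counts how often a grouping is recovered. Note the substitution: instead of maximum likelihood tree inference, this toy uses the closest pair by raw mismatch fraction as the "grouping", and the function names and the alignment in the usage note are hypothetical, not taken from Ciccarelli et al.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(alignment):
    """Stand-in for tree inference (assumption): the least divergent pair."""
    names = sorted(alignment)
    pairs = [
        (p_distance(alignment[a], alignment[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return min(pairs)[1:]

def bootstrap_support(alignment, n_reps=200, seed=0):
    """Fraction of replicates that recover the pair found on the full data."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == reference
        for _ in range(n_reps)
    )
    return reference, hits / n_reps
```

For example, `bootstrap_support({"A": "ACGTACGTAC", "B": "ACGTACGTAC", "C": "TGCATGCATG"})` returns the pair `("A", "B")` together with its support value. In a real analysis the same column resampling would be wrapped around the full tree-building method.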
EUKARYOTES | PROKARYOTES
Presence of a nuclear membrane | (Also present in the Planctomycetes)
Organelles derived from endosymbionts | (also in β-proteobacteria)
Cytoskeleton and vesicle transport | Tubulin-related protein, no microtubules
Trans-splicing | -
Introns in protein coding genes, spliceosome | Rare – almost never in coding
Expansion of untranslated regions of transcripts | Short UTRs
Translation initiation by scanning for start | Ribosome binds directly to a Shine-Dalgarno sequence
mRNA surveillance | Nonsense mediated decay pathway is absent
Multiple linear chromosomes, telomeres | Single linear chromosomes in a few eubacteria
Mitosis, Meiosis | Absent
Gene number expansion | -
Expansion of cell size | Some exceptions, but cells are small
Eukaryotes
Unikonts – one flagellum at some developmental stage
Fungi, Animals, Animal parasites, Amoebas
Bikonts – ancestrally two flagella
Green plants, Red algae, Ciliates, Plasmodium, Brown algae, More amoebae
Strange biology!
A big bang phylogeny: speciations across a short time span? Ambiguity – and not much hope for really resolving it
[Figure: primate phylogeny. Species: marmoset, macaque, orangutan, chimp, human, baboon, gibbon, gorilla. Percentages shown: 0.5%, 0.5%, 0.8%, 1.5%, 3%, 9%, 1.2%.]
Primates
Why larger genomes?
• Selfish DNA
– Larger genomes are a result of the proliferation of selfish DNA
– Proliferation stops only when it becomes too deleterious
• Bulk DNA
– Genome content is a consequence of natural selection
– A larger genome is needed to allow larger cell size, larger nuclear membrane, etc.
Why smaller genomes?
• Metabolic cost: maybe cells lose excess DNA for energetic efficiency
– But DNA is only 2-5% of the dry mass
– No correlation between genome size and replication time in prokaryotes
– Replication is much faster than transcription (10-20 times in E. coli)
Mutational balance
• Balance between deletions and insertions
– May be different between species
– Different balances may have evolved
• In flies, yeast laboratory evolution
– 4-fold more 4kb spontaneous insertions
• In mammals
– More small deletions than insertions
Mutational hazard
• No loss of function for inert DNA
– But is it truly not functional?
• Gain of function mutations are still possible:
– Transcription
– Regulation
Differences in population size may make DNA purging more effective for prokaryotes and small eukaryotes
Differences in regulatory sophistication may make the mutational hazard of DNA less of a problem for metazoans
Can we model genome size evolution in a quantitative way?
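One very simple way to start answering this question is a deterministic insertion/deletion balance, sketched below. This is only an illustration under strong assumptions, not a model proposed in the lecture: the per-base event rates and mean event lengths are hypothetical parameters, and selection and drift (the population-size effects mentioned above) are ignored entirely.

```python
def expected_genome_size(g0, ins_rate, ins_len, del_rate, del_len, generations):
    """Expected genome size under a pure mutational balance (assumed model).

    g0        initial genome size in bp
    ins_rate  insertion events per bp per generation (hypothetical)
    ins_len   mean insertion length in bp (hypothetical)
    del_rate  deletion events per bp per generation (hypothetical)
    del_len   mean deletion length in bp (hypothetical)
    """
    # Net bp gained (or lost, if negative) per existing bp per generation.
    net = ins_rate * ins_len - del_rate * del_len
    size = float(g0)
    for _ in range(generations):
        size *= 1.0 + net
    return size
```

With equal insertion and deletion terms the size stays constant; any bias compounds geometrically, for example `expected_genome_size(1e6, 2e-9, 3, 1e-9, 3, 1000)` grows while swapping the two rates shrinks the genome. A more realistic treatment would add a selection term that depends on population size.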
Genome Structural features: centromeres/telomeres
[Figure labels: Rat – partly acrocentric; Human]
Centromeres are essential and universally important for proper cell division, but are highly divergent among species
Satellites and repeats
Pericentromeric regions – more repeats
Telomeres are critical for genome maintenance
Subtelomeric regions – also repetitive
May be key to nuclear structure?
Genome Structural features: nuclear organization
The nucleus must be organized to allow functional transcription and replication
Incredibly dense mesh of chromosomes, cytoskeleton, membranes
Transcription factories / chromosomal territories
“Spacer DNA” may affect physical organization in unexpected ways
Inter- and intra-chromosomal interactions
The entire genome may participate in regulating interactions
Modeling protein coding genes
Modeling protein structure/function
Structure is complex
Dependencies are not confined by the gene's linear coding
http://predictioncenter.org/
Genome information: RNA genes
mRNA – messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor
miRNA – micro-RNA. 20-30bp in length, processed from transcribed “hairpin” precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3’ UTR of transcripts
siRNA – small interfering RNAs. 20-30bp in length, processed from double-stranded RNA by the RNAi machinery. Used for posttranscriptional silencing
rRNA – ribosomal RNA, part of the ribosome machine (with proteins)
snRNA – small nuclear RNAs. Heterogeneous set with function confined to the nucleus, including RNAs involved in the spliceosome machinery.
snoRNA – small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes
tRNA – transfer RNA. Delivers amino acids to the ribosome.
piRNA - ???
Genome information: regulatory elements
Computational perspective: finding and understanding TFBSs
Specialized proteins can bind DNA in a sequence specific fashion
Genomes can therefore control the level of affinity of each region to a large set of DNA binding proteins
DNA binding sites are typically short (<20bp)
Multiple binding sites at different affinities participate in regulation
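From the computational perspective, a common way to represent graded affinity for short binding sites is a position weight matrix (PWM) scanned along the sequence. The sketch below assumes a hypothetical count matrix, a uniform background of 0.25 per base, a pseudocount of 1, and sequences containing only uppercase A, C, G, T. It is an illustration of the idea, not the lecture's own model.

```python
import math

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against the background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical counts for a 4 bp site that is almost always TATA.
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
scores = scan("GGTATAGG", log_odds_pwm(counts))
best_start, best_score = max(scores, key=lambda item: item[1])
```

Every window receives a score rather than a yes/no call, so weaker sites at lower affinity remain visible in the output instead of being discarded by a hard threshold.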
The regulatory process is likely to be less deterministic and discrete than this beautiful idealized sea urchin regulatory network
Each regulatory interaction is parameterized, and many additional weak interactions participate in the process
Evolution of regulatory regions involves more than a small set of discrete 20bp sites
Structure meets information: packaging and chromosomal interactions are critical for proper genome function
Structure meets information: HOX clusters as an example
Hox genes are important developmental regulators
Present in linear clusters, preserving order
Their expression is frequently coordinated with the gene order
4 HOX clusters are present in the human genome
Additional gene clusters: Protocadherins, Olfactory receptors, MAGE genes, Zinc fingers
Additional smaller groups of related regulators are co-located
Repeats: selfish DNA
Class | Copies | Genome Fraction
LINEs | 868,000 (only ~100 active!!) | 20.4%
SINEs | 1,558,000 (70% Alu) | 13.1%
LTR elements | 443,000 | 8.3%
Transposons | 294,000 | 2.8%
Repetitive elements in the human genome
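As a quick arithmetic check on the figures listed above, the copy counts and genome fractions can be totalled. The short sketch below uses only the numbers given on the slide; the variable and function names are illustrative.

```python
# Copies and genome fraction (%) per repeat class, as listed above.
repeats = {
    "LINEs": (868_000, 20.4),
    "SINEs": (1_558_000, 13.1),
    "LTR elements": (443_000, 8.3),
    "Transposons": (294_000, 2.8),
}

def summarize(table):
    """Return total copies, total genome fraction, and each class's share of repeats."""
    total_copies = sum(copies for copies, _ in table.values())
    total_fraction = sum(frac for _, frac in table.values())
    shares = {name: frac / total_fraction for name, (_, frac) in table.items()}
    return total_copies, total_fraction, shares

total_copies, total_fraction, shares = summarize(repeats)
```

The four classes together add up to about 3.16 million copies and roughly 44.6% of the genome, with LINEs and SINEs accounting for most of that fraction.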
Repeats: short tandems, satellites
DNA-based transposons do not involve an RNA intermediate, and are quite rare.
Satellite DNA duplicates by replication slippage, which is enhanced for specific sequences. Abundant near telomeres and centromeres. Some of these are still a mystery.
Retrotransposition is generally sloppy and noisy – so elements die out quickly
Element proliferation appears in evolutionary bursts.
Pseudogenes
Genes that are becoming inactive due to mutations are called pseudogenes
mRNAs that jump back into the genome are called processed pseudogenes (they therefore lack introns)
Summary
• History/Phylogeny:
– Early phylogenetics can be inferred using genome sequences, but conclusions are not always reliable
– Maximum likelihood models sometimes depend on the gene/genomic region analyzed; the genome is highly heterogeneous at all levels.
– The major clades, phylogeny of model organisms and sequenced genomes
• Genome structure
– Size and its consequences
– Packaging and nuclear organization
– Mutational effects and differences
– Selfish DNA
• Genome information
– Protein coding genes
– RNA genes
– Transcription factor binding sites
– Chromosomal organization and DNA codes that affect it