genome evolution: a sequence-centric approach

Genome evolution: a sequence-centric approach Lecture 7: Brief evolutionary history of everything

Upload: junius

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Genome evolution:  a sequence-centric approach

Genome evolution: a sequence-centric approach

Lecture 7: Brief evolutionary history of everything

Page 2: Genome evolution:  a sequence-centric approach

Probabilistic models

Inference

Parameter estimation

Genome structure

Mutations

Population

Inferring Selection

(Probability, Calculus/Matrix theory, some graph theory, some statistics)

Probabilistic models: simple tree models; HMMs and variants; PhyloHMM, DBN; context-aware MM; factor graphs

Inference: DP; sampling; variational approximation; LBP

Parameter estimation: EM; generalized EM (optimize free energy)

Page 3: Genome evolution:  a sequence-centric approach

Genome Structure, Genome Information

Genome structure

Genomic information

Selection

Mutation

Page 4: Genome evolution:  a sequence-centric approach

Diversity: Brief description of the tree of life

Genome structure: Size, Key features, Mobile elements

Genome information: Proteins/RNA genes, regulatory elements

Today: A lot of terminology, basic overview

Page 5: Genome evolution:  a sequence-centric approach

RNA-based genomes

Ribosome, proteins

Genetic code

DNA-based genomes

Membranes

Diversity!

? ?

3.4 – 3.8 BYA – fossils??
3.2 BYA – good fossils
3 BYA – methanogenesis
2.8 BYA – photosynthesis
...
1.7 – 1.5 BYA – eukaryotes
...
0.55 BYA – Cambrian explosion
0.44 BYA – jawed vertebrates
0.4 BYA – land plants
0.14 BYA – flowering plants
0.10 BYA – mammals

Page 6: Genome evolution:  a sequence-centric approach
Page 7: Genome evolution:  a sequence-centric approach

Curated set of universal proteins

Eliminating Lateral transfer

Multiple alignment and removal of bad domains

Maximum likelihood inference, with 4 classes of rate and a fixed matrix

Bootstrap validation

Ciccarelli et al. 2005
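The bootstrap step listed above can be illustrated with a minimal sketch. It shows only the column-resampling part of a phylogenetic bootstrap: each replicate draws alignment columns with replacement, and a tree would then be re-inferred from every replicate (that inference step is omitted here). The function name `bootstrap_alignment` and the toy alignment are illustrative assumptions, not taken from the slides or from Ciccarelli et al.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw n_cols column indices with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Rebuild every sequence from the same sampled columns, so rows stay aligned.
    return ["".join(row[c] for c in cols) for row in alignment]

# Hypothetical toy alignment (three sequences, six columns).
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis, the fraction of replicate trees that contain a given clade would be reported as the bootstrap support of that clade.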

Page 8: Genome evolution:  a sequence-centric approach

EUKARYOTES | PROKARYOTES
Presence of a nuclear membrane | (Also present in the Planctomycetes)
Organelles derived from endosymbionts | (Also in β-proteobacteria)
Cytoskeleton and vesicle transport | Tubulin-related protein, no microtubules
Trans-splicing | -
Introns in protein coding genes, spliceosome | Rare – almost never in coding
Expansion of untranslated regions of transcripts | Short UTRs
Translation initiation by scanning for start | Ribosome binds directly to a Shine-Dalgarno sequence
mRNA surveillance | Nonsense mediated decay pathway is absent
Multiple linear chromosomes, telomeres | Single linear chromosomes in a few eubacteria
Mitosis, meiosis | Absent
Gene number expansion | -
Expansion of cell size | Some exceptions, but cells are small

Page 9: Genome evolution:  a sequence-centric approach
Page 10: Genome evolution:  a sequence-centric approach

Bikonts

Unikonts

Eukaryotes

Page 11: Genome evolution:  a sequence-centric approach

Eukaryotes

Unikonts – one flagellum at some developmental stage

Fungi, animals, animal parasites, amoebas

Bikonts – ancestrally two flagella

Green plants, red algae, ciliates, Plasmodium, brown algae, more amoebae

Strange biology!

A big bang phylogeny: speciations across a short time span? Ambiguity – and not much hope for really resolving it

Page 12: Genome evolution:  a sequence-centric approach

Vertebrates

Sequenced Genomes phylogeny

Fossil based, large scale phylogeny

Page 13: Genome evolution:  a sequence-centric approach

Primates

[Figure: primate phylogeny with tips Marmoset, Macaque, Orangutan, Chimp, Human, Baboon, Gibbon, Gorilla; percentage labels 0.5%, 0.5%, 0.8%, 1.5%, 3%, 9%, 1.2%]

Page 14: Genome evolution:  a sequence-centric approach

Flies

Page 15: Genome evolution:  a sequence-centric approach

Yeasts

Page 16: Genome evolution:  a sequence-centric approach

Genome Size

Page 17: Genome evolution:  a sequence-centric approach

Why larger genomes?

• Selfish DNA
– Larger genomes are a result of the proliferation of selfish DNA
– Proliferation stops only when it becomes too deleterious

• Bulk DNA
– Genome content is a consequence of natural selection
– A larger genome is needed to allow larger cell size, larger nuclear membrane etc.

Page 18: Genome evolution:  a sequence-centric approach

Why smaller genomes?

• Metabolic cost: maybe cells lose excess DNA for energetic efficiency
– But DNA is only 2-5% of the dry mass
– No correlation between genome size and replication time in prokaryotes
– Replication is much faster than transcription (10-20 times in E. coli)

Page 19: Genome evolution:  a sequence-centric approach

Mutational balance

• Balance between deletions and insertions
– May differ between species
– Different balances may have evolved

• In flies, yeast laboratory evolution
– 4-fold more 4kb spontaneous insertions

• In mammals
– More small deletions than insertions

Mutational hazard

• No loss of function for inert DNA
– But is it truly not functional?

• Gain of function mutations are still possible:
– Transcription
– Regulation

Differences in population size may make DNA purging more effective for prokaryotes and small eukaryotes

Differences in regulatory sophistication may make the mutational hazard of DNA less of a problem for metazoans

Can we model genome size evolution in a quantitative way?
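One very simple way to start on this question is a random walk on genome size driven by the insertion/deletion balance. The sketch below is a toy model under stated assumptions: every event is either an insertion or a deletion of fixed length, and selection and population size are ignored. All names and parameter values (`simulate_genome_size`, `p_insert`, `insert_len`, `delete_len`) are hypothetical and not taken from the lecture.

```python
import random

def expected_drift(p_insert, insert_len, delete_len):
    """Expected change in genome size per indel event (bp)."""
    return p_insert * insert_len - (1 - p_insert) * delete_len

def simulate_genome_size(size0, steps, p_insert, insert_len, delete_len, rng):
    """Random walk on genome size: each step is one insertion or one deletion."""
    size = size0
    trajectory = [size]
    for _ in range(steps):
        if rng.random() < p_insert:
            size += insert_len                 # insertion event
        else:
            size = max(0, size - delete_len)   # deletion event, size cannot go negative
        trajectory.append(size)
    return trajectory

# Hypothetical example: rare long insertions versus frequent short deletions.
rng = random.Random(1)
traj = simulate_genome_size(size0=1_000_000, steps=10_000,
                            p_insert=0.2, insert_len=4_000, delete_len=500, rng=rng)
print(expected_drift(0.2, 4_000, 500), traj[-1])
```

The sign of `expected_drift` says whether the mutational balance alone pushes the genome to grow or shrink; a more realistic model would add the selection and population-size effects discussed above.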

Page 20: Genome evolution:  a sequence-centric approach

Genome Structural features: centromeres/telomeres

Rat – partly acrocentric

Human

Centromeres are essential and universally important for proper cell division, but are highly divergent among species

Satellites and repeats
Pericentromeric regions – more repeats

Telomeres are critical for genome maintenance
Subtelomeric regions – also repetitive
May be key to nuclear structure?

Page 21: Genome evolution:  a sequence-centric approach

Genome Structural features: nuclear organization

The nucleus must be organized to allow functional transcription and replication

Incredibly dense mesh of chromosomes, cytoskeleton, membranes

Transcription factories / chromosomal territories
“Spacer DNA” may affect physical organization in unexpected ways

Inter- and intra-chromosomal interactions
The entire genome may participate in regulating interactions

Page 22: Genome evolution:  a sequence-centric approach

Genomic information: Protein coding genes

Page 23: Genome evolution:  a sequence-centric approach

Modeling protein coding genes

Modeling protein structure/function

Structure is complex

Dependencies are not confined by the gene's linear coding

http://predictioncenter.org/

Page 24: Genome evolution:  a sequence-centric approach

Genomic information: the gene repertoire is evolving by duplication and loss

Page 25: Genome evolution:  a sequence-centric approach

Genome information: Introns/Exons

Page 26: Genome evolution:  a sequence-centric approach

Genome information: RNA genes

mRNA – messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor

miRNA – micro-RNA. 20-30bp in length, processed from transcribed “hairpin” precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3’ UTR of transcripts

siRNA – small interfering RNAs. 20-30bp in length, processed from double stranded RNA by the RNAi machinery. Used for posttranscriptional silencing

rRNA – ribosomal RNA, part of the ribosome machine (with proteins)

snRNA – small nuclear RNAs. Heterogeneous set with function confined to the nucleus. Includes RNAs involved in the spliceosome machinery.

snoRNA – small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes

tRNA – transfer RNA. Delivers amino acids to the ribosome.

piRNA - ???
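The miRNA entry above describes binding to nearly perfect matches in the 3’ UTR. A minimal sketch of that idea is to slide the reverse complement of the miRNA along a UTR and report windows with few mismatches. This is a simplification: it uses plain Watson-Crick pairing, ignores G:U wobble and seed-specific rules, and the function names, the example sequences and the `max_mismatches` threshold are illustrative assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(mirna, utr, max_mismatches=2):
    """Return (position, mismatches) for UTR windows that nearly pair with the miRNA."""
    target = reverse_complement(mirna)  # the sequence a perfectly paired site would have
    hits = []
    for i in range(len(utr) - len(target) + 1):
        window = utr[i:i + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits

# Hypothetical short miRNA fragment and UTR, for illustration only.
mirna = "UGAGGUAG"
utr = "AAACUACCUCAGGG"
print(find_target_sites(mirna, utr, max_mismatches=0))
```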

Page 27: Genome evolution:  a sequence-centric approach

miRNA clusters

Page 28: Genome evolution:  a sequence-centric approach

snRNA works by binding other RNAs

RNA structure affects function

Page 29: Genome evolution:  a sequence-centric approach

Computational perspective: finding and understanding RNAs and their evolution

Page 30: Genome evolution:  a sequence-centric approach

Ultra-high throughput sequencing is transforming all aspects of biology

Page 31: Genome evolution:  a sequence-centric approach

Ultra-high throughput sequencing is transforming all aspects of biology

Page 32: Genome evolution:  a sequence-centric approach

Genome information: regulatory elements

Computational perspective: finding and understanding TFBSs

Specialized proteins can bind DNA in a sequence specific fashion

Genomes can therefore control the level of affinity of each region to a large set of DNA binding proteins

DNA binding sites are typically short (<20bp)

Multiple binding sites at different affinities participate in regulation
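A common way to make "affinity of each region" concrete is to score every window of a sequence with a position weight matrix (PWM). The slides do not name a specific model, so the sketch below is an assumption: it builds a log-odds PWM from hypothetical per-position base counts with a pseudocount and a uniform background, then scans a sequence. The names `log_odds_pwm` and `scan` and the count values are illustrative.

```python
import math

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(seq, pwm):
    """Score every window of len(pwm) in seq; returns a list of (position, score)."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

# Hypothetical counts for a 4 bp TATA-like site observed 8 times.
counts = [{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}]
pwm = log_odds_pwm(counts)
scores = scan("GGTATAGG", pwm)
best_position, best_score = max(scores, key=lambda hit: hit[1])
print(best_position, round(best_score, 2))
```

Keeping the full list of window scores, rather than only the best hit, matches the point that multiple sites of different affinities participate in regulation.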

Page 33: Genome evolution:  a sequence-centric approach

The regulatory process is likely to be less deterministic and discrete than this beautiful idealized sea urchin regulatory network

Each regulatory interaction is parameterized, and many additional weak interactions participate in the process

Evolution of regulatory regions involves more than a small set of discrete 20bp sites

Page 34: Genome evolution:  a sequence-centric approach

Chromatin Immunoprecipitation is mapping DNA binding sites

Page 35: Genome evolution:  a sequence-centric approach

Structure meets information: packaging and chromosomal interactions are critical for proper genome function

Page 36: Genome evolution:  a sequence-centric approach

Structure meets information: HOX clusters as an example

Hox genes are important developmental regulators

Present in linear clusters, preserving order

Their expression is frequently coordinated with the gene order

4 HOX clusters are present in the human genome

Additional gene clusters: Protocadherins, Olfactory receptors, MAGE genes, Zinc fingers

Additional smaller groups of related regulators are co-located

Page 37: Genome evolution:  a sequence-centric approach

Mapping chromosomal interactions: 4C

Page 38: Genome evolution:  a sequence-centric approach

Repeats: selfish DNA

Class          Copies                        Genome fraction
LINEs          868,000 (only ~100 active!!)  20.4%
SINEs          1,558,000 (70% Alu)           13.1%
LTR elements   443,000                       8.3%
Transposons    294,000                       2.8%

Repetitive elements in the human genome
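As a quick arithmetic check on the table, the four classes can be summed to see how much of the human genome they cover together. The numbers below are copied from the table; the variable names are illustrative.

```python
# Values copied from the table above: class -> (copies, genome fraction in %)
repeats = {
    "LINEs": (868_000, 20.4),
    "SINEs": (1_558_000, 13.1),
    "LTR elements": (443_000, 8.3),
    "Transposons": (294_000, 2.8),
}

total_copies = sum(copies for copies, _ in repeats.values())
# Round to one decimal to avoid floating-point noise in the printed sum.
total_fraction = round(sum(fraction for _, fraction in repeats.values()), 1)

print(f"{total_copies:,} copies covering {total_fraction}% of the genome")
```

Taken together, these four classes account for a little under half of the genome sequence.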

Page 39: Genome evolution:  a sequence-centric approach

Retrotransposition via RNA

Page 40: Genome evolution:  a sequence-centric approach

Repeats: short tandems, satellites

DNA-based transposons do not involve an RNA intermediate, and are quite rare.

Satellite DNA duplicates by replication slippage, which is enhanced for specific sequences. Abundant near telomeres and centromeres. Some of these are still a mystery.

Retrotransposition is generally sloppy and noisy – so elements die out quickly

Element proliferation appears in evolutionary bursts.

Page 41: Genome evolution:  a sequence-centric approach

Pseudogenes

Genes that are becoming inactive due to mutations are called pseudogenes

mRNAs that jump back into the genome are called processed pseudogenes (they therefore lack introns)

Page 42: Genome evolution:  a sequence-centric approach

Summary

• History/Phylogeny:
– Early phylogenetics can be inferred using genome sequences, but conclusions are not always reliable
– Maximum likelihood results sometimes depend on the gene/genomic region analyzed; the genome is highly heterogeneous at all levels
– The major clades, phylogeny of model organisms and sequenced genomes

• Genome structure
– Size and its consequences
– Packaging and nuclear organization
– Mutational effects and differences
– Selfish DNA

• Genome information
– Protein coding genes
– RNA genes
– Transcription factor binding sites
– Chromosomal organization and DNA codes that affect it