![Page 1: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/1.jpg)
Repetitive and Duplicitous Structure of Genomes
Jeff Bailey
S5-432
![Page 2: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/2.jpg)
Human Genome Structure
Heterochromatic sequence (tandem satellite repeats)
• Centromeric alpha-satellite, telomeric TTAGGG, acrocentric rRNA and beta-satellite
Euchromatic sequence: ~3.1 gigabases
• Genes (35%): ~25,000
• Exons (1%) (transcription is more ubiquitous, per ENCODE)
Repetitive sequences
• 3% simple sequence repeats (poly-A runs, dinucleotide and trinucleotide repeats)
• 45% interspersed repetitive elements

| Repetitive element | Size | Copies | Fraction |
|---|---|---|---|
| LINE elements (retrotransposon) | up to 8 kb | 850,000 | 21% |
| Alu elements (retrotransposon) | 300 bp | 1,500,000 | 13% |
| LTR-retrovirus-like | 6-11 kb | 450,000 | 8% |
| DNA transposons | 1-3 kb | 300,000 | 3% |

(International Human Genome Sequencing Consortium. Nature 2001)
Vast majority of sequence is non-coding and repetitive.
![Page 3: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/3.jpg)
Human Satellites
![Page 4: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/4.jpg)
Centromeric Sequence
Human:
• 171 bp alpha-satellite in arrays of 2-5 Mb
• Higher-order structure (only in great apes): 4-20, 4-30 k-mer (A-B-C-D-A-B-C-D-A-B-C-D)
• Divergence: A-B-C-D to A-B-C-D 2-5%; A to D 20-40%
• Further flanked by other satellites (beta-satellite)
Mouse:
• 234 bp major satellite (6 Mb) and 120 bp minor satellite (600 kb) at the centromeric constriction
Arabidopsis:
• 178 bp satellite in 3 Mb array
Drosophila:
• 5 bp simple arrays of AATAT and AAGAG
C. elegans:
• Holocentric: entire chromosome acts as centromere
Yeast:
• CEN3 1-2 kb of 83 bp repeat
![Page 5: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/5.jpg)
Simple sequence repeats (SSRs)
Example: ATGATGATGATG
• SSR: perfect or slightly imperfect tandem repeats of a particular k-mer
• About 3% of the human genome (~0.5% by dinucleotide)
• Derived from slippage during DNA replication
Microsatellites: n=1-13 bases
Minisatellites: n=14-500 bases
[Table: repeat unit vs. number of SSRs per Mb]
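As a rough illustration of the SSR definition above, the sketch below scans a sequence for perfect tandem repeats of a k-mer using a regular expression. The function name `find_ssrs` and its defaults (`max_unit=6`, `min_copies=4`) are assumptions chosen for the example, not thresholds from the slides, and imperfect repeats are not handled.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_copies=4):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Illustrative only: thresholds are assumed, and slightly imperfect
    repeats (which the SSR definition also allows) are not detected.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-base unit followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "ATAT" is reported once, as the dinucleotide "AT").
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)

# The trinucleotide example from the slide, padded with flanking bases.
print(find_ssrs("CCATGATGATGATGCC"))
```

Raising `max_unit` to 13 would cover the full microsatellite range given on the slide, at the cost of more regular-expression passes.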
![Page 6: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/6.jpg)
Interspersed Repeats
DNA transposons “extinct” in primate lineage (~40 mya). Quiescent in mammalian lineages.
![Page 7: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/7.jpg)
Genome Variability
![Page 8: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/8.jpg)
Annu Rev Genet. 2007; 41: 331–368.
Sc: Saccharomyces cerevisiae; Sp: Schizosaccharomyces pombe; Hs: Homo sapiens; Mm: Mus musculus; Os: Oryza sativa; Ce: Caenorhabditis elegans; Dm: Drosophila melanogaster; Ag: Anopheles gambiae, malaria mosquito; Aa: Aedes aegypti, yellow fever mosquito; Eh: Entamoeba histolytica; Ei: Entamoeba invadens; Tv: Trichomonas vaginalis.
Variation in Relative Content
![Page 9: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/9.jpg)
DNA Transposons
Copy / paste
![Page 10: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/10.jpg)
Human Retrotransposons
• Serial evolution of master elements
• L1: 80-100 active L1s (6 hot L1-Ta)
• Alu: 143 active elements
• Alu Yb (punctuated)
– 2000 copies; only a handful in other primates
• SVA (~25 mya)
– pol II, 3000 copies
• New integration: L1 and Alu ~1 in 20 meioses; SVA 1 in 90
[Figure labels: Pol II, Pol III, Pol III]
![Page 11: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/11.jpg)
L1 “master” elements
![Page 12: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/12.jpg)
Mouse vs. Human
MGSC Nature, Volume 420, Issue 6915, pp. 520-562 (2002).
![Page 13: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/13.jpg)
Biological Impact of Retrotransposons
Cordaux and Batzer, Nature Reviews Genetics 10, 691-703 (October 2009)
![Page 14: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/14.jpg)
Biological Importance (cont.)
• Boundary / insulator elements
• Alternative splicing / novel exons / novel genes
• Role in suppression of pol II transcription in cellular stress
• What accounts for long-term maintenance?
![Page 15: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/15.jpg)
Human Genome Structure
![Page 16: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/16.jpg)
Types of Duplications
• Whole Genome Duplication
– Ancient 4N → 2N
• Segmental Duplications
– Tandem
– Interspersed
  • Interchromosomal
  • Intrachromosomal
![Page 17: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/17.jpg)
Susumu Ohno
• Whole Genome Duplication
• Vertebrate Paradigm: ancient whole genome duplications and recent tandem duplications
– (review: Panopoulou (2005) TIG 10:560)
• KEY CONCEPT: New genes usually derived from copies
2n → 4n → rearrangement → 2n
![Page 18: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/18.jpg)
2a. Paralogy: two genes/proteins in the same species which share sequence similarity due to duplication.
2b. Orthology: two genes/proteins in different species which share sequence similarity and are descended from a common ancestor.
3. Xenology: introduction of a new sequence into the genome by horizontal transfer between two species.
![Page 19: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/19.jpg)
Segmental Duplication (SD)
[Figure: segmental duplications over time; legend: repetitive element, exon; time scales: 100s mya and 1-50 mya]
Key raw material for the evolution of novel genes
![Page 20: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/20.jpg)
Segmental Duplications (SD)
Bailey and Eichler (2006) Nat Rev Genet
Properties:
• Clustered
• Complex regions
• Dynamic regions
5.4% of the genome (>90% identity and >1 kb)
chr22: 99.1% identical over 180 kb (VCF/DiGeorge syndrome in 1 in 3000 births)
![Page 21: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/21.jpg)
SDs Underlie Recurrent Germline Deletions and Duplications
[Figure: chromosomes drawn Cen to Tel with duplicons D and D’ flanking interval I; misaligned D - D’ and D’ - D pairing; resulting gametes]
Non-allelic Homologous Recombination (Lupski, 1999)
Change in dosage-sensitive genes → phenotype or disease
Dynamic regions: predisposed to further rearrangements
![Page 22: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/22.jpg)
Detection of Segmental Duplications: Whole genome assembly comparison
1. Identify high-copy repeats
2. Splice out
3. BLAST comparisons, allowing for large gaps
4. Reinsert repeats
5. Heuristic end trimming
6. Global alignments
7. Analyze alignments (>1 kb; >90% identity)
Human draft: regions of SD poorly assembled (collapsed) and many unique regions with unmerged overlaps (allelic) (Bailey et al. Genome Res 2001)
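The final filtering step (alignments >1 kb with >90% identity) can be sketched as below. This is a minimal illustration, not the published pipeline: the `Alignment` record, its field names, and the example values are hypothetical, and identity is taken simply as matches divided by aligned bases.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment between two regions."""
    name: str
    aligned_bases: int  # aligned length in bp
    matches: int        # number of identical aligned positions

    @property
    def identity(self):
        return self.matches / self.aligned_bases

def is_segmental_duplication(aln, min_length=1000, min_identity=0.90):
    """Apply the slide's thresholds: longer than 1 kb and above 90% identity."""
    return aln.aligned_bases > min_length and aln.identity > min_identity

# Made-up alignments; only the first passes both thresholds.
candidates = [
    Alignment("long_high_identity", 180_000, 178_380),
    Alignment("too_short", 800, 790),
    Alignment("too_divergent", 5_000, 4_400),
]
kept = [a.name for a in candidates if is_segmental_duplication(a)]
print(kept)
```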
![Page 23: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/23.jpg)
Genome Wide Detection

| Assembly | % finished | 90-98% | >98% |
|---|---|---|---|
| July 2000 | 20% | 3.6% | 12.9% |
| January 2001 | 23% | 3.6% | 10.6% |
| August 2001 | 44% | 4.1% | 15.3% |

Problem: Allelic/True Overlap vs. Duplication
![Page 24: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/24.jpg)
Shotgun Sequence: assembly-independent detection of high-identity SD
• Whole Genome Shotgun Sequence: random sample
• Examine all public sequence
• Align reads: >96% identity
• Public sequence; Celera (27.1 M reads)
[Figure labels: 99.8%; false positive SD; absent SD (collapsed or missing)]
Combined with whole-genome assembly comparison: 5.4% of the human genome is composed of SDs >1 kb and >90% identity
Bailey et al. Science 2002
![Page 25: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/25.jpg)
[Figure: number of reads per 5 kb, public vs. Celera, with repeats track; Xq28 donor; labeled values 47 and 223; axis ticks 100, 200]
![Page 26: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/26.jpg)
Celera Read Depth Across Chr. 22
![Page 27: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/27.jpg)
Depth of Coverage vs. Copy Number
[Figure: coverage (number of reads per 5 kb window, 0-2000) against diploid copy number of duplication (0-60); R² = 0.96]
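Because read depth scales roughly linearly with copy number, a depth measurement can be converted into a copy-number estimate by proportional scaling against unique (diploid) sequence. The sketch below assumes that simple proportional model; the function name and the depth values are hypothetical and are not taken from the figure.

```python
def estimate_copy_number(window_depth, unique_depth, unique_copies=2):
    """Estimate diploid copy number of a window from its read depth.

    Assumes depth is proportional to copy number and that `unique_depth`
    is the average depth of windows known to be unique (diploid copy 2).
    """
    if unique_depth <= 0:
        raise ValueError("unique_depth must be positive")
    return unique_copies * window_depth / unique_depth

# Hypothetical depths in reads per 5 kb window.
print(estimate_copy_number(window_depth=250, unique_depth=50))
```

In practice the baseline would come from many unique windows, and the estimate is only as good as the linear fit between depth and copy number.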
![Page 28: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/28.jpg)
Global Alignments filtered with SDD
[Figure: duplicated bases (% of total chromosome, axis 0-25%) by chromosome (1-22, X, Y), initial vs. filtered, two panels; off-scale bars labeled 40.7% and 68.6%]
Bar labels in extraction order (not mapped to chromosomes): 5.7%, 3.2%, 3.2%, 3.4%, 2.8%, 3.4%, 7.8%, 3.0%, 8.2%, 5.7%, 4.4%, 3.3%, 3.4%, 2.1%, 8.2%, 9.8%, 8.5%, 3.1%, 8.1%, 2.1%, 5.2%, 10.9%, 5.5%, 8.8%
![Page 29: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/29.jpg)
SD “Hotspot” Map of Human Genome
• 130 candidate regions (298 Mb)
• 23 associated with genetic disease
Bailey et al. Science 2002
Interrogation of these regions has led to detection of 16 additional pathogenic rearrangements, including new microdeletions on 1q21.1, 15q13, 15q24 and 17q12. (Sharp et al. Nat Genet 2006; Mefford et al. Am J Hum Genet 2007; Mefford et al. N Engl J Med 2008)
![Page 30: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/30.jpg)
Genetic Distance: Finished Sequence
Sept 2000 NT data set (>2 kb; >90%; no X-Y)
[Figure: total aligned bases (kbp) vs. genetic distance (K, 0.01-0.10); two panels: intrachromosomal and interchromosomal]
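To relate the genetic distance axis (K) to percent identity, one common correction for multiple substitutions is the Jukes-Cantor model, K = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The slides do not say which model was used, so Jukes-Cantor is an assumption here, and the function name is made up for the sketch.

```python
import math

def jukes_cantor_distance(identity):
    """Genetic distance K from fractional identity under Jukes-Cantor.

    Assumption: the slides do not name the substitution model used.
    `identity` is the fraction of aligned sites that match (0.25 < identity <= 1).
    """
    p = 1.0 - identity  # observed fraction of differing sites
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The 90-99% identity range used for SDs maps to K of roughly 0.01-0.11.
for identity in (0.99, 0.95, 0.90):
    print(identity, round(jukes_cantor_distance(identity), 4))
```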
![Page 31: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/31.jpg)
Species SDs
Marques-Bonet et al. TIG 2009

| Duplicated bases | Fly | Worm | Chrom 22 |
|---|---|---|---|
| >1 kb | 1.20% | 4.25% | 9.50% |
| >5 kb | 0.37% | 1.50% | 7.90% |
| >10 kb | 0.08% | 0.66% | 6.40% |
![Page 32: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/32.jpg)
Duplicated Genes
Johnson et al 2001 Nature
Gene enrichments:
• Immunological
• Environmental response
• Reproduction: sperm-egg interactions
![Page 33: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/33.jpg)
Morpheus
![Page 34: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/34.jpg)
Duplicon Structure Chr 22
![Page 35: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/35.jpg)
Organizing the MESS
Jiang et al. 2007 Nat Genet 39:1361-8
![Page 36: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/36.jpg)
437 Hubs
Jiang et al. 2007 Nat Genet 39:1361-8
![Page 37: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/37.jpg)
Mechanism: Junction Content
[Figure labels: control (+/- 1 kb); junction (50 bp)]
• Duplications >95% and <99.5%
• Only finished sequence
• Enrichment for Alu elements
![Page 38: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/38.jpg)
Alu Proximity to Junctions
[Figure: average Alu content (5-25%) in 10 bp windows vs. center of window (bp from junction, -500 to 500); regions: duplicated, unique]
![Page 39: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/39.jpg)
Alu Simulation
[Figure: histogram of number of replicates (0-350) vs. proportion Alu (%, 0-25); 23.8% labeled]
Computer simulations to determine significance.
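One way such a simulation can be set up is a Monte Carlo test: place the same number of points at random, measure the fraction that fall inside Alu annotations, and ask how often the random fraction reaches the observed one. The sketch below is a toy version under that assumption; the function names, the uniform placement model, and the toy coordinates are all hypothetical and do not reproduce the analysis behind the slide.

```python
import random

def fraction_in_intervals(positions, intervals):
    """Fraction of positions inside any half-open [start, end) interval."""
    hits = sum(any(s <= p < e for s, e in intervals) for p in positions)
    return hits / len(positions)

def simulate_enrichment(observed, intervals, genome_length, n_points,
                        n_replicates=1000, seed=0):
    """Build a null distribution by placing n_points uniformly at random.

    Returns the null fractions and a one-sided empirical p-value for
    seeing a fraction at least as large as `observed`.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_replicates):
        positions = [rng.randrange(genome_length) for _ in range(n_points)]
        null.append(fraction_in_intervals(positions, intervals))
    as_extreme = sum(f >= observed for f in null)
    # Add-one correction so the p-value is never exactly zero.
    p_value = (as_extreme + 1) / (n_replicates + 1)
    return null, p_value

# Toy data: Alu-like intervals cover 10% of a 10 kb sequence.
toy_alus = [(i * 1000, i * 1000 + 100) for i in range(10)]
null, p_value = simulate_enrichment(0.238, toy_alus, 10_000, n_points=200)
print(min(null), max(null), p_value)
```

A real analysis would sample from matched control sequence rather than uniformly, since Alu density varies across the genome.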
![Page 40: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/40.jpg)
Subfamily Enrichment
[Figure: number of elements (0-100,000) for AluY, AluS and AluJ against time (0-80 mya), with phylogeny: human, chimp, gorilla, orangutan, Old World, New World, prosimian, mammal]
[Table: enrichment by Alu subfamily; ≥90%: 1.8, 1.9, 1.1; ≥95%: 2.2, 1.8, 1.1]
![Page 41: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/41.jpg)
Whole Genome Duplication
![Page 42: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/42.jpg)
Whole Genome Duplication Yeast
Kellis and Lander (Nature 428:617-24 2004)
![Page 43: Repetitive and Duplicitous Structure of Genomes](https://reader036.vdocuments.us/reader036/viewer/2022070502/56814b59550346895db85189/html5/thumbnails/43.jpg)
Explore Resources
REMAINDER OF CLASS
Exercises for analysis of repetitive elements and segmental duplications