karen miga uc santa cruz - amazon s3...primase, dna, polypeptide 2 (prim2), mrna chr3/chr6...
![Page 1](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/1.jpg)
Karen Miga
UC Santa Cruz
![Page 2](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/2.jpg)
Assessing variation in the human genome
enables discovery research
“Much of the missing heritability (the ‘dark matter’ of the genome) will probably turn up as the technology advances.”
- Francis Collins
Nature 464, 674-675 (2010)
![Page 3](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/3.jpg)
CEN
CENTROMERIC REGIONS
Millions of bases of repetitive DNA
The promise of long-read sequences to improve sequence variant discovery
![Page 4](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/4.jpg)
NO LONGER CONSIDERED “JUNK DNA”
Function of Centromeric and Heterochromatic DNA
Centromere function
![Page 5](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/5.jpg)
Cancer
![Page 6](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/6.jpg)
Aging
![Page 7](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/7.jpg)
PacBio Long-Read Sequences to Predict Satellite Sequence Variants (satVARs)
Quality-corrected reads → automated sequence characterization → genome satVAR discovery (CEN)
Generate a profile of satellite variants for a given individual genome
![Page 8](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/8.jpg)
ALPHA SATELLITE
~171 bp tandem repeat
Wide range of percent identity between monomers: ~60-100%
[Figure: alpha satellite monomers 1 2 3 4 at the centromere (CEN)]
Alpha satellite defines all normal human centromeres
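Percent identity between two monomers can be approximated from an edit distance. The sketch below is a minimal illustration, assuming identity = 1 − (edit distance / length of the longer sequence); that definition and the toy 171 bp sequence are assumptions for illustration, not the alignment scoring used in the talk.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Assumed definition: 100 * (1 - edit distance / length of the longer sequence)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


# Toy 171 bp "monomer" (not a real alpha satellite sequence) and a copy with one substitution.
monomer = "ACGT" * 42 + "ACG"
variant = monomer[:100] + "T" + monomer[101:]
print(len(monomer), round(percent_identity(monomer, variant), 2))  # 171 99.42
```

A single substitution in a 171 bp monomer already drops identity to about 99.4%, which gives a sense of scale for the ~60-100% range between monomers.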
![Page 9](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/9.jpg)
1 2 3 4 1 2 3 4 1 2 3 4
Alpha satellite repeats (or monomers) are commonly found in long arrays of near-identical higher-order repeats (HORs)
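If each monomer in a read has already been assigned a type label (the 1 2 3 4 numbering above), the HOR size is the smallest period of that label sequence. A minimal sketch, assuming the labels are given and the array is perfectly regular; `smallest_period` is a hypothetical helper, not part of any tool named in the talk.

```python
def smallest_period(labels: list[str]) -> int:
    """Smallest p such that labels[i] == labels[i - p] for every i >= p.

    For a perfectly regular HOR array this is the number of monomers per HOR.
    Returns len(labels) when the sequence has no shorter repeating unit.
    """
    n = len(labels)
    for p in range(1, n + 1):
        if all(labels[i] == labels[i - p] for i in range(p, n)):
            return p
    return n  # only reached for an empty list


labels = "1 2 3 4 1 2 3 4 1 2 3 4".split()
print(smallest_period(labels))  # 4
```

A trailing partial copy (for example `1 2 3 4 5 1 2 3 4 5 1`) still yields the full HOR size, since only positions after the first period are checked.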
![Page 10](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/10.jpg)
“Higher-order repeat”: a multi-monomeric repeat unit
![Page 11](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/11.jpg)
Satellite DNA is the primary sequence in each gap
1 2 3 4 1 2 3 4
Narrow range of percent identity between HOR copies: 94-100%
![Page 12](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/12.jpg)
[Figure: centromere (CEN) with Array 1, Array 2 and Array 3]
Each chromosome has distinct centromeric sequences
![Page 13](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/13.jpg)
[Figure: Array 1 at the centromere (CEN) in individual A and individual B]
Array 1 size in individuals A and B: ~0.5 Mb vs. ~2.0 Mb
Higher-order arrays vary between individuals
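To put the array sizes in monomer terms: HOR copy number ≈ array length / (monomers per HOR × ~171 bp). The sketch assumes Array 1 is built from an 8-monomer HOR, borrowing the 8-mer from the organization model later in the talk; that pairing is an assumption for illustration only.

```python
MONOMER_BP = 171  # approximate alpha satellite monomer length (from the slides)


def hor_copies(array_bp: float, monomers_per_hor: int, monomer_bp: int = MONOMER_BP) -> int:
    """Approximate HOR copy number for an array of the given length."""
    return round(array_bp / (monomers_per_hor * monomer_bp))


# Hypothetical: Array 1 is an 8-mer HOR (8 x 171 bp = 1,368 bp per copy).
for name, array_bp in [("~0.5 Mb array", 0.5e6), ("~2.0 Mb array", 2.0e6)]:
    print(name, hor_copies(array_bp, 8))
# ~0.5 Mb array 365
# ~2.0 Mb array 1462
```

Under that assumption the two arrays differ by roughly 1,100 HOR copies, a difference that reads shorter than one HOR copy cannot count directly.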
![Page 14](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/14.jpg)
Higher-order arrays can vary between homologous chromosomes in the same individual
[Figure: Array 1 at the centromere (CEN) on the maternally (mat) and paternally (pat) inherited chromosomes]
Array 1, maternal vs. paternal copy: ~0.5 Mb vs. ~2.0 Mb
![Page 15](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/15.jpg)
Model of Centromere Sequence Organization
Array 1: [8-mer]n (n tandem copies of an 8-monomer HOR)
Array 2: [4-mer]n (n tandem copies of a 4-monomer HOR)
![Page 16](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/16.jpg)
Rearrangements in repeat structure
DELETION (6-mer)
INSERTION (12-mer)
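Given HOR unit sizes (in monomers) read off a long read, the deletion and insertion calls above amount to comparing each unit with the array's canonical size. A minimal sketch, assuming unit boundaries have already been found; `classify_units` is a hypothetical name, not taken from the talk.

```python
def classify_units(unit_sizes: list[int], canonical: int) -> list[str]:
    """Label each HOR unit (size in monomers) relative to the canonical HOR size."""
    labels = []
    for size in unit_sizes:
        if size == canonical:
            labels.append("canonical")
        elif size < canonical:
            labels.append(f"deletion ({size}-mer)")   # fewer monomers than expected
        else:
            labels.append(f"insertion ({size}-mer)")  # more monomers than expected
    return labels


# Toy read from an 8-mer array carrying a 6-mer and a 12-mer variant unit.
print(classify_units([8, 8, 6, 8, 12], canonical=8))
# ['canonical', 'canonical', 'deletion (6-mer)', 'canonical', 'insertion (12-mer)']
```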
![Page 17](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/17.jpg)
Shifts in repeat orientation?
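Repeat orientation can be checked per monomer by comparing it, and its reverse complement, against a consensus. A sketch under stated assumptions: a consensus monomer is available, and Python's `difflib.SequenceMatcher.ratio()` stands in for a real alignment identity; neither choice comes from the talk.

```python
from difflib import SequenceMatcher

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based percent identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def call_strand(monomer: str, consensus: str) -> str:
    """'+' if the monomer matches the consensus better as-is, '-' if better reverse-complemented."""
    forward = similarity(monomer, consensus)
    reverse = similarity(reverse_complement(monomer), consensus)
    return "+" if forward >= reverse else "-"


consensus = "AAACCCGGGTTTACGATCGA"  # toy sequence, not a real alpha satellite consensus
print(call_strand(consensus, consensus))                      # +
print(call_strand(reverse_complement(consensus), consensus))  # -
```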
![Page 18](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/18.jpg)
Sites of Interspersed Repeats (LINE)
Junction with seemingly unique DNA
Transcribed Genes
![Page 19](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/19.jpg)
Implement a strategy to characterize satellite sequence variants with
long-read sequences
![Page 20](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/20.jpg)
github.com/volkansevim/alpha-CENTAURI
alpha-CENTAURI: ALPHA satellite CENTromeric AUtomated Repeat Identification
• Reads shorter than the underlying repeat structure require indirect inference methods; long reads allow direct inference of satellite higher-order repeat structure.
Human centromeric DNA variants: alpha satellite
CHM1 genome: 87,606 error-corrected alpha satellite preads
![Page 21](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/21.jpg)
[Figure: 2.5 kb quality-corrected PacBio read, ends labeled 3′ and 5′]
![Page 22](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/22.jpg)
1. Identifies clusters of monomers with high sequence similarity (FALCON error correction module)
[Figure: read with ends labeled 3′ and 5′; monomers 98% identical; both axes: # bases]
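Step 1 can be pictured as grouping a read's monomers by pairwise identity. The sketch below assumes the read has already been split into monomers and uses single-linkage clustering on an edit-distance identity; the actual step uses the FALCON error-correction module, whose interface is not reproduced here, and `cluster_monomers` is a hypothetical helper.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity: 1 - edit distance / longer length."""
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_monomers(monomers: list[str], min_identity: float) -> list[int]:
    """Single-linkage clusters: monomers i and j are linked if identity >= min_identity.

    Returns one cluster id per monomer, numbered by first appearance in the read.
    """
    parent = list(range(len(monomers)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(monomers)), 2):
        if identity(monomers[i], monomers[j]) >= min_identity:
            parent[find(i)] = find(j)

    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(monomers))]


# Toy 10 bp "monomers": two families, each with one near-identical copy.
read_monomers = ["AAAAAAAAAA", "CCCCCCCCCC", "AAAAAAAAAT", "CCCCCCCCCA"]
print(cluster_monomers(read_monomers, min_identity=0.85))  # [0, 1, 0, 1]
```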
![Page 23](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/23.jpg)
2. Selects the cluster similarity threshold for each read by evaluating a range of identity values (98% to 88%, in 1% decrements)
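Step 2 can be sketched as a sweep from 98% down to 88% in 1% steps over a read's pairwise identity matrix. The stopping rule used here (take the highest threshold at which no monomer is left as a singleton) is a hypothetical criterion chosen for illustration; the talk does not specify how the per-read threshold is picked.

```python
from collections import Counter
from typing import Optional


def clusters_at(identity_pct: list[list[float]], threshold: int) -> list[int]:
    """Single-linkage cluster ids for monomers whose pairwise identity (%) >= threshold."""
    n = len(identity_pct)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if identity_pct[i][j] >= threshold:
                parent[find(i)] = find(j)
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def choose_threshold(identity_pct: list[list[float]]) -> tuple[Optional[int], Optional[list[int]]]:
    """Try 98, 97, ..., 88 (%) and return the first threshold with no singleton clusters."""
    for threshold in range(98, 87, -1):
        labels = clusters_at(identity_pct, threshold)
        sizes = Counter(labels)
        if labels and all(sizes[label] >= 2 for label in labels):
            return threshold, labels
    return None, None


# Toy 4-monomer read: monomers 0/2 and 1/3 are the closest pairs.
matrix = [
    [100, 91, 95, 60],
    [91, 100, 62, 93],
    [95, 62, 100, 61],
    [60, 93, 61, 100],
]
print(choose_threshold(matrix))  # (93, [0, 1, 0, 1])
```

Integer percent thresholds avoid floating-point surprises when stepping down by exactly 1%.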
![Page 24](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/24.jpg)
3. Evaluates the spacing between monomers involved in each
cluster group
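Step 3 can be sketched as measuring, for each cluster, the gap (in monomers) between consecutive members; in a regular array every gap equals the HOR size. This assumes cluster ids are listed in read order; `spacing_counts`, `hor_size` and `is_regular` are hypothetical helpers, and taking the most common gap as the HOR size is an assumption.

```python
from collections import Counter
from typing import Optional


def spacing_counts(cluster_ids: list[int]) -> Counter:
    """Count gaps (in monomers) between consecutive occurrences of each cluster id."""
    last_seen: dict[int, int] = {}
    gaps: Counter = Counter()
    for index, cid in enumerate(cluster_ids):
        if cid in last_seen:
            gaps[index - last_seen[cid]] += 1
        last_seen[cid] = index
    return gaps


def hor_size(cluster_ids: list[int]) -> Optional[int]:
    """Most common gap, taken here as the HOR size in monomers (None if no cluster repeats)."""
    gaps = spacing_counts(cluster_ids)
    return gaps.most_common(1)[0][0] if gaps else None


def is_regular(cluster_ids: list[int]) -> bool:
    """A read is 'regular' here if all observed gaps are identical."""
    return len(spacing_counts(cluster_ids)) == 1


regular = [0, 1, 2, 3, 4] * 3  # three copies of a 5-mer
irregular = [0, 1, 2, 3, 4] * 2 + [0, 1, 2, 3, 4, 5] + [0, 1, 2, 3, 4] * 2
print(hor_size(regular), is_regular(regular))      # 5 True
print(hor_size(irregular), is_regular(irregular))  # 5 False
```

In the irregular toy read, the one 6-monomer unit shows up as a minority gap of 6 alongside the dominant gap of 5.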
![Page 25](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/25.jpg)
“Regular” Repeat Structure
![Page 26](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/26.jpg)
![Page 27](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/27.jpg)
![Page 28](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/28.jpg)
![Page 29](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/29.jpg)
D11Z1 (chr11 CEN): 5-mer HOR
1,680 PacBio preads
REGULAR
![Page 30](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/30.jpg)
IRREGULAR (chr11 CEN)
INSERTION: 6-mer (89.9%, 391 preads)
4-mer (1.4%, 39 preads)
4-mer (1.6%, 43 preads)
4-mer (1.4%, 40 preads)
[Figure: example 15,559 bp pread with repeated 1 2 3 4 5 monomer units]
![Page 31](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/31.jpg)
INVERSION (chr11 CEN)
[Figure: pread with 6-mer and 4-mer HORs spanning an inversion]
Inversion: 1 event, junction: 236 bp
In total, ~5% (4,493/87,606) of all alpha satellite reads provide evidence for an inversion
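Once each monomer in a read has a strand call, an inversion shows up as a switch between '+' and '-' along the read. A minimal sketch, assuming per-monomer strand strings are already available; the helper names are hypothetical. The last line reproduces the ~5% figure from the counts on the slide.

```python
def strand_switches(strands: str) -> list[int]:
    """Monomer indices where the orientation differs from the previous monomer."""
    return [i for i in range(1, len(strands)) if strands[i] != strands[i - 1]]


def fraction_with_inversion(reads: list[str]) -> float:
    """Fraction of reads with at least one orientation switch.

    A read that is entirely '-' was simply sequenced from the other strand, so it
    is not counted; only a change of orientation within a read counts.
    """
    if not reads:
        return 0.0
    flagged = sum(1 for strands in reads if strand_switches(strands))
    return flagged / len(reads)


toy_reads = ["++++++", "+++---", "------", "++-+++"]  # toy strand calls per read
print(strand_switches("+++---"))           # [3]
print(fraction_with_inversion(toy_reads))  # 0.5
print(f"{4493 / 87606:.1%}")               # 5.1%
```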
![Page 32](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/32.jpg)
TRACKING SITES OF INTERSPERSED REPEATS
LINE/L1 L1Hs (2,384 bp); LINE/L1 L1P3 (1,358 bp)
96% recent LINEs: L1Hs, L1P1, L1PA2-4
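Interspersed repeats in sequencing reads are commonly annotated with RepeatMasker, whose `.out` table gives the repeat name in column 10 and its class/family (e.g. `LINE/L1`) in column 11. The talk does not say which annotation tool was used; this sketch assumes RepeatMasker-style output, uses invented toy rows, and takes the set of "recent" L1 subfamilies straight from the slide.

```python
# Subfamilies called "recent" on the slide (L1Hs, L1P1, L1PA2-4). Names are
# compared upper-cased because RepeatMasker writes L1Hs as "L1HS".
RECENT_L1 = {"L1HS", "L1P1", "L1PA2", "L1PA3", "L1PA4"}


def parse_rmsk_out(lines):
    """Yield (query, begin, end, repeat_name, class_family) for each RepeatMasker .out hit."""
    for line in lines:
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue  # column headers or blank lines
        yield fields[4], int(fields[5]), int(fields[6]), fields[9], fields[10]


def recent_l1_fraction(hits) -> float:
    """Fraction of LINE/L1 hits whose subfamily is in RECENT_L1 (0.0 if there are none)."""
    names = [name.upper() for _, _, _, name, class_family in hits if class_family == "LINE/L1"]
    if not names:
        return 0.0
    return sum(name in RECENT_L1 for name in names) / len(names)


# Two toy hits laid out like RepeatMasker .out rows (values invented for illustration).
example = """\
   SW   perc perc perc  query   position in query     matching repeat
score   div. del. ins.  sequence  begin  end  (left)  repeat   class/family  begin  end (left)  ID

20150    1.2  0.1  0.0  read_1   1001  3384   (500) + L1HS     LINE/L1        3700  6083    (0)  1
 8210   10.5  2.0  1.1  read_2    200  1557  (9000) C L1P3     LINE/L1        (10)  6000   4650  2
""".splitlines()

hits = list(parse_rmsk_out(example))
print(len(hits), recent_l1_fraction(hits))  # 2 0.5
```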
![Page 33](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/33.jpg)
Identify Junctions with seemingly unique DNA
3,918 preads contain both alpha satellite and at least 10 kb of non-alpha satellite sequence
Example, chr3 CEN: chr3/chr6 paralogous (non-sat) region, ~300 kb; Primase, DNA, polypeptide 2 (prim2), mRNA; CHM1: LJ1101000307.1
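The read filter above (alpha satellite plus at least 10 kb of other sequence) can be sketched from per-read alpha satellite intervals. Assumptions: intervals are half-open (start, end) coordinates on the read, and the 10 kb is counted as total non-alpha bases rather than one contiguous block; the talk does not say which. Both function names are hypothetical.

```python
def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def is_junction_candidate(read_length: int, alpha_intervals: list[tuple[int, int]],
                          min_non_alpha: int = 10_000) -> bool:
    """True if the read has some alpha satellite and >= min_non_alpha other bases."""
    alpha_bp = merged_length(alpha_intervals)
    return alpha_bp > 0 and read_length - alpha_bp >= min_non_alpha


print(merged_length([(0, 10), (5, 20), (30, 40)]))   # 30
print(is_junction_candidate(25_000, [(0, 12_000)]))  # True
print(is_junction_candidate(15_000, [(0, 12_000)]))  # False
```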
![Page 34](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/34.jpg)
Full coverage (~60x) vs. low coverage (10x)
![Page 35](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/35.jpg)
![Page 36](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/36.jpg)
“Unmapped” Database
![Page 37](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/37.jpg)
PacBio Long-Read Sequences to Predict Satellite Sequence Variants (satVARs)
CHM1 genome: satVAR discovery (CEN)
Profile of satellite DNA variants
CHM1, CHM13
Trio data sets (CEPH and GIAB)
![Page 38](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/38.jpg)
Acknowledgements
Volkan Sevim
Jason Chin
Ali Bashir
github.com/volkansevim/alpha-CENTAURI
![Page 39](https://reader035.vdocuments.us/reader035/viewer/2022081614/5fc945c10955b8159275fcf2/html5/thumbnails/39.jpg)
1000 Genomes sequence data (400 male individuals)