Discovery of Structural Variation with Next-Generation Sequencing
Alexandre [email protected]
Gilles Fischer Team – Biology of Genomes
UMR7238 Laboratory of Computational and Quantitative Biology
Université Pierre et Marie-Curie, Paris
Outline
(i) Structural variations (SV)
(ii) SV detection technologies
(iii) Read pairs: 2 types of Illumina genomic DNA libraries
(iv) SV detection using read pairs
(v) Polymorphic SV
Structural Variations (SV)
Figure: SV types, each drawn as reference (ref) versus variant (SV)
Balanced SV: INVERSION (INV), RECIPROCAL TRANSLOCATION (RT)
Unbalanced SV (CNV): INSERTION (INS), DELETION (DEL), TANDEM DUPLICATION (DUP)
Intrachromosomal SV versus Interchromosomal SV
Balanced SV versus Unbalanced SV
Pictures adapted from Feuk et al., 2006, Nature Reviews
Calvin Blackman Bridges, Science
Why discover SV?
SV are involved in more than 30 diseases (psoriasis, Crohn's disease, ASD…)
Chromosomal instability is detected in the vast majority of cancers
SV are a powerful mechanism of adaptation and evolution
Timeline of technologies used to discover SV
SV, Structural Variations since 1936
Comparative cytogenetics
1936: Calvin Blackman Bridges, Science
1959: Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci
1986: Smith et al., Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet
Microarrays
2004: Iafrate, Detection of large-scale variation in the human genome, Nature
2004: Sebat, Large-scale copy number polymorphism in the human genome, Science
200 and 221 CNV
2006: Redon, Global variation in copy number in the human genome, Nature
360 Mb of CNVR (12% of the human genome)
NGS
2007: Korbel et al., Paired-end mapping reveals extensive structural variation in the human genome, Science (1,000 SV)
2010: 1000 Genomes Project, A map of human genome variation from population-scale sequencing, Nature (20,000 SV)
Criterion | Approach 1 | Approach 2 | Approach 3 | Approach 4
Breakpoint resolution | >100 bp | 1 bp | 1-10 bp | 1 bp
SV size range | > insert size | 1 bp–50 kbp | >10 bp | >1 bp
CNV | Yes | Yes | Yes | Yes
Balanced SV | Yes | Yes | No | Yes
FDR | Variable | >10% | High? | Low
Missing rate | Variable | >25% | High? | High?
Sources: Quinlan & Hall 2011, Trends in Genetics; Li 2011, Nature
How to detect SV with NGS data?
Illumina Paired-End (PE) vs Mate-Pair (MP)
MP allows a better genome assembly than PE
MP allows the detection of SV that involve repeated elements
Illumina Paired end vs Mate-Pair
Figure: insert-size distribution of 100,000 read pairs
Insert size (bp): 5,000 (or much less…)
SV detection with read pairs
1) trim the data
2) align data to reference genome
3) remove PCR duplicates
4) SV calling
Trim the data
Second criterion: nucleotide quality
Trimming tools
Bcbio-nextgen, Btrim, CANGS, Chipster, Clean reads, ConDeTri, Ea-utils, Fastx, Flexbar, PRINSEQ, Reaper, SeqTrim, Skewer, SolexaQA, TagCleaner, Trimmomatic
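Trimming on nucleotide quality can be illustrated with a minimal sketch. It assumes Phred+33 encoded qualities and a simple rule (remove bases from the 3' end while their quality is below a threshold). The function name `trim_3prime` and the threshold of Q20 are hypothetical choices for illustration; the tools listed above each implement their own, often more elaborate, criteria.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred quality is below min_q.

    Assumption: qualities are ASCII-encoded with the given offset (Phred+33).
    """
    end = len(seq)
    # Walk back from the 3' end until a base of sufficient quality is found.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0 in Phred+33.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIIII#!"
print(trim_3prime(read_seq, read_qual))  # ('ACGTACGT', 'IIIIIIII')
```

A read whose bases are all below the threshold is trimmed to an empty string, so a real pipeline would also discard reads that become too short.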
Remove PCR duplicates
PCR duplicate annotation tools
samtools rmdup (only intra-molecular duplicates), MarkDuplicates.jar (Picard tools), FastUniq…
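The idea behind PCR duplicate annotation can be sketched as follows: read pairs whose two ends align to the same positions and strands are assumed to derive from the same original molecule, and only one of them is kept. This is a simplified illustration, not the algorithm of samtools, Picard or FastUniq; the `AlignedPair` record, its `mean_qual` field and the function `find_duplicates` are hypothetical names, and read names are assumed to be unique.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set


class AlignedPair(NamedTuple):
    name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    mean_qual: float


def find_duplicates(pairs: List[AlignedPair]) -> Set[str]:
    """Return the names of pairs flagged as duplicates.

    Pairs sharing the same coordinates and strands for both ends are grouped;
    the pair with the highest mean quality is kept, the others are flagged.
    """
    groups = defaultdict(list)
    for pair in pairs:
        key = pair[1:7]  # (chrom1, pos1, strand1, chrom2, pos2, strand2)
        groups[key].append(pair)

    duplicates: Set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda p: p.mean_qual)
        duplicates.update(p.name for p in group if p.name != best.name)
    return duplicates


pairs = [
    AlignedPair("a", "chrI", 100, "+", "chrI", 500, "-", 30.0),
    AlignedPair("b", "chrI", 100, "+", "chrI", 500, "-", 35.0),
    AlignedPair("c", "chrI", 100, "+", "chrII", 900, "-", 32.0),
]
print(sorted(find_duplicates(pairs)))  # ['a']
```

Because the key includes both chromosomes, pairs whose ends map to different chromosomes are handled in the same way as the others.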
Not-so-identical monozygotic twins
Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–771 (2008)
Individual (germ line): SV in 100% of cells of each individual
Tissue (somatic): SV in one tissue / in a few cells
Polymorphic SV Structural Variations (SV)
Figure: 80 successive bottlenecks (Bottleneck 1, Bottleneck 2 … Bottleneck 80), one every 30 generations (# generations: 0, 30, 60, 90, 120, 150 … 2,400); between two bottlenecks, # cells grows from 1 (1, 2, 4 …) to 10^9
Sequencing a single culture
Can we detect de novo SV occurring in a single cell culture by high-throughput sequencing?
DNA extraction, sequencing
(n=80) DNA extraction, sequencing
The physical coverage (theoretically) sets the detection threshold
Figure: S. cerevisiae culture, # generations from 0 to 30, # cells from 1 to 10^9 (intermediate cell counts shown: 10^3, 2.10^3, 8.10^3, 1.6.10^4); coverages shown: 700X and 6,000X
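Why the physical coverage sets the detection threshold can be sketched with a toy model. The assumptions are mine, not the slides': the culture starts from a single cell, cells double at each generation without selection, and an SV appearing in one cell at generation g is therefore carried by a fraction 1/2^g of the final population. The expected number of read pairs supporting the SV is then the physical coverage multiplied by that fraction. The function names and the `min_pairs` parameter are hypothetical.

```python
import math


def sv_frequency(generation: int) -> float:
    """Fraction of cells carrying an SV that arose in one cell at `generation`."""
    return 1.0 / (2 ** generation)


def expected_supporting_pairs(cov_phys: float, generation: int) -> float:
    """Expected number of read pairs spanning the SV at a given physical coverage."""
    return cov_phys * sv_frequency(generation)


def latest_detectable_generation(cov_phys: float, min_pairs: float = 1.0) -> int:
    """Latest generation at which a new SV is still expected to be covered by
    at least `min_pairs` read pairs."""
    return math.floor(math.log2(cov_phys / min_pairs))


# The two physical coverages shown on the slide.
for cov in (700, 6000):
    print(cov, latest_detectable_generation(cov))
# 700 9
# 6000 12
```

Under these assumptions, raising the physical coverage from 700X to 6,000X only pushes the threshold by about three generations, since each additional generation halves the SV frequency.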
Pair-End sequencing: insert size ~ 400 bp
Sequencing with high physical coverage
Figure: read pairs from Cell 1 to Cell 10 aligned to the Reference
Coverage (sequence): covseq = 0.5X
Coverage (physical): covphys = 0.85X
Coverage of the SV: covSV = 0
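The difference between sequence coverage and physical coverage can be made concrete with a short sketch. It uses the usual definitions: sequence coverage counts the bases actually read, physical coverage counts the bases spanned by each fragment. The genome size (about 12 Mb for S. cerevisiae), the read length and the number of pairs are assumptions chosen so that covseq = 0.5X; the resulting covphys values are illustrative and are not meant to reproduce the figures drawn on the slides.

```python
def sequence_coverage(n_pairs: int, read_len: int, genome_size: int) -> float:
    """Average number of times a base is read: two reads per pair."""
    return n_pairs * 2 * read_len / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int) -> float:
    """Average number of fragments spanning a position.

    Assumption: insert_size is the full fragment span, reads included.
    """
    return n_pairs * insert_size / genome_size


GENOME_SIZE = 12_000_000  # approximate S. cerevisiae genome size (assumption)
N_PAIRS = 30_000          # hypothetical number of read pairs
READ_LEN = 100            # hypothetical read length (bp)

print(sequence_coverage(N_PAIRS, READ_LEN, GENOME_SIZE))  # 0.5
print(physical_coverage(N_PAIRS, 400, GENOME_SIZE))       # 1.0  (paired-end, ~400 bp)
print(physical_coverage(N_PAIRS, 4_000, GENOME_SIZE))     # 10.0 (mate-pair, ~4 kb)
```

For the same number of reads, and hence the same sequence coverage, the physical coverage grows linearly with the insert size.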
Mate Pair sequencing: insert size ~ 1 to 20 kb
Sequencing with high physical coverage
Figure: read pairs from Cell 1 to Cell 10 aligned to the Reference, with a Discordant Paired Sequence highlighted
Coverage (sequence): covseq = 0.5X
Coverage (physical): covphys = 5X
Coverage of the SV: covSV = 1
Mate Pair sequencing increases the sensitivity of SV detection
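How a discordant read pair can be recognised is sketched below. This is a minimal illustration under stated assumptions, not the method of a specific SV caller: each pair is given with its leftmost end first, the expected orientation is passed as a parameter (it depends on the library type), and the concordant insert-size range is taken as the median plus or minus a number of median absolute deviations. The `Pair` record, the function names and the `n_mads` value are hypothetical.

```python
from statistics import median
from typing import List, NamedTuple, Tuple


class Pair(NamedTuple):
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def insert_size_bounds(sizes: List[int], n_mads: float = 5.0) -> Tuple[float, float]:
    """Concordant insert-size range: median +/- n_mads * MAD."""
    med = median(sizes)
    mad = median([abs(s - med) for s in sizes])
    return med - n_mads * mad, med + n_mads * mad


def classify(pair: Pair, bounds: Tuple[float, float],
             expected: Tuple[str, str] = ("+", "-")) -> str:
    """Label a pair as concordant or give the reason why it is discordant."""
    if pair.chrom1 != pair.chrom2:
        return "discordant: different chromosomes"
    if (pair.strand1, pair.strand2) != expected:
        return "discordant: orientation"
    low, high = bounds
    span = pair.pos2 - pair.pos1
    if span < low or span > high:
        return "discordant: insert size"
    return "concordant"


bounds = insert_size_bounds([380, 390, 400, 410, 420])  # (350, 450)
print(classify(Pair("chrI", 1000, "+", "chrI", 1400, "-"), bounds))
print(classify(Pair("chrI", 1000, "+", "chrI", 6000, "-"), bounds))
print(classify(Pair("chrI", 1000, "+", "chrII", 900, "-"), bounds))
print(classify(Pair("chrI", 1000, "+", "chrI", 1400, "+"), bounds))
```

Counting the discordant pairs that support the same rearrangement gives a value comparable to covSV on the slides; a larger insert size means more pairs span each breakpoint for the same sequence coverage.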