V4 Sequencing Reagent Experience
DESCRIPTION
Slide deck from Josh's 2014 presentation at the Illumina user group meeting in RTP. The slides describe our experience with V3 and V4 chemistries on a very large cohort of exome-sequenced samples.
TRANSCRIPT
![Page 1: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/1.jpg)
V4 Sequencing Reagent Experience
Joshua Bridgers
Duke University
Center for Human Genome Variation
![Page 2: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/2.jpg)
V3 vs V4 Chemistry
• V3
– 100bp x 100bp
– 12 day run time
– Requires loading of paired-end reagents
– ~300gb per flowcell
• V4
– Requires HiSeq 2500 or newer 2000
– 125bp x 125bp
– 6 day run time*
– Paired-end reagents are loaded at the start of the run
– ~600gb per flowcell
![Page 3: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/3.jpg)
Throughput
V4 HiSeq 2000/2500 (2400gb / 12 day) ≈ V3 HiSeq 2000 (2400gb / 12 day)
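The 2400gb figure can be checked with a little arithmetic. The sketch below uses the per-flowcell yields and run times from the V3 vs V4 slide and assumes two flowcells per instrument (the HiSeq 2000/2500 are dual-flowcell machines). The function and variable names are illustrative only and do not come from the presentation.

```python
def gb_per_day(gb_per_flowcell, run_days, flowcells_per_instrument=2):
    """Gigabases produced per instrument per day of run time."""
    return gb_per_flowcell * flowcells_per_instrument / run_days


# Yields and run times as quoted on the V3 vs V4 chemistry slide.
v3 = gb_per_day(300, 12)  # V3: ~300gb per flowcell, 12 day run
v4 = gb_per_day(600, 6)   # V4: ~600gb per flowcell, 6 day run

print(f"V3: {v3:.0f} gb/day per instrument")
print(f"V4: {v4:.0f} gb/day per instrument")
print(f"V4 output over 12 days: {v4 * 12:.0f} gb")
print(f"V3 instruments needed to match one V4 instrument: {v4 / v3:.0f}")
```

Under these assumptions one V4 instrument produces 2400gb in 12 days, the same output as four V3 instruments over the same period.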
![Page 4: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/4.jpg)
2nd Generation Sequencing Advances
• V3 System Chemistry
– 300GB per flowcell
– 12 days to data
– Genome: $4700, Exome: $790
• V4 System Chemistry
– 600GB per flowcell
– 6 days to data
– Genome: $3000, Exome: $640
• X System Chemistry
– 1GB per patterned flowcell
– 3 days to data
– Genome: $1500, Exome: $500
![Page 5: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/5.jpg)
Data Quality – Percent Q30 V3
[Figure: percent Q30 (y-axis, 0.65 to 1) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V3 %q30 R1, V3 %q30 R2]
![Page 6: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/6.jpg)
Data Quality – Percent Q30 V4
[Figure: percent Q30 (y-axis, 0.65 to 1) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V4 %q30 R1, V4 %q30 R2]
![Page 7: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/7.jpg)
Data Quality – Percent Q30
[Figure: percent Q30 (y-axis, 0.65 to 1) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V3 %q30 R1, V4 %q30 R1, V3 %q30 R2, V4 %q30 R2]
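For readers unfamiliar with the metric: a Phred quality score Q corresponds to a base-call error probability of 10^(-Q/10), so Q30 means a 1 in 1000 chance that the base is wrong. The following is a minimal sketch of how a percent Q30 value could be computed from a FASTQ quality string, assuming the Phred+33 encoding. The function names and the example string are made up for illustration and are not how the values on these slides were produced.

```python
def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def fraction_q30(quality_string, offset=33, threshold=30):
    """Fraction of bases whose Phred score is at least `threshold`."""
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(q >= threshold for q in scores) / len(scores)


# Toy quality string (Phred+33): I=40, ?=30, 5=20, +=10, #=2
example = "IIII?5+#"
print(fraction_q30(example))      # 5 of the 8 bases are Q30 or better
print(error_probability(30))
```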
![Page 8: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/8.jpg)
Data Quality – Average Quality Score V3
[Figure: average quality score (y-axis, 29 to 37) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V3 Avg. Qscore R1, V3 Avg. Qscore R2]
![Page 9: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/9.jpg)
Data Quality – Average Quality Score V4
[Figure: average quality score (y-axis, 29 to 37) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V4 Avg. Qscore R1, V4 Avg. Qscore R2]
![Page 10: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/10.jpg)
Data Quality – Average Quality Score
[Figure: average quality score (y-axis, 29 to 37) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V3 Avg. Qscore R1, V3 Avg. Qscore R2, V4 Avg. Qscore R1, V4 Avg. Qscore R2]
![Page 11: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/11.jpg)
Data Volume and Processing
• Run folders
– .bcl files are now compressed
– V3 Run Folders: ~350GB/flowcell
– V4 Run Folders: ~500GB/flowcell
• Fastq generation cluster usage per flowcell
– V3: 121.5 minutes, 283gb max memory used
– V4: 184.9 minutes, 673gb max memory used
![Page 12: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/12.jpg)
Bioinformatics Pipeline
• Lane-level Alignment
• Indel Re-Alignment
• Base Quality Recalibration
• Merging & Sorting Alignments
• PCR Duplicate Removal
BWA - http://bio-bwa.sourceforge.net/
![Page 13: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/13.jpg)
Bioinformatics Pipeline
• Lane-level Alignment
• Indel Re-Alignment
• Base Quality Recalibration
• Merging & Sorting Alignments
• PCR Duplicate Removal
SAMtools - http://samtools.sourceforge.net/
![Page 14: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/14.jpg)
Bioinformatics Pipeline
• Lane-level Alignment
• Indel Re-Alignment
• Base Quality Recalibration
• Merging & Sorting Alignments
• PCR Duplicate Removal
Picard MarkDuplicates - http://picard.sourceforge.net/
![Page 15: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/15.jpg)
Bioinformatics Pipeline
• Lane-level Alignment
• Indel Re-Alignment
• Base Quality Recalibration
• Merging & Sorting Alignments
• PCR Duplicate Removal
GATK - http://www.broadinstitute.org/gatk/
![Page 16: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/16.jpg)
Bioinformatics Pipeline
• Core-released Reads
• Alignment
• Indel Re-Alignment
• Base Quality Recalibration
• Sorting/Merging Alignments
• PCR Duplicate Removal
• Analysis-Ready Read Alignments
• Genotyping & Preliminary QC
– GATK Unified Genotyper
– GATK VQSR
– Coverage Depth
– Ti/Tv Ratio
– dbSNP Overlap
– Duplicate Read Pct.
– Aligned Read Pct.
– Gender Check
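Among the QC metrics on this slide, the Ti/Tv ratio is simple to illustrate. Transitions are purine-to-purine or pyrimidine-to-pyrimidine substitutions (A<->G, C<->T); every other single-base substitution is a transversion. The sketch below counts both over a list of (ref, alt) pairs. It is an illustration of the definition only, with made-up input, and is not the pipeline's own implementation.

```python
# The four transition substitutions; all other SNVs are transversions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for an iterable of (ref, alt) single-base pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    # With no transversions the ratio is undefined; report infinity.
    return ti / tv if tv else float("inf")


# Toy input: three transitions and one transversion.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(calls))
```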
![Page 17: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/17.jpg)
Test Sample Description
• Sequenced one trio on V3 and V4 Illumina chemistry
• 400bp size-selected exome capture
– V3 sequenced samples have higher overall coverage
![Page 18: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/18.jpg)
Overall Metrics
• Percent of bases covered at 5x is similar despite the coverage difference
• SNV hom/het ratio changed
• Indel hom/het ratio changed
• dbSNP overlap and Ti/Tv are similar
![Page 19: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/19.jpg)
Variant Call Overlap

| Sample | V3 only | V4 only | Shared |
| --- | --- | --- | --- |
| Sample 1 | 36.5k | 13.6k | 114.8k |
| Sample 2 | 34.5k | 13.9k | 118.1k |
| Sample 3 | 34.5k | 13.2k | 114.8k |

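An overlap like the one on this slide can be computed with plain set operations once each call is reduced to a comparable key. The sketch below assumes a variant is identified by (chromosome, position, ref, alt); that key, the function name and the toy calls are assumptions for illustration and not necessarily how the comparison in the presentation was done.

```python
def overlap_counts(v3_calls, v4_calls):
    """Count variants unique to each chemistry and shared by both."""
    v3, v4 = set(v3_calls), set(v4_calls)
    return {
        "v3_only": len(v3 - v4),
        "v4_only": len(v4 - v3),
        "shared": len(v3 & v4),
    }


# Toy calls keyed as (chrom, pos, ref, alt); not real data.
v3_calls = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "GA")]
v4_calls = [("1", 100, "A", "G"), ("2", 50, "G", "GA"), ("3", 75, "T", "C")]

print(overlap_counts(v3_calls, v4_calls))
```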
![Page 20: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/20.jpg)
Variant Call Overlap (Pass/Intermediate,Both 10x Covered)

| Sample | V3 only | V4 only | Shared |
| --- | --- | --- | --- |
| Sample 1 | 8.2k | 7.7k | 104.1k |
| Sample 2 | 8.6k | 7.5k | 107.3k |
| Sample 3 | 7.4k | 7.7k | 103.2k |

![Page 21: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/21.jpg)
Variant Call Overlap (High Confidence SNV)

| Sample | V3 only | V4 only | Shared |
| --- | --- | --- | --- |
| Sample 1 | 141 | 113 | 22553 |
| Sample 2 | 118 | 90 | 22689 |
| Sample 3 | 109 | 83 | 22257 |

![Page 22: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/22.jpg)
Variant Call Overlap (High Confidence SNV)

| Sample | V3 only | V4 only | Shared |
| --- | --- | --- | --- |
| Sample 1 | 0.62% | 0.50% | 98.9% |
| Sample 2 | 0.52% | 0.39% | 99.1% |
| Sample 3 | 0.49% | 0.37% | 99.1% |

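The percentages on this slide follow directly from the counts on the previous one. Reading the Sample 1 diagram there as 141 V3-only, 113 V4-only and 22553 shared high confidence SNVs, a short sketch reproduces the Sample 1 values. The function name is illustrative.

```python
def overlap_percentages(v3_only, v4_only, shared):
    """Percent of all distinct calls that are V3-only, V4-only and shared."""
    total = v3_only + v4_only + shared
    return tuple(round(100 * n / total, 2) for n in (v3_only, v4_only, shared))


# Sample 1, high confidence SNVs.
sample1 = overlap_percentages(141, 113, 22553)
print(sample1)  # approximately 0.62, 0.50 and 98.89 percent
```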
![Page 23: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/23.jpg)
Homopolymer Runs
[Figure: homopolymer runs; panels: V3, V4]
![Page 24: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/24.jpg)
Variant Call Overlap (Low Complexity Regions)

| Sample | V3 only | V4 only | Shared |
| --- | --- | --- | --- |
| Sample 1 | 42 | 24 | 22404 |

![Page 25: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/25.jpg)
CCDS Coverage
• Analyzed 72 Caucasian unaffected adults for % coverage across a modified CCDS release 14
• Same cohort
• 34 V3 samples
• 38 V4 samples
• Gender unbiased
• All unaffected parents
• Overall coverage between 80-90x
![Page 26: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/26.jpg)
CCDS Coverage

| | V3 Average | V4 Average |
| --- | --- | --- |
| 3x Coverage | 97.61% | 98.46% |
| 10x Coverage | 95.92% | 96.71% |
| 20x Coverage | 92.93% | 92.83% |

• Overall greater coverage with V4 at 3x and 10x
• Similar coverage at 20x
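A "percent covered at Nx" value is the share of target bases with read depth of at least N. The sketch below computes it for the three thresholds used here from a list of per-base depths. The function name and the depth values are invented for illustration; the presentation does not state which tool produced its figures.

```python
def percent_covered(depths, thresholds=(3, 10, 20)):
    """Percent of positions with depth >= each threshold."""
    if not depths:
        raise ValueError("no positions supplied")
    n = len(depths)
    return {t: 100 * sum(d >= t for d in depths) / n for t in thresholds}


# Toy per-base depths for ten target positions; not real data.
depths = [0, 2, 5, 12, 25, 40, 9, 18, 30, 3]
print(percent_covered(depths))
```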
![Page 27: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/27.jpg)
Extended Coverage
[Figure: extended coverage; panels: V3, V4]
![Page 28: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/28.jpg)
Conclusion
• Sequencing throughput increased ~4x; resource usage per unit of data, relative to V3:
– 71% temporary storage space usage
– 75% CPU hours for fastq conversion
– 120% maximum Vmem usage
• Higher average qscore at higher cluster densities
• Higher percent Q30 at higher cluster densities
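The three resource percentages appear to be V4 usage per gigabase expressed relative to V3. That reading is an assumption, but it is consistent with the numbers on the Data Volume and Processing slide, as the sketch below shows. The dictionary keys are illustrative names.

```python
# Per-flowcell figures from the V3 vs V4 and Data Volume and Processing slides.
v3 = {"yield_gb": 300, "run_folder_gb": 350, "fastq_minutes": 121.5, "max_mem_gb": 283}
v4 = {"yield_gb": 600, "run_folder_gb": 500, "fastq_minutes": 184.9, "max_mem_gb": 673}


def relative_per_gb(metric):
    """V4 usage per gigabase of yield, as a fraction of the V3 value."""
    return (v4[metric] / v4["yield_gb"]) / (v3[metric] / v3["yield_gb"])


for metric in ("run_folder_gb", "fastq_minutes", "max_mem_gb"):
    print(f"{metric}: {100 * relative_per_gb(metric):.0f}% of V3 per gigabase")
```

This gives roughly 71%, 76% and 119%, close to the 71%, 75% and 120% quoted above.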
![Page 29: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/29.jpg)
Conclusion
• High confidence variant calls largely unaffected
• Low complexity regions and indel calls can still be problematic
• Overall increased coverage of CCDS
![Page 30: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/30.jpg)
Questions?
![Page 31: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/31.jpg)
Acknowledgements
• CHGV
– Brian Krueger
– Slave Petrovski
– Linda Hong
– Erin Campbell
• Illumina
– Adam Jerald
– Kenny Patridge
![Page 32: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/32.jpg)
Kaizen
改善
kai (改): “Change”
zen (善): “Good”
Cheaper sequencing, extended coverage, lower IT overhead
![Page 33: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/33.jpg)
Data Quality – Percent Q30
V3
• Greater degradation in quality as cycles increase
• Looser distribution
V4
• Small drop in %q30 as cycles increase
• Tighter distribution
![Page 34: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/34.jpg)
Data Quality – Cluster Passed Filter
[Figure: percent of clusters passing filter (y-axis, 65 to 100) versus cluster density in K/mm2 (x-axis, 600 to 1300); series: V3 Cluster PF, V4 Cluster PF]
![Page 35: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/35.jpg)
New SNVHomo and IndelHomo
10x coverage

| Sample | #SNV Hom | #SNV Het | SNV Homo Ratio | #Indel Hom | #Indel Het | Indel Homo Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| SQC0243F77 | 12012 | 118884 | 0.101039669 | 2102 | 18349 | 0.114556652 |
| SQC0243F77_V4TEST | 10181 | 101963 | 0.099849946 | 1833 | 14415 | 0.127159209 |
| SQC0243F77 shared | 9637 | 92697 | 0.103962372 | 1283 | 11216 | 0.114390157 |
| SQC0243F77 missing | 404 | 3345 | 0.12077728 | 404 | 3345 | 0.12077728 |
| SQC0243F77_V4TEST missing | 1036 | 11552 | 0.08968144 | 446 | 3617 | 0.123306608 |

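The ratio columns in this table are the homozygous count divided by the heterozygous count. A short check against the first two rows, using the SNV counts as given (the function name is illustrative):

```python
def hom_het_ratio(n_hom, n_het):
    """Homozygous-to-heterozygous call ratio."""
    return n_hom / n_het


# (#SNV Hom, #SNV Het) for the V3 and V4 runs of the same sample.
snv_counts = {
    "SQC0243F77": (12012, 118884),
    "SQC0243F77_V4TEST": (10181, 101963),
}

for sample, (hom, het) in snv_counts.items():
    print(f"{sample}: {hom_het_ratio(hom, het):.9f}")
```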
![Page 36: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/36.jpg)
Additional Filters (SNV)
• Percent Alt Read 0.3 – 1
• GQ > 50
• SB < 60
• HaplotypeScore < 13
• MQ > 40
• QD > 2
• QUAL > 50
• RPRS > -6
• MQRS > -6
• NON_SYNONYMOUS_CODING | SYNONYMOUS_CODING | START_GAINED | START_LOST | STOP_LOST | STOP_GAINED | SPLICE_SITE_ACCEPTOR | SPLICE_SITE_DONOR | EXON
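The filters above can be expressed as a single predicate over a variant record. The sketch below is one hedged way to write it: the thresholds and effect names are taken from this slide, while the dictionary layout, the `pct_alt_read` and `effect` key names, and the inclusive bounds on the alt-read range are assumptions made for illustration.

```python
# Functional annotations retained by the last filter on the slide.
ALLOWED_EFFECTS = {
    "NON_SYNONYMOUS_CODING", "SYNONYMOUS_CODING", "START_GAINED",
    "START_LOST", "STOP_LOST", "STOP_GAINED", "SPLICE_SITE_ACCEPTOR",
    "SPLICE_SITE_DONOR", "EXON",
}


def passes_snv_filters(v):
    """True if a variant record (a dict, hypothetical layout) passes every filter."""
    return (
        0.3 <= v["pct_alt_read"] <= 1  # inclusive bounds are an assumption
        and v["GQ"] > 50
        and v["SB"] < 60
        and v["HaplotypeScore"] < 13
        and v["MQ"] > 40
        and v["QD"] > 2
        and v["QUAL"] > 50
        and v["RPRS"] > -6
        and v["MQRS"] > -6
        and v["effect"] in ALLOWED_EFFECTS
    )


# Toy record that satisfies every threshold; values are invented.
good = {
    "pct_alt_read": 0.48, "GQ": 99, "SB": -100.0, "HaplotypeScore": 1.2,
    "MQ": 59.8, "QD": 14.3, "QUAL": 812.0, "RPRS": 0.4, "MQRS": -0.1,
    "effect": "NON_SYNONYMOUS_CODING",
}
low_gq = dict(good, GQ=30)  # same record but failing the GQ filter

print(passes_snv_filters(good), passes_snv_filters(low_gq))
```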
![Page 37: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/37.jpg)
Discarded Variants
• Sample 1
– 6.9k NON_SYNONYMOUS_CODING | SYNONYMOUS_CODING | START_GAINED | START_LOST | STOP_LOST | STOP_GAINED | SPLICE_SITE_ACCEPTOR | SPLICE_SITE_DONOR | EXON
– 25.6k 10X coverage for both samples
– 32.1k 10X coverage for one sample
– 43.0k Percent Alt Read 0.3 – 1
– 43.9k QUAL > 50
– 46.3k GQ > 50
– 52.5k HaplotypeScore < 13
– 56.8k Passed/Intermediate
– 59.1k MQ > 40
– 59.6k QD > 2
– 67.6k SB < 60
– 70.9k MQRS > -6
– 71.3k RPRS > -6
![Page 38: V4 Sequencing Reagent Experience](https://reader035.vdocuments.us/reader035/viewer/2022081504/5591ae391a28ab9e348b4643/html5/thumbnails/38.jpg)
Homopolymer Runs
[Figure: homopolymer runs; panels: V3, V4]