agbt 2016 workshop magrini
MGI Reference Genomes Workshop
Vince Magrini, February 10th 2016
Sequencing Plan
• PacBio Large Insert Library Construction
• Linked Reads with 10X Genomics
• Physical Map contiguity using BioNano Irys
Pacific Biosciences
The NA19240 Large Insert Library Experience
[Figure: ROI Length (bp) per SMRT Cell (cells 1 to 76) for libraries Lib2 to Lib8 and LibF; y-axis 10,000 to 15,000 bp]
Considerations for PacBio WGS
• High molecular weight genomic DNA
  • DNA must be of sufficient quality to allow for >30 kb shearing to produce PacBio Continuous Long Reads (CLR)
• Consistent shearing >30 kb
  • Shearing genomic DNA >30 kb is challenging and requires a consistent technology
  • Preferred method: Diagenode Megaruptor
  • Alternate method: Covaris g-Tube
• Sufficient DNA for PacBio sample prep
  • A single PacBio sample prep reaction requires 5 μg sheared DNA
  • One library is composed of 8-10 sample prep reactions
  • At least 2-4 libraries are required for 60x coverage
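The sample prep requirements above imply a total sheared-DNA budget per genome. Here is a minimal Python sketch of that arithmetic. It uses only the figures on the slide: 5 μg per reaction, 8-10 reactions per library, and 2-4 libraries. The function name is hypothetical.

```python
def sheared_dna_needed_ug(ug_per_reaction: float, reactions_per_library: int, libraries: int) -> float:
    """Total sheared DNA (μg) = μg per prep reaction x reactions per library x libraries."""
    return ug_per_reaction * reactions_per_library * libraries

# Lower and upper bounds taken from the slide: 5 μg/reaction, 8-10 reactions, 2-4 libraries
low = sheared_dna_needed_ug(5, 8, 2)
high = sheared_dna_needed_ug(5, 10, 4)
print(f"Sheared DNA required for ~60x: {low:.0f}-{high:.0f} μg")  # 80-200 μg
```

The result counts sheared DNA only. Any losses during shearing add to the genomic DNA that must be extracted.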
NA19240 Sheared DNA Comparison
Library  Shear Type   Shear Settings
2        g-Tube       5500 rpm
3        g-Tube       4800 rpm
4        g-Tube       4800 rpm
5        g-Tube       4500 rpm
6        MegaRuptor   Menlo Park 30 kb
7        MegaRuptor   Menlo Park 30 kb
8        MegaRuptor   MGI 30 kb
[Figure panels: 30kb MGI; 30kb MP; G-Tube 4800 ✜; G-Tube 4500 ✪]
PacBio Workflow
DNA Shear
DNA Repair
Ligation/Exonuclease
BluePippin >18kb Sizing
DNA Repair
AMPure PB
AMPure PB
3x AMPure PB
Rinse wells
AMPure PB
AMPure PB
Seq. Primer Anneal
P6 Polymerase Bind
MagBead Bind
Sequencing
30 minutes or 4 hours
20 minutes to 2 hours
Denature primer prior to use
4 to 6 hour collection time
• Adding DNA Damage Repair after BluePippin sizing increased the average Reads of Insert length by ~1 kb.
• Extending the P6 polymerase binding time from 30 minutes to 4 hours improved loading of the library complex per SMRT Cell.
Standard PacBio protocol (sample prep & complex)
Titration
• No Post-BluePippin DNA Damage Repair
• 30 min P6 polymerase bind
6 hour movies
4 hour movies
125 pM “on plate” loading concentration
G-Tube 4800 ✜
DNA Damage Repair & extended P6 bind
Standard:
• No Post-BluePippin DNA Damage Repair
• 30 min P6 polymerase bind
Modified:
• Post-BluePippin DNA Damage Repair
• 4 hour P6 polymerase bind
G-Tube 4500 ✪
Menlo Park 30 kb MegaRuptor (30kb MP)
Titration
[Figure panels: 4 hr P6 bind, 8Pac lot #231848; 30 min P6 bind, 8Pac lot #231848; 4 hr P6 bind, 8Pac lot #231818; 4 hr P6 bind, 8Pac lot #231848; 4 hr P6 bind, 8Pac lot #231818]
• Post-BluePippin DNA Damage Repair
• 4 hour or 30 minute P6 polymerase bind
MGI 30 kb MegaRuptor (30kb MGI)
Titration
125 pM “on plate” loading concentration
Clear cell-to-cell variability
[Figure annotation: Failed cell]
PacBio NA19240 Sequencing Statistics
Sample     8Pacs  Reads       Mbp (Pol)  RL (Pol)  Mbp (ROI)  RL (ROI)  Mbp/Cell
NA19240    37     16,088,050  214,621    13,605    195,619    12,487    661
HG00733    30     15,858,313  209,619    13,193    190,430    11,958    793
HG00514    40     20,707,629  311,500    13,473    277,690    13,473    868
NA12878*   22     11,029,811  165,153    14,949    146,833    13,174    962
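For the first three rows, the Mbp/Cell column equals ROI Mbp divided by the number of SMRT Cells, where one SMRT Cell 8Pac holds 8 cells. The asterisked NA12878 row does not follow this formula, so the sketch below leaves it out. The coverage helper is an illustration only. It assumes a ~3.1 Gb human genome, a figure the slide does not state.

```python
# Rows from the table: (8Pacs, ROI Mbp, reported Mbp/Cell)
ROWS = {
    "NA19240": (37, 195_619, 661),
    "HG00733": (30, 190_430, 793),
    "HG00514": (40, 277_690, 868),
}
CELLS_PER_8PAC = 8   # one SMRT Cell 8Pac contains 8 SMRT Cells
GENOME_MBP = 3_100   # assumption: ~3.1 Gb human genome (not from the slide)

def mbp_per_cell(n_8pacs: int, roi_mbp: float) -> float:
    """ROI yield per SMRT Cell."""
    return roi_mbp / (n_8pacs * CELLS_PER_8PAC)

def roi_coverage(roi_mbp: float, genome_mbp: float = GENOME_MBP) -> float:
    """Approximate fold coverage from ROI bases, under the genome-size assumption."""
    return roi_mbp / genome_mbp

for sample, (packs, roi, reported) in ROWS.items():
    per_cell = mbp_per_cell(packs, roi)
    print(f"{sample}: {per_cell:.0f} Mbp/cell (reported {reported}), ~{roi_coverage(roi):.0f}x ROI coverage")
```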
Assembly Stats will be highlighted in Tina’s presentation.
PacBio Sequencing Observations
HG00514: 4 h vs. 6 h movie lengths
Instrument  Movie Time (min)  Avg. ROI (bp)  ROI Mb/Cell  # Cells
00116       240               13,502         803          119
42274       240               13,036         881          95
00116       360               14,324         998          56
42274       360               13,282         1,063        24
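To compare the two movie lengths across both instruments, each instrument's yield can be weighted by its number of cells. Here is a minimal sketch using the HG00514 numbers above. The cell-weighted mean is our choice of summary, not one given on the slide, and the function name is hypothetical.

```python
# (instrument, movie minutes, ROI Mb/cell, number of cells) from the HG00514 table
RUNS = [
    ("00116", 240, 803, 119),
    ("42274", 240, 881, 95),
    ("00116", 360, 998, 56),
    ("42274", 360, 1063, 24),
]

def weighted_yield(runs, movie_min):
    """Cell-weighted mean ROI Mb/cell for one movie length."""
    selected = [(mb, n) for _, minutes, mb, n in runs if minutes == movie_min]
    total_cells = sum(n for _, n in selected)
    return sum(mb * n for mb, n in selected) / total_cells

y4 = weighted_yield(RUNS, 240)
y6 = weighted_yield(RUNS, 360)
print(f"4 h: {y4:.1f} Mb/cell, 6 h: {y6:.1f} Mb/cell, gain {100 * (y6 / y4 - 1):.0f}%")
```

Weighting by cell count keeps the 95-cell and 119-cell runs from being treated the same as the 24-cell run.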
• DNA Input and Sizing
  • The library DNA >18 kb is fractionated using the Sage Science BluePippin
  • DNA Damage Repair enzyme mix used post-BluePippin (increased read length)
• Chemistry
  • (+) DNA Damage Repair/4 hr bind: 970.2 Mbases/cell
• Instruments
  • Longer average reads
  • Increased loading efficiency
• What about long-term storage?
10X Genomics
Reconfigured Oligo
- Uses inline index sequence
- No P5 index (HiSeq X single-index compatible)
10X Genomics Overview
10X Chromium Workflow
• HiSeq 4000
• 2x150, 200 pM loading
• 2 lanes
Chromium NA19240 Library Sequencing Statistics
Post-GEM: Isothermal Amplification Size Distribution
Library Size Distribution
The spike at 0 in this graph is due to the Ns (gap bases) in the reference assembly.
Metric                      NA19240 (MGI)      NA12878 (10X)
Molecule Length (bp)        26,768 (±33,673)   94,923 (±64,103)
DNA in Molecules > 10 kb    50.85%             95.0%
DNA in Molecules > 100 kb   1.38%              36.4%
SNPs Phased                 99.1%              97.8%
Longest Phase Block         9.6 Mbp            34.7 Mbp
N50 Phase Block             1.9 Mbp            9.5 Mbp
Chromium Molecule and Phasing Statistics
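The "N50 Phase Block" and "DNA in Molecules > 10 kb" rows are length-weighted summaries. Here is a minimal sketch of both definitions. It assumes the standard N50 definition: the length L such that blocks of length L or longer contain at least half of the total length. The function names and toy input are hypothetical and do not reproduce 10X Long Ranger's own code.

```python
def n50(lengths):
    """Length L such that items of length >= L hold at least half the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be a non-empty list of positive values")

def fraction_in_molecules_over(lengths, threshold):
    """Fraction of total DNA (by length, not by count) in molecules longer than threshold."""
    total = sum(lengths)
    return sum(x for x in lengths if x > threshold) / total

# Toy molecule lengths in kb (hypothetical, for illustration only)
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy))                                    # 8
print(round(fraction_in_molecules_over(toy, 5), 3))
```

Both metrics weight by length, so a few long molecules or phase blocks can dominate them even when most items are short.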
BioNano
Harvest Cells
Dissociate Tissue
Embed Cells in Gel Plugs
Lyse Cells, Digest Protein
Melt and Digest Agarose Plugs
Sample Cleanup
Labeling Reaction
BioNano Overview
[Figure: Molecules/Bin (%) by Molecule Length Bin (10-500 kb, 100-500 kb, 150-500 kb, 200-500 kb, 250-500 kb, >500 kb) for 19240, 19239 and 19238]
Metric                            19240        19239        19238
Mapped Molecule Quantity (Mb)     189,138.79   256,281.33   226,854.88
Mapped Avg Size (kb)              232          280          289
Avg Label Density (per 100 kb)    9.6          8.7          8.8
Number of Consensus Genome Maps   3051         2565         2798
Consensus Genome Maps Size (Mb)   2833.045     2965.972     2933.294
Consensus Genome Maps N50 (Mb)    1.276        1.685        1.477
Avg Depth of Mol Coverage         59.1         56.1         50.6
BioNano: Yoruba Trio Statistics
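Multiplying the mapped average molecule size by the average label density gives a rough number of labels on a typical mapped molecule. This is a back-of-the-envelope sketch using the trio statistics above. It is a product of two averages, not the mean of per-molecule label counts, and the function name is hypothetical.

```python
# (mapped avg size in kb, avg labels per 100 kb) from the trio statistics table
TRIO = {"19240": (232, 9.6), "19239": (280, 8.7), "19238": (289, 8.8)}

def approx_labels_per_molecule(avg_size_kb, labels_per_100kb):
    # Rough estimate: product of two averages, not a per-molecule mean
    return avg_size_kb * labels_per_100kb / 100

for sample, (size_kb, density) in TRIO.items():
    print(f"{sample}: ~{approx_labels_per_molecule(size_kb, density):.1f} labels per molecule")
```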
[Figure panels: PacBio Assembly Contig; BioNano Genome Map Contigs]
Sequencing Plan
• Add 10X linked-read information
• Add Dovetail Hi-C/Chicago data
Summary
• Goal: Generate robust data sets for additional high-quality reference genomes representing the full range of human genetic diversity
• These long-read (long-range) sequencing and mapping applications vary in approach and will provide synergistic data sets to help accomplish our goal.
• Each system possesses unique challenges and requires optimization of protocols and running conditions specific to our needs.
• Experience and communication are key.
• Increasing applications and utility
  • Polymerase read = read of insert
  • BAC pooling
  • Low-input SNV
  • Multicolor labeling
Acknowledgements
The McDonnell Genome Institute at Washington University in St. Louis
Rick Wilson
Sean McGrath, Amy Ly, Ryan Demeter
Dave Larson, Karyn Meltz Steinberg, Tina Graves, Bob Fulton
Derek Albracht, Milinn Kremitzki, Susan Rock, Debbie Scheer
Wes Warren, Chad Tomlinson
10X Genomics: Cassandra Jabara, Michael Schnall-Levin, Drew Kebbel, Rob Tarbox, Deanna Church
BioNano Genomics: Andrew Anfora, Palak Sheth, Alex Hastie
Pacific Biosciences: Paul Peluso, Nick Sisneros