

Figure 4. Reducing the number of flows to 200 does not reduce the total number of reads or the accuracy of species ID and 16S detection. We achieved ~100% specificity with species ID targets at 180 flows; however, 16S identification specificity drops at 180 flows. From these results, we concluded that 200 is the minimum number of flows that does not compromise detection accuracy.
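The specificity figures above can be read as the share of a sample's assigned reads that land on the targets of the organism actually present. A minimal Python sketch of that calculation, assuming per-target read counts have already been tallied (the poster's custom TS plugin is not described, so the function names and toy counts below are hypothetical):

```python
from __future__ import annotations


def percent_reads_per_target(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw per-target read counts into percentages of assigned reads."""
    total = sum(counts.values())
    if total == 0:
        # No assigned reads: report 0% for every target instead of dividing by zero.
        return {target: 0.0 for target in counts}
    return {target: 100.0 * n / total for target, n in counts.items()}


def specificity(counts: dict[str, int], expected_target: str) -> float:
    """Percentage of assigned reads that map to the expected organism's target."""
    return percent_reads_per_target(counts).get(expected_target, 0.0)


# Hypothetical counts for one S. aureus sample reanalyzed at a single flow limit.
toy_counts = {"Staphylococcus aureus": 990, "Enterococcus faecium": 10}
print(f"{specificity(toy_counts, 'Staphylococcus aureus'):.1f}%")  # prints 99.0%
```

Repeating this per sample and per flow limit gives the kind of grid plotted in Figure 4.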

INTRODUCTION

The Ion S5™ XL Sequencer currently sequences 200 bp in 2.5 hr, with about 1 hr of analysis time. Here we focused on optimizing the sequencing and data analysis workflow to reduce total turnaround time. Because sequencing time depends largely on the number of reagent flows (one flow produces ~0.5 base) and on the speed of each flow, we increased the flow speed and used the minimum number of flows suited to the length of the amplicons. We used an existing streamlined library preparation workflow based on Ion AmpliSeq™, clonal amplification (templating) with isothermal amplification, and the new rapid Ion S5™ XL sequencing protocol.
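The "~0.5 base per flow" figure follows from how flow-based sequencing works: nucleotides are delivered one species at a time in a fixed order, and each flow extends the strand by a whole homopolymer or by nothing. A minimal Python sketch that counts the flows needed to read a sequence, assuming a simple repeating TACG flow order purely for illustration (the actual S5 flow order is not given here and may differ, which changes the bases-per-flow rate):

```python
from itertools import cycle


def flows_to_read(seq: str, flow_order: str = "TACG") -> int:
    """Count nucleotide flows needed to read `seq` to the end.

    Each flow of nucleotide N extends through the entire homopolymer of N at
    the current position, so a single flow adds zero, one, or several bases.
    The repeating `flow_order` is an illustrative assumption.
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    pos = 0
    flows = 0
    for nuc in cycle(flow_order):
        if pos >= len(seq):
            break
        flows += 1
        # Incorporate the whole homopolymer run of this nucleotide, if present.
        while pos < len(seq) and seq[pos] == nuc:
            pos += 1
    return flows


for read in ("TTAC", "GA", "ACGTACGT"):
    n = flows_to_read(read)
    print(read, n, "flows,", round(len(read) / n, 2), "bases/flow")
```

Sequences that run against the flow order consume more flows per base; this toy model only shows the mechanism and is not meant to reproduce the empirical ~0.5 base/flow rate.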

With the new fast-flow protocol on the Ion S5™ XL Sequencer, we completed sequencing and Torrent Suite analysis in as little as 55 minutes (with 200 flows).

The new rapid approach was applied to two different application types (each with its own library preparation protocol) that could benefit from rapid sequencing turnaround times:

1. A rapid infectious disease identification and antibiotic susceptibility panel, based on the Ion AmpliSeq™ highly multiplexed PCR approach, designed to sequence 16S regions as well as strain-specific and antimicrobial gene-specific regions.

2. A low-coverage whole genome sequencing approach for preimplantation aneuploidy screening: the Ion ReproSeq™ PGS kit enables screening for aneuploidy in all 24 chromosomes to help improve implantation success rates.

MATERIALS AND METHODS

Bacterial identification with Ion AmpliSeq™ assay:

Total nucleic acids from six bacterial cultures (Acinetobacter baumannii, Enterobacter cloacae, Enterococcus faecium, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus) were extracted as input for targeted sequencing. Ion AmpliSeq™ libraries were constructed and templated onto Ion Sphere Particles using isothermal amplification reactions.

To assess detection accuracy with fewer flows, we performed the runs on the Ion S5™ XL using 500 flows, then simulated runs with fewer flows by reanalyzing the data with the Analysis option --flow-limit. We counted the total number of reads, the number of reads assigned to each target through the species ID amplicons, and the number of reads assigned to each target through the 16S amplicons. Analysis was performed with a custom Torrent Suite (TS) plugin.
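Reanalysis with a flow limit can be pictured as keeping only the first N flows of each read's flowgram and decoding bases from those. A simplified Python sketch, assuming clean integer homopolymer calls per flow and an illustrative repeating TACG flow order; real base calling works on noisy signals with phasing correction, and this is not how --flow-limit is implemented internally:

```python
from __future__ import annotations


def decode_flowgram(flow_calls: list[int], flow_order: str = "TACG",
                    flow_limit: int | None = None) -> str:
    """Turn per-flow homopolymer lengths into a base sequence.

    `flow_calls[i]` is the number of bases incorporated on flow i (0 = none).
    If `flow_limit` is given, only the first `flow_limit` flows are decoded,
    mimicking a run that stopped early. The TACG order is an assumption.
    """
    if flow_limit is not None:
        flow_calls = flow_calls[:flow_limit]
    bases = []
    for i, n in enumerate(flow_calls):
        nucleotide = flow_order[i % len(flow_order)]  # repeating flow order
        bases.append(nucleotide * n)
    return "".join(bases)


calls = [2, 1, 1, 0, 0, 3, 0, 1]  # hypothetical clean per-flow calls
print(decode_flowgram(calls))                # TTACAAAG (all 8 flows)
print(decode_flowgram(calls, flow_limit=4))  # TTAC (first 4 flows only)
```

Because truncation only drops trailing flows, the reads from a limited reanalysis are prefixes of the full-length reads, which is why a single 500-flow run can stand in for shorter runs.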

Ion ReproSeq™ for single cell aneuploidy:

Libraries were generated from single-cell samples using SingleSeq™ whole genome amplification (WGA). Eight libraries were selected for each of three cell lines with known copy number variations (CNVs) spanning 40-50 Mbp, similar in size to aneuploidies of the smallest chromosomes. The barcoded libraries were pooled, templated using isothermal amplification, and sequenced for 250 flows on an Ion S5™ XL in the fast run mode. Data were uploaded to and analyzed on an Ion Reporter™ Server with duplicate reads removed.
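The poster does not describe how Ion Reporter™ removes duplicates. As a hedged illustration of the general idea, the sketch below uses a common position-based heuristic (assumed here, not taken from the source): reads whose alignments share chromosome, 5' end and strand are treated as duplicates, and the first one seen is kept. The `Alignment` record and function names are hypothetical:

```python
from __future__ import annotations

from typing import NamedTuple


class Alignment(NamedTuple):
    read_id: str
    chrom: str
    start: int   # leftmost aligned position (0-based)
    end: int     # position just past the rightmost aligned base
    strand: str  # "+" or "-"


def five_prime_key(aln: Alignment) -> tuple[str, int, str]:
    """Chromosome, 5' end and strand; a reverse-strand read's 5' end is its right end."""
    pos = aln.start if aln.strand == "+" else aln.end
    return (aln.chrom, pos, aln.strand)


def remove_duplicates(alignments: list[Alignment]) -> list[Alignment]:
    """Keep the first alignment seen for each (chrom, 5' end, strand) key."""
    seen: set[tuple[str, int, str]] = set()
    kept = []
    for aln in alignments:
        key = five_prime_key(aln)
        if key in seen:
            continue  # same start site and strand as an earlier read: treat as duplicate
        seen.add(key)
        kept.append(aln)
    return kept


reads = [
    Alignment("r1", "chr13", 1000, 1150, "+"),
    Alignment("r2", "chr13", 1000, 1120, "+"),  # same 5' end and strand as r1
    Alignment("r3", "chr13", 900, 1150, "-"),   # reverse strand, different key
]
print([a.read_id for a in remove_duplicates(reads)])
```

For low-coverage copy-number work, dropping such duplicates keeps over-amplified WGA fragments from inflating individual tile counts.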

CONCLUSIONS

With the Ion S5™ XL Sequencer and modifications to the on-instrument flow times and total number of flows, we can reduce sequencing and data analysis time without compromising accuracy for applications requiring rapid turnaround. For the AmpliSeq and ReproSeq assays shown, sequencing and Torrent Suite analysis could be completed in 55-75 minutes. Paired with fast library and template preparation, the total turnaround time was at or below a standard workday.

ACKNOWLEDGEMENTS

Bacterial ID AmpliSeq™ assay: Kunal Banjara, Jamsheed Ghadiri, Andrew Hutchison, Peter Vander Horn, Karen Clyde, Nisha Mulakken, Rajesh Gottimukkala, Diana Jeon, and Simon Cawley.

Ion ReproSeq™ assay: Mark Andersen and Rob Bennett.

Torrent Suite Software: Dominique Belhachemi, Christian Koller, and Mohit Gupta.

For Research Use Only. Not for use in diagnostic procedures.

© 2016 Thermo Fisher Scientific, Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified.

Rongsu Qi1, Chaitali Parikh1, David Mandelman2, Haythem Latif2, Adam Harris2, and Srinka Ghosh1

1Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA 94080, USA 2Thermo Fisher Scientific, 5791 Van Allen Way, Carlsbad, CA 92008, USA

Speeding up the Ion S5™ XL sequencing system: Sequencing in an hour enables sample-to-answer in an 8 hr workday

Because the amplicons in the AmpliSeq panel have a mean insert length of 95 bp, we estimated the minimum number of flows needed at 200. Reanalysis of an S5 XL run (500 flows) at 300, 250, 200 and 180 flows showed that at 200 flows the read length distribution is comparable to that at 500 flows. At 180 flows, the read length distribution narrows and reads are shortened (Figure 3).
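The flow estimate is back-of-envelope arithmetic on the ~0.5 base/flow rate quoted in the Introduction. A minimal Python sketch of that arithmetic, using hypothetical insert lengths; it treats the rate as constant and ignores flows consumed by key or barcode bases at the start of each read, so it is only a first approximation:

```python
from __future__ import annotations

import math

BASES_PER_FLOW = 0.5  # approximate rate quoted in the Introduction


def expected_read_length(flows: int, bases_per_flow: float = BASES_PER_FLOW) -> float:
    """Rough number of bases read in a given number of flows."""
    return flows * bases_per_flow


def min_flows(insert_length_bp: float, bases_per_flow: float = BASES_PER_FLOW) -> int:
    """Smallest whole number of flows expected to span an insert of this length."""
    return math.ceil(insert_length_bp / bases_per_flow)


def fraction_fully_read(insert_lengths: list[int], flows: int,
                        bases_per_flow: float = BASES_PER_FLOW) -> float:
    """Share of inserts no longer than the expected read length at `flows`."""
    limit = expected_read_length(flows, bases_per_flow)
    return sum(1 for length in insert_lengths if length <= limit) / len(insert_lengths)


toy_inserts = [85, 92, 95, 98, 104]  # hypothetical insert lengths (bp), mean 94.8
for flows in (180, 200, 250):
    print(flows, expected_read_length(flows), fraction_fully_read(toy_inserts, flows))
print(min_flows(95))
```

At this rate 180 flows (~90 bases) falls short of a 95 bp mean insert, while 200 flows (~100 bases) covers it with a small margin over the raw estimate of 190 flows, in line with the read length shortening seen at 180 flows.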

Figure 3. The observed read length distribution at 200 flows agrees with that at 500 flows and with the expected amplicon insert length distribution.

[Figure 3 panel: read length histograms at 180, 200, 250, 300 and 500 flows; x-axis: Length; y-axis: Frequency; overlaid expected distribution (assuming amplicons are at equal concentration) with a 95 bp mean.]

[Figure 4 panels, Fast S5 XL with Bacterial ID AmpliSeq™ results, each plotted by #Flows (180, 200, 250, 300, 500) for each sample (E. clo, S. aur, K. pne, A. bau, P. aer, E. fae): Total number of reads; % reads mapped to different species ID targets (Acinetobacter baumannii, Enterobacter cloacae, Enterococcus faecium, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus); % reads mapped to different 16S families (Bacillaceae, Enterobacteriaceae, Enterococcaceae, Lactobacillaceae, Moraxellaceae, Planococcaceae, Pseudomonadaceae, Staphylococcaceae, Vibrionaceae).]

Fast S5 XL Ion ReproSeq™ results

Cell line          Chr.  Ploidy  Start (Mbp)  End (Mbp)  Size (Mbp)  Confidence  Precision  MAPD
GM05067 (male)     9     3       0            39.7       39.7        7.1         5.7        0.200
GM05067 (male)     9     3       0            39.7       39.7        8.4         6.3        0.162
GM05067 (male)     9     3       0            39.7       39.7        5.5         5.5        0.180
GM05067 (male)     9     3       0            39.7       39.7        7.1         6.6        0.191
GM05067 (male)     9     3       0            39.7       39.7        8.9         5.5        0.171
GM05067 (male)     9     3       0            39.7       39.7        9.9         5.0        0.177
GM05067 (male)     9     3       0            39.7       39.7        7.7         5.5        0.199
GM05067 (male)     9     3       0            39.7       39.7        7.8         6.5        0.184
GM12606 (male)     13    3       19           60.9       41.8        4.7         4.7        0.178
GM12606 (male)     13    3       19           54.9       35.9        4.0         4.0        0.211
GM12606 (male)     13    3       19           60.9       41.8        5.8         5.8        0.161
GM12606 (male)     13    3       19           58.9       39.8        6.0         6.0        0.174
GM12606 (male)     13    3       19           60.9       41.8        6.1         6.1        0.186
GM12606 (male)     13    3       19           60.9       41.8        5.7         5.7        0.177
GM12606 (male)     13    3       19           58.9       39.8        2.9         2.9        0.175
GM12606 (male)     13    3       19           60.9       41.8        7.1         6.0        0.211
GM14164 (female)   13    1       46.9         94.7       47.8        14.7        14.7       0.163
GM14164 (female)   13    1       48.9         94.7       45.8        16.9        16.9       0.188
GM14164 (female)   13    1       46.9         94.7       47.8        8.4         8.4        0.231
GM14164 (female)   13    1       48.9         94.7       45.8        10.3        10.3       0.182
GM14164 (female)   13    1       46.9         94.7       47.8        10.4        10.4       0.224
GM14164 (female)   13    1       46.9         94.7       47.8        9.4         9.4        0.201
GM14164 (female)   13    1       46.9         94.7       47.8        14.8        14.8       0.172
GM14164 (female)   13    1       48.9         94.7       45.8        7.9         7.9        0.201

Figure 5. CNVs were correctly identified by Ion Reporter™ Software for single-cell samples from three cell lines. The three cell lines tested were GM05067 (male, 45 Mbp duplication on chromosome 9), GM12606 (male, 41 Mbp duplication on chromosome 13), and GM14164 (female, 48 Mbp deletion on chromosome 13). These events are the same size as, or smaller than, an aneuploidy of the smallest chromosome, chromosome 21.

A) Three representative plots from Ion Reporter™ Software are shown for each cell line. Dots represent log2 ratios of normalized counts for ~2 Mbp tiles from the sequenced sample relative to a composite informatics baseline built from 36 normal libraries. CNVs are called automatically by the software and shown as copy number increases (blue bars) or decreases (red bars). In all cases, the called gender matched the known gender of the cell line.

B) A table of calls for the full set of samples. Correct calls were made in 24/24 cases with no false positives (confidence > 0). CNV boundaries were identified within 6 Mbp of expectation. MAPD values represent the Median Absolute Pairwise Difference between log2 ratios of adjacent ~2 Mbp tiles.
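Both quantities in this caption can be made concrete. A minimal Python sketch, assuming per-tile read counts for a sample and a baseline (for example a per-tile summary of the normal libraries; how Ion Reporter™ builds its composite baseline is not described here) and using the commonly cited definition of MAPD as the median absolute difference between adjacent tiles' log2 ratios. Tile counts are hypothetical:

```python
from __future__ import annotations

import math
from statistics import median


def log2_ratios(sample_counts: list[float], baseline_counts: list[float]) -> list[float]:
    """log2(sample / baseline) per tile after normalizing each to its total.

    Assumes no zero counts; real pipelines filter or pseudocount such tiles.
    """
    s_total = sum(sample_counts)
    b_total = sum(baseline_counts)
    return [
        math.log2((s / s_total) / (b / b_total))
        for s, b in zip(sample_counts, baseline_counts)
    ]


def mapd(ratios: list[float]) -> float:
    """Median of absolute differences between log2 ratios of adjacent tiles."""
    diffs = [abs(b - a) for a, b in zip(ratios, ratios[1:])]
    return median(diffs)


sample = [200, 100, 100]    # hypothetical sample tile counts
baseline = [100, 100, 200]  # hypothetical baseline tile counts
ratios = log2_ratios(sample, baseline)
print(ratios)        # [1.0, 0.0, -1.0]
print(mapd(ratios))  # 1.0
```

For orientation, an ideal three-copy gain in a diploid background sits at log2(3/2) ≈ 0.58 and a one-copy loss at log2(1/2) = -1; a low MAPD means adjacent tiles agree closely, so shifts of that size stand out from noise.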


RESULTS

Analysis of accuracy with decreased flows shows that the Ion S5™ XL still achieves raw read accuracy >99.5% at both 300 and 200 flows (Figure 2).
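The >99.5% figure is a per-base raw accuracy. A minimal Python sketch, assuming accuracy is computed as one minus the error rate (mismatched, inserted and deleted bases) over aligned bases; Torrent Suite's exact accounting may differ, and the counts below are invented for illustration:

```python
from __future__ import annotations


def raw_read_accuracy(aligned_bases: int, mismatches: int,
                      insertions: int, deletions: int) -> float:
    """1 minus the per-base error rate over aligned bases (one common definition)."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    errors = mismatches + insertions + deletions
    return 1.0 - errors / aligned_bases


# Hypothetical totals pooled over all aligned reads in a run.
acc = raw_read_accuracy(aligned_bases=10_000, mismatches=20, insertions=15, deletions=10)
print(f"{acc:.2%}", acc > 0.995)
```

Comparing this value for the same run reanalyzed at 500, 300 and 200 flows is one way to check that truncation does not degrade per-base quality.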

[Plot: on-instrument and secondary analysis timelines per block; x-axis: Time (hour), 0 to 0.8; y-axis: Block.]

[Figure 1 panel: Fast S5 XL sequencing: sample-to-data in ~6.5 to 8 hr. Workflow steps: Sample, Library Prep, Templating (Isothermal Amplification), Ion S5 Sequencing & TS analysis, Ion Reporter. Bacterial nucleic acid (AmpliSeq-based 16S + ID amplicons): 200 flows (55 min), Ion Reporter N/A, total ~6.5 hr. Human single-cell genomic DNA (WGA with ReproSeq™ aneuploidy kit): 250 flows (70 min), Ion Reporter ~85 min, total ~8 hr. Step durations shown: 3.5 hr, 2 hr, 50-85 min, 55-70 min.]

Figure 5. Near real-time sequencing on the Ion S5™ XL. Total run time from the start of the run to the end of BAM output is 51 minutes (0.85 hr) for a 200-flow run. Time breakdowns for on-instrument and secondary data analysis are shown for 24 blocks. The Phase Estimation step of base calling has been moved to On-Instrument Analysis (OIA), enabling near real-time base calling and shortening total analysis time.

Figure 1. Decreasing overall sequencing workflow time on the Ion S5™ XL sequencing system. Improvements in library preparation and clonal amplification, together with optimized sequencing reagent flows on the S5 system, substantially decrease overall sequencing workflow time. For Ion AmpliSeq™-based libraries such as the pan-bacterial ID panel, the workflow from sample to data can be completed in about 6.5 hr. For whole genome libraries produced from single human cells by whole genome amplification (WGA), the sequencing workflow (from library prep to data) can be completed in approximately 8 hr.

Figure 2. Read quality of the Ion S5™ XL with reduced flow number is comparable to that of the Ion PGM.

[Panels: Ion PGM, 318 chip, 500 flows; Ion S5 XL, 300 flows; Ion S5 XL, 200 flows.]