Application of long-read sequencing in microbial genomics
Leah Roberts
Sakzewski Senior Scientist, Centre for Children’s Health Research
CCHR supervisor:
A/Prof David Whiley
PhD supervisors (SCMB):
A/Prof Scott Beatson
Prof Mark Schembri
What is microbial genomics?
Archaea, Bacteria, Viruses, Microbial Eukaryotes
Environmental: Soil/permafrost, Ocean/corals, Wildlife
Industry: Water treatment, Food industry, Biotech
Clinical: Diagnosis, Outbreaks, Microbiota
Whole genome sequencing – high-resolution diagnostics for microbiology
• Ability to discriminate at the nucleotide level
• Allows for the highest comparative resolution
• Whole genome characterisation:
  • Plasmids
  • Antibiotic resistance genes (and their context)
  • Virulence genes
• Becoming more applicable for the clinic
  • Technology becoming faster, more accessible and more cost-effective
https://doi.org/10.1111/1469‐0691.12217 (2013)
Exponential growth of sequencing data
Why is read length important?
• Generating complete genomes
• Traversing repeat regions in genomes
Scenario 1: short-read (Illumina) sequencing
[Figure: regions A, B and C separated by Repeat 1 and Repeat 2; the short-read assembly collapses the repeats into one copy (Repeat 1 + 2)]
Scenario 2: long-read (PacBio or Nanopore) sequencing
[Figure: regions A, B and C separated by Repeat 1 and Repeat 2; long reads resolve both repeat copies in place]
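The two scenarios can be sketched as a toy coverage check. It assumes a read only resolves a repeat if it starts in unique sequence before the repeat and ends in unique sequence after it. The genome length, repeat position, read lengths and the `spans_repeat`/`repeat_resolved` helpers are hypothetical and chosen for illustration. Real assemblers work on assembly graphs, not on known coordinates.

```python
# Toy model (hypothetical coordinates): a repeat copy can only be placed in the
# assembly if some read covers it plus unique flanking sequence on both sides.

def spans_repeat(read_start: int, read_len: int, rep_start: int, rep_end: int,
                 anchor: int = 1) -> bool:
    """True if the read covers the repeat [rep_start, rep_end) plus at least
    `anchor` bases of unique sequence on each side."""
    read_end = read_start + read_len
    return read_start <= rep_start - anchor and read_end >= rep_end + anchor


def repeat_resolved(read_len: int, rep_start: int, rep_end: int,
                    genome_len: int, step: int = 100) -> bool:
    """Tile fixed-length reads across the genome every `step` bases and report
    whether any single read spans the repeat."""
    return any(
        spans_repeat(start, read_len, rep_start, rep_end)
        for start in range(0, genome_len - read_len + 1, step)
    )


if __name__ == "__main__":
    genome_len = 50_000
    rep_start, rep_end = 20_000, 25_000  # hypothetical 5 kb repeat between A and B
    for label, read_len in [("short read (150 bp)", 150), ("long read (20 kb)", 20_000)]:
        print(label, "resolves repeat:", repeat_resolved(read_len, rep_start, rep_end, genome_len))
```

No 150 bp read can bridge a 5 kb repeat. A single 20 kb read can, which is the difference between the two scenarios.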
Scenario 1: short-read sequencing
• Produces draft genomes
Scenario 2: long-read sequencing
• Produces complete genomes
https://github.com/rrwick/Bandage/wiki/Effect‐of‐kmer‐size
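Draft and complete assemblies are often compared by contig N50. N50 is the length L such that contigs of length ≥ L hold at least half of the assembled bases. It is a standard metric but is not quoted on the slide. Below is a minimal sketch; the contig lengths are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that contigs (or reads) of length >= L together
    contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            break
    return length


# Hypothetical ~5 Mb genome: a fragmented short-read draft vs. a complete
# long-read assembly (one chromosome plus one plasmid).
draft = [1_200_000, 900_000, 800_000, 700_000, 600_000, 500_000, 300_000]
complete = [4_670_000, 330_000]
print(n50(draft))     # 800000
print(n50(complete))  # 4670000
```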
Chromosome and plasmids
Virulence and antibiotic resistance regions are often associated with repeat regions
Long reads allow you to completely understand the context and structure of the entire genome
[Figure: Gene-centric vs. Whole genome sequencing (short reads) vs. Whole genome sequencing (long reads)]
The puzzle solved!
Third Generation “Long-read sequencing”
• Pacific Biosciences Single Molecule Real-Time (SMRT) sequencing
• Oxford Nanopore MinION
Third Generation – PacBio SMRT Library preparation
https://doi.org/10.1186/s40793‐017‐0239‐1
Library preparation: “SMRTbell template”
Third Generation – PacBio SMRT Zero Mode Waveguide (ZMW)
• SMRT cell: 1 million ZMWs
• Library loaded into SMRT cell using MagBeads
• Each ZMW has a single polymerase at the base
https://www.cd‐genomics.com/pacbio‐smrt‐sequencing.html
http://dnatech.genomecenter.ucdavis.edu/pacbio‐sequencing/
DOI: 10.1038/nrg2626
Third Generation – PacBio SMRT Sequencing
Principles of Sequencing: Third Generation – PacBio SMRT sequencing
• PacBio SMRT sequencing applications:
  • 5-8 Gb throughput/SMRT cell
  • 85-89% base accuracy*
  • With enough coverage, consensus accuracy exceeds Illumina (>99.99%)
  • Some problems with homopolymers
  • Expensive (~$2000 AUD per bacterial genome)
  • Multiplexing on the Sequel now available
  • Reliable and reproducible
  • Can detect epigenetic modifications (methylation)
*https://doi.org/10.1016/j.ygeno.2017.12.011
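A simple majority-vote model can illustrate how consensus accuracy can exceed Illumina despite 85-89% single-pass accuracy. The model assumes errors are independent and random across reads. Systematic errors, such as the homopolymer problems noted above, violate this assumption and are not removed by extra coverage. The model is also pessimistic: it counts ties as errors and assumes all erroneous reads agree on the same wrong base. `consensus_error` is a hypothetical helper, not part of any PacBio tool.

```python
from math import comb


def consensus_error(read_error: float, depth: int) -> float:
    """P(majority vote over `depth` reads is wrong at a base), assuming each read
    errs independently with probability `read_error`. Ties count as errors and
    all wrong reads are assumed to agree, so this is a pessimistic estimate."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    first_losing = (depth + 1) // 2  # fewest wrong reads that is not a strict minority
    return sum(
        comb(depth, k) * read_error**k * (1 - read_error) ** (depth - k)
        for k in range(first_losing, depth + 1)
    )


if __name__ == "__main__":
    read_error = 0.13  # ~87% single-pass accuracy, mid-range of the 85-89% above
    for depth in (1, 5, 10, 20, 30):
        accuracy = 1 - consensus_error(read_error, depth)
        print(f"{depth:>3}x coverage -> consensus accuracy {accuracy:.6f}")
```

Under these assumptions the consensus error falls rapidly with depth. This is why coverage, rather than single-read accuracy, drives the >99.99% figure.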
Principles of Sequencing: Third Generation – Nanopore MinION
[Figure: 1D sequencing by ligation and Rapid sequencing library preparation]
https://store.nanoporetech.com/
Nanopore MinION – flow cell
Nanopore = nano-scale hole
An ionic current is passed through the nanopores, and the change in current is measured as a molecule passes through
The current change can be used to identify that molecule
https://nanoporetech.com/how‐it‐works
http://biochemistri.es/post/119865709426/of‐nanopores‐and‐isoforms
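A toy nearest-level lookup can illustrate how a characteristic current change identifies the molecule. The current levels are invented, and the one-base-per-level model is a deliberate simplification. Real pores are influenced by several bases at once, and production basecallers model the raw signal with neural networks rather than a lookup table.

```python
# Invented mean current levels (picoamps) for each base, for illustration only.
HYPOTHETICAL_LEVELS_PA = {"A": 80.0, "C": 100.0, "G": 120.0, "T": 60.0}


def closest_base(level_pa: float) -> str:
    """Return the base whose hypothetical current level is nearest the observed one."""
    return min(HYPOTHETICAL_LEVELS_PA,
               key=lambda base: abs(HYPOTHETICAL_LEVELS_PA[base] - level_pa))


def call_bases(event_means_pa: list[float]) -> str:
    """Translate a series of current events (one mean level per event) into bases."""
    return "".join(closest_base(level) for level in event_means_pa)


print(call_bases([79.1, 102.5, 58.7, 118.0]))  # ACTG
```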
• Nanopore MinION sequencing applications:
  • Read accuracy:
    • <80% accuracy (2015)
    • ~92% accuracy, 1D chemistry (2018)
    • ~94-96% accuracy, flip-flop basecalling algorithm (2019)
  • Consensus accuracy >99.9%
  • 80% of errors are systematic (e.g. homopolymers [AAAAA, TTTTT])
  • Requires Illumina polishing to correct SNPs and indels
  • Increased accuracy with the new R10 flow cell
https://doi.org/10.1016/j.bdq.2015.02.001
• Q50 = 99.999% accuracy = 1 error per 100,000 bases
R10 flow cell
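The Q50 figure uses the Phred scale, Q = -10 log10(P_error). A small converter makes the accuracy numbers on these slides comparable. By this formula, ~92% read accuracy is only about Q11, while 99.99% is Q40. The helper names are illustrative.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy for Phred quality Q, where P(error) = 10 ** (-Q / 10)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality for a per-base accuracy, Q = -10 * log10(P(error))."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


def errors_per(bases: int, q: float) -> float:
    """Expected number of errors in `bases` bases at Phred quality Q."""
    return bases * 10 ** (-q / 10)


print(f"Q50 -> {phred_to_accuracy(50):.3%} accuracy, "
      f"{errors_per(100_000, 50):.1f} error per 100,000 bases")
for acc in (0.92, 0.96, 0.999, 0.9999):
    print(f"{acc:.2%} accuracy -> Q{accuracy_to_phred(acc):.1f}")
```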
• Nanopore MinION sequencing applications:
  • Throughput:
    • 20-30 Gb/flow cell (2018-2019)
    • Current record = 50 Gb
  • 1 flow cell ~$1000 (AUD)
  • Multiplexing and repeat-use options available
  • Capacity for real-time analysis
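The throughput and price figures can be turned into a rough per-genome estimate. This minimal sketch assumes a 5 Mb bacterial genome, a 100x coverage target and the low end (20 Gb) of the quoted flow cell yield. It ignores library preparation costs, barcode kit limits and run-to-run variation, so treat the result as an upper bound on multiplexing rather than a plan. The helper names are hypothetical.

```python
def genomes_per_flow_cell(yield_gb: float, genome_mb: float, coverage: float) -> int:
    """How many genomes fit in one run at the target coverage (whole genomes only)."""
    bases_needed_per_genome = genome_mb * 1e6 * coverage
    return int(yield_gb * 1e9 // bases_needed_per_genome)


def flow_cell_cost_per_genome(flow_cell_cost: float, n_genomes: int) -> float:
    """Flow cell price split evenly across multiplexed genomes (excludes library prep)."""
    if n_genomes < 1:
        raise ValueError("need at least one genome per flow cell")
    return flow_cell_cost / n_genomes


# Assumptions (hypothetical): 20 Gb yield, 5 Mb genome, 100x coverage, ~$1000 AUD flow cell.
n = genomes_per_flow_cell(yield_gb=20, genome_mb=5, coverage=100)
print(n, "genomes per flow cell,", f"~${flow_cell_cost_per_genome(1000, n):.0f} AUD each")
```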
Application of long reads
Tackling mobile genetic elements:
1. Resolving plasmids and highly repetitive regions
2. Resolving large tandem repeats
Rapid clinical response:
1. Nanopore for real-time analysis
2. Sequencing of clinical samples
Metagenomics:
1. Environment
[Figure A: timeline, May to November 2015]
Pathology results:
• Enterobacter cloacae
• Carbapenemase IMP-4 (PCR)
Where did this infection come from?
How similar are these isolates?
What is the context of the IMP gene?
• Chromosome?
• Plasmid?
• Integron?
Outbreak confirmed using WGS
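The outbreak question “How similar are these isolates?” is typically answered by counting SNP differences between isolates across a shared core-genome alignment. Below is a minimal sketch of pairwise SNP distances. The isolate names and sequences are hypothetical, positions with gaps or Ns are skipped, and no outbreak cut-off is implied.

```python
from itertools import combinations

SKIP = set("-N")  # alignment gaps and ambiguous bases are not counted


def snp_distance(a: str, b: str) -> int:
    """Number of aligned positions where two sequences carry different bases."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x != y and x not in SKIP and y not in SKIP
    )


def pairwise_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every pair of isolates (names sorted within each pair)."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical 10-position core alignment for three isolates.
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACTTACGAAC",
}
for (a, b), d in pairwise_distances(alignment).items():
    print(a, b, d, "SNPs")
```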
[Figure: Antibiotic resistance genes (~30 contigs); Plasmid]
Resolving highly repetitive regions is difficult with short-read data
~55 kb multidrug resistance (MDR) region carrying 19/21 antibiotic resistance genes
PacBio sequencing reveals a large ~330 kb IncHI2 plasmid carrying blaIMP-4
IncHI2 plasmid from Sydney is identical
Application of long reads: Rapid clinical response
“Traditional” diagnostics
• Standardised, established methods and infrastructure, reasonably fast turn-around time
• Relies on phenotype
• Lacks high-resolution discriminatory power
• Struggles with hard-to-culture or unculturable microbes
For septic shock patients, survival decreases by 7.6% for every hour of delay in appropriate therapy
Kumar et al., Crit Care Med. 2006 Jun;34(6):1589‐96.
Nanopore for real-time analysis
• Great for species ID
• Sterile sites/hard-to-grow organisms
Only 17 reads from Nanopore sequencing
1 day of Nanopore sequencing
Confirmed from blood culture and the dog’s mouth
Detection of Chikungunya virus (CHIKV), Ebola virus (EBOV) and hepatitis C virus (HCV) from human blood samples
First detection in <10 minutes; <6 hours for the whole process
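Real-time analysis means processing reads as they arrive and stopping at the first confident hit. This can be sketched as a toy exact k-mer matcher. The reference, read stream, k = 15 and the 10-shared-k-mer threshold are all hypothetical, and real-time pipelines use dedicated classifiers or aligners. Exact matching can still work on error-prone long reads. At ~92% accuracy a 15-mer is error-free with probability 0.92^15 ≈ 0.29, so a read several kilobases long still carries many intact k-mers.

```python
import random
from typing import Iterable, Optional


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def first_detection(reads: Iterable[tuple[float, str]], reference: str,
                    k: int = 15, min_shared: int = 10) -> Optional[float]:
    """Scan reads in arrival order and return the time (minutes) of the first
    read sharing at least `min_shared` k-mers with the reference, else None."""
    ref_kmers = kmers(reference, k)
    for minute, seq in reads:
        if len(kmers(seq, k) & ref_kmers) >= min_shared:
            return minute
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice("ACGT") for _ in range(20_000))  # stand-in pathogen genome
    background = "".join(rng.choice("ACGT") for _ in range(2_000))   # unrelated (e.g. host) read
    stream = [(1.5, background), (4.0, background[::-1]), (8.5, reference[5_000:8_000])]
    print("first pathogen read at", first_detection(stream, reference), "min")
```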
Prosthetic joint infections are difficult to diagnose: culture remains the gold standard, but only 65% of causative bacteria are detected
Showed concordance between MinION and MiSeq/standard culture results.
Species abundance estimates in polymicrobial samples differed between MinION and MiSeq for high- vs. low-GC genomes
[Figure: bases sequenced vs. sequencing time]
Reads for each species appear within 20 min of sequencing
Long reads were better at detecting low-abundance species:
• Unrelated to GC%
• The PCR step in Illumina library preparation biases towards more abundant species
• Long reads eliminate possible false-positive matches and misclassifications
Reference‐free detection of species:
Nanopore for real-time analysis
• Need to be aware of the read error rate
• Strain-specific typing may require more consideration, e.g. outbreak investigations
But! Long reads can help you identify large changes associated with specific lineages
[Figure: Reference strains + Nanopore strains, with context]
Context should be appropriate!
Application of long reads: Metagenomics
Environmental:
Low‐complexity sample without amplification
High‐complexity sample with amplification
Problems encountered:
• High rates of sequence carry-over in repeat-use flow cells
• Index switching rate of 3.6-3.8%
• High error rates led to artificially inflated richness (caused by reads mapping to other closely related species)
• Homopolymer-rich species were two orders of magnitude less abundant in MinION data than in Sequel data
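An index switching rate of a few percent matters most for low-abundance taxa, since leaked reads can look like genuine detections. Below is a minimal sketch of a bleed-through check. It assumes misassigned reads are spread evenly across the other barcodes in the run, and it uses 3.7% as the middle of the quoted range. The function, sample names and taxon counts are hypothetical.

```python
def possible_bleed_through(counts_by_sample: dict[str, dict[str, int]],
                           switch_rate: float) -> dict[str, set[str]]:
    """Flag, per sample, taxa whose read count is no larger than the number of
    reads expected to leak in from the other samples through index switching.
    Assumes misassigned reads are spread evenly across the other barcodes."""
    samples = list(counts_by_sample)
    if len(samples) < 2:
        return {s: set() for s in samples}
    n_other = len(samples) - 1
    flagged: dict[str, set[str]] = {}
    for sample in samples:
        flagged[sample] = set()
        for taxon, count in counts_by_sample[sample].items():
            elsewhere = sum(counts_by_sample[o].get(taxon, 0) for o in samples if o != sample)
            expected_leak = elsewhere * switch_rate / n_other
            if count <= expected_leak:
                flagged[sample].add(taxon)
    return flagged


# Hypothetical read counts per taxon.
counts = {
    "pond":  {"Taxon_X": 50_000, "Taxon_Y": 40},
    "river": {"Taxon_X": 900, "Taxon_Z": 20_000},
    "blank": {"Taxon_X": 30, "Taxon_Z": 10},
}
print(possible_bleed_through(counts, switch_rate=0.037))
```

In this made-up run, the river sample's 900 Taxon_X reads are flagged because the pond sample is dominated by Taxon_X. The example shows how index switching could inflate apparent richness in other samples.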
@Puntseq:
Looking at infections attributed to contact with River Cam water
Symptoms include “swimmer’s itch”, fever after swallowing water, and wound infections
Other considerations...
Sample requirements?
https://www.pacb.com/wp‐content/uploads/2015/09/Guide‐Pacific‐Biosciences‐Template‐Preparation‐and‐Sequencing.pdf
Illumina:
• Flexible DNA requirements
• Low-input kits available
PacBio SMRT sequencing:
• Requires ~20 µg of DNA
  • ~6 × 5 ml overnight broth cultures
  • ~6 × agar plates
• Low-input kits available
• Must be high molecular weight (aim for ~20-40 kb)
• Must be high-quality DNA
Nanopore MinION:
• Requires 1.5 µg DNA
• Low-input kits available
• Ideally high molecular weight
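Converting DNA mass into genome copies helps put the input requirements in perspective. This uses the common approximation of ~650 g/mol per base pair of double-stranded DNA. The 5 Mb genome size is an assumption for a typical bacterium, and the sketch ignores DNA purity and fragment length.

```python
AVOGADRO = 6.022e23       # molecules per mole
G_PER_MOL_PER_BP = 650.0  # approximate mass of one double-stranded base pair


def genome_copies(dna_ng: float, genome_bp: int) -> float:
    """Approximate number of genome copies in `dna_ng` nanograms of dsDNA."""
    grams = dna_ng * 1e-9
    grams_per_copy = genome_bp * G_PER_MOL_PER_BP / AVOGADRO
    return grams / grams_per_copy


genome_bp = 5_000_000  # hypothetical 5 Mb bacterial genome
for platform, ng in [("PacBio (~20 µg)", 20_000), ("Nanopore (1.5 µg)", 1_500)]:
    print(f"{platform}: ~{genome_copies(ng, genome_bp):.1e} genome copies")
```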
Other logistical problems to consider for the clinic
[Figure: PacBio RSII, Illumina NextSeq, Nanopore MinION]
The future is now!
[Figure: Illumina MiniSeq, PacBio Sequel, Nanopore PromethION, GridION, SmidgION and Flongle]
Long reads from short reads?
• Morphoseq – specific mutations
• 10X Chromium – barcoded short reads
Acknowledgements
UQ SCMB
• A/Prof Scott Beatson
• Prof Mark Schembri
• Dr Brian Forde
Clinical collaborators
• Patrick Harris
• David Paterson
• Haakon Bergh
Children’s Health QLD
• A/Prof David Whiley
• Dr Adam Irwin
• Dr Julia Clark
Diphtheria project
• Andrew Henderson
• Geoffrey Playford
• David Looke
• Belinda Henderson
• Catherine Watson
• Gordon Laurie
• Hanna Sidjabat
• Graeme Nimmo
• Sharmini Muttiah
• Guy Lampe
• Helen Smith
• Brad McCall
• Heidi Carrol
• Matthew Cooper
• Jason Steen
Special acknowledgements: to the family of the patient for consenting to us communicating this research
@loolibear @beatsonlab