Post on 27-Jul-2020

Application of long‐read sequencing in microbial genomics

Leah Roberts

Sakzewski Senior Scientist

Centre for Children’s Health Research

CCHR supervisor:

A/Prof David Whiley

PhD supervisors (SCMB):

A/Prof Scott Beatson

Prof Mark Schembri

What is microbial genomics

Archaea, Bacteria, Viruses, Microbial Eukaryotes

Environmental: soil/permafrost, ocean/corals, wildlife

Industry: water treatment, food industry, biotech

Clinical: diagnosis, outbreaks, microbiota

Whole genome sequencing – high resolution diagnostics for microbiology

• Ability to discriminate at the nucleotide level
  • Allows for the highest comparative resolution
• Whole genome characterisation
  • Plasmids
  • Antibiotic resistance genes (and their context)
  • Virulence genes
• Becoming more applicable for the clinic
  • Technology becoming faster, more accessible and more cost effective

https://doi.org/10.1111/1469-0691.12217 (2013)

Exponential growth of sequencing data 

Why is read length important?

• Generating complete genomes
• Traversing repeat regions in genomes

Scenario 1: short read (Illumina) sequencing

[Figure: genome regions A, B and C separated by Repeat 1 and Repeat 2; in the short‐read assembly the two repeats collapse into a single “Repeat 1 + 2” joined to A, B and C]

Scenario 2: long read (PacBio or Nanopore) sequencing

[Figure: the same genome; in the long‐read assembly Repeat 1 and Repeat 2 are resolved separately between A, B and C]
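The two scenarios can be illustrated with a toy sketch. This is not an assembler: the sequences, the 12 bp repeat and the `spanning_reads` helper are all made up for illustration. The assumption is that a read can only place a repeat copy unambiguously if it covers the whole copy plus at least one unique flanking base on each side.

```python
def spanning_reads(genome: str, repeat: str, read_len: int, flank: int = 1):
    """Return (start, end) windows of length read_len that contain a full
    copy of `repeat` plus at least `flank` bases on each side of it."""
    # Locate every copy of the repeat in the genome.
    copies = []
    pos = genome.find(repeat)
    while pos != -1:
        copies.append(pos)
        pos = genome.find(repeat, pos + 1)

    windows = []
    for start in range(len(genome) - read_len + 1):
        end = start + read_len
        for c in copies:
            # The read must start before the repeat and end after it.
            if start <= c - flank and c + len(repeat) + flank <= end:
                windows.append((start, end))
                break
    return windows


# Toy genome (hypothetical): unique regions A, B, C separated by two
# identical copies of a 12 bp repeat.
A, B, C = "ACGTTGCA", "GGATCCTA", "TTAGGCAT"
REPEAT = "CCCGGGAAATTT"
genome = A + REPEAT + B + REPEAT + C

short_reads = spanning_reads(genome, REPEAT, read_len=10)  # shorter than the repeat
long_reads = spanning_reads(genome, REPEAT, read_len=20)   # longer than the repeat
print(len(short_reads), len(long_reads))
```

No 10 bp read can bridge the 12 bp repeat, so both copies look identical to the assembler (Scenario 1), while 20 bp reads anchor each copy in its unique flanks (Scenario 2).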

Why is read length important?

Scenario 1: short read sequencing
• Produce draft genomes

Scenario 2: long read sequencing
• Produce complete genomes (chromosome and plasmids)

https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size

Virulence and antibiotic resistance regions are often associated with repeat regions

Long reads allow you to completely understand the context and structure of the entire genome

Gene‐centric

Whole genome sequencing (short reads)

Whole genome sequencing (long reads): the puzzle solved!

Third Generation “Long‐read sequencing”

• Pacific Biosciences Single Molecule Real‐Time (SMRT) sequencing
• Oxford Nanopore MinION

Third Generation – PacBio SMRT Library preparation

https://doi.org/10.1186/s40793-017-0239-1

Library preparation: “SMRTbell template”

Third Generation – PacBio SMRT Zero Mode Waveguide (ZMW)

• SMRT cell
  • 1 million ZMWs
• Library loaded into SMRT cell using Magbeads
• Each ZMW has a single polymerase at the base

https://www.cd-genomics.com/pacbio-smrt-sequencing.html

http://dnatech.genomecenter.ucdavis.edu/pacbio-sequencing/

https://doi.org/10.1038/nrg2626

Third Generation – PacBio SMRT Sequencing

Principles of Sequencing: Third Generation – PacBio SMRT sequencing

• PacBio SMRT sequencing applications:
  • 5‐8 Gb throughput/SMRT cell
  • 85‐89% base accuracy*
    • With enough coverage, consensus accuracy exceeds Illumina (>99.99%)
    • Some problems with homopolymers
  • Expensive (~$2000 AUD per bacterial genome)
  • Multiplexing on the Sequel now available
  • Reliable and reproducible
  • Can detect epigenetic modifications: methylation

*https://doi.org/10.1016/j.ygeno.2017.12.011
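A rough sketch shows why coverage can lift an 85‐89% per‐read accuracy to a high consensus accuracy. It assumes a deliberately simplified model that is not how real consensus callers work: errors are independent between reads, and the consensus base is a plain majority vote in which every wrong read counts against the true base. The function name is hypothetical.

```python
from math import comb


def majority_consensus_accuracy(per_read_accuracy: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads
    are correct at a position (binomial tail), i.e. the majority vote wins."""
    if coverage % 2 == 0:
        raise ValueError("use an odd coverage so the vote cannot tie")
    p = per_read_accuracy
    needed = coverage // 2 + 1  # smallest number of correct reads that is a majority
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(needed, coverage + 1)
    )


# Per-read accuracy of 85% (lower end of the range quoted above).
for cov in (1, 3, 11, 31):
    print(cov, majority_consensus_accuracy(0.85, cov))
```

Under these assumptions the consensus improves quickly with depth because random errors rarely agree at the same position. Systematic errors, such as the homopolymer problems noted above, do not average out this way.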

Principles of Sequencing: Third Generation – Nanopore MinION

[Figure panels: 1D sequencing by ligation; Rapid sequencing]

https://store.nanoporetech.com/

Nanopore MinION – flow cell

Nanopore = nano‐scale hole

An ionic current passes through the nanopores, and changes in that current are measured as a molecule moves through a pore

The change in current can be used to identify that molecule

https://nanoporetech.com/how-it-works

http://biochemistri.es/post/119865709426/of-nanopores-and-isoforms

Principles of Sequencing: Third Generation – Nanopore MinION

• Nanopore MinION sequencing applications:
  • Read accuracy:
    • ~<80% accuracy (2015)
    • ~92% accuracy 1D chemistry (2018)
    • ~94‐96% accuracy flip‐flop algorithm (2019)
  • Consensus accuracy >99.9%
  • 80% of errors systematic (e.g. homopolymers [AAAAA, TTTTT])
  • Requires Illumina polishing for SNPs and indels
  • Increased accuracy with new R10 flow cell
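Because homopolymers account for much of the systematic error, it can help to flag them in an assembly before trusting base calls there. A minimal sketch, in which the `homopolymer_runs` helper and the 5 bp threshold are assumptions chosen to match the AAAAA/TTTTT example and are not taken from any tool:

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 5):
    """Return (start, base, length) for every run of one repeated base
    that is at least `min_len` long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)  # size of this run of identical bases
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGAAAAATTTTTGC"))
```

Positions reported this way could be prioritised when checking indels after polishing.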

https://doi.org/10.1016/j.bdq.2015.02.001

• Q50 = 99.999% accuracy = 1 error/100000 bases
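The Q value here is a Phred‐scaled quality score, defined as Q = −10·log10(P), where P is the probability of an error. A short sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability P for a Phred quality Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality for a given accuracy (accuracy = 1 - P)."""
    return -10 * math.log10(1 - accuracy)


p = phred_to_error_probability(50)
print(p, 1 / p)                  # error probability and bases per expected error
print(accuracy_to_phred(0.999))  # the >99.9% consensus accuracy on the Phred scale
```

On this scale every 10 points is a tenfold drop in error rate, so Q50 corresponds to one error per 100,000 bases, matching the figure above.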

R10 flow cell

Principles of Sequencing: Third Generation – Nanopore MinION

• Nanopore MinION sequencing applications:
  • Throughput:
    • 20‐30 Gb/flow cell (2018‐2019)
    • Current record = 50 Gb
  • 1 flow cell ~$1000 (AUD)
    • Multiplexing and repeat use options available
  • Capacity for real‐time analysis
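The throughput and flow cell price above translate into per‐sample depth and cost once samples are multiplexed. A back‐of‐envelope sketch follows; the 5 Mb genome size and the 12 barcoded samples are assumptions for illustration, yield is assumed to split evenly between barcodes, and library preparation costs are ignored.

```python
def coverage_per_sample(yield_gb: float, genome_mb: float, samples: int) -> float:
    """Mean depth per sample if the flow cell yield is shared equally."""
    total_bases = yield_gb * 1e9
    genome_bases = genome_mb * 1e6
    return total_bases / (genome_bases * samples)


def flow_cell_cost_per_sample(flow_cell_cost: float, samples: int) -> float:
    """Flow cell cost only, split evenly across multiplexed samples."""
    return flow_cell_cost / samples


# 20 Gb is the lower end of the quoted yield; genome size and sample count are assumed.
depth = coverage_per_sample(yield_gb=20, genome_mb=5, samples=12)
cost = flow_cell_cost_per_sample(flow_cell_cost=1000, samples=12)
print(f"~{depth:.0f}x depth per genome, ~${cost:.0f} AUD flow cell cost per sample")
```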

Application of long‐reads

Tackling mobile genetic elements:
1. Resolving plasmids and highly repetitive regions
2. Resolving large tandem repeats

Rapid clinical response:
1. Nanopore for real‐time analysis
2. Sequencing of clinical samples

Metagenomics:
1. Environment

[Timeline figure, 2015: May, June, July, August, November]

Pathology Results:
• Enterobacter cloacae
• Carbapenemase IMP‐4 (PCR)

Where did this infection come from?

How similar are these isolates?

What is the context of the IMP gene?
• Chromosome?
• Plasmid?
• Integron?

Outbreak confirmed using WGS

Antibiotic resistance genes (~30 contigs)

Plasmid

Resolving highly repetitive regions is difficult with short‐read data

~55 kb MDR region: 19/21 antibiotic resistance genes

PacBio sequencing reveals a large ~330 kb IncHI2 plasmid carrying blaIMP‐4

IncHI2 plasmid from Sydney is identical

“Traditional” diagnostics

• Standardised, established methods and infrastructure, reasonably fast turn‐around time
• Relies on phenotype
• Lacks high‐resolution discriminatory power
• Hard to culture or unculturable microbes

For septic shock patients, survival rate decreases by 7.6% for every hour of delay in appropriate therapy

Kumar et al., Crit Care Med. 2006 Jun;34(6):1589‐96.

Nanopore for real‐time analysis

• Great for species ID
• Sterile sites/hard to grow organisms

Only 17 reads from Nanopore sequencing

1 day Nanopore sequencing

Confirmed from blood culture and dog’s mouth

Detection of Chikungunya virus (CHIKV), Ebola virus (EBOV) and hepatitis C virus (HCV) from human blood samples

First detection in <10 minutes; <6 hours for whole process

Prosthetic joint infections are difficult to diagnose – culture remains the gold standard but only 65% of causative bacteria are detected

Showed concordance of results between MinION and MiSeq/standard culture results.

Variable detection of species abundance in polymicrobial samples between high and low GC genomes between MinION and MiSeq

[Figure: bases sequenced against sequencing time]

Sequencing reads for species appear in <20 min

Long reads were better at detecting low‐abundance species:
• Unrelated to GC%
• PCR step in Illumina biases towards more abundant species
• Long reads eliminate possible false‐positive matches and misclassifications

Reference‐free detection of species: 

Nanopore for real‐time analysis

• Great for species ID
• Sterile sites/hard to grow organisms
• Need to be aware of read error rate
• Strain‐specific typing may require more consideration
  • E.g. outbreak investigations

But! Long reads can help you identify large changes associated with specific lineages

Reference strains + Nanopore strains

With context

Context should be appropriate!

Environmental:

Low‐complexity sample without amplification 

High‐complexity sample with amplification

Problems encountered:
• High rates of sequence carry‐over in repeat‐use flow cells
• Index switching rate of 3.6‐3.8%
• High error rates led to artificial richness (caused by mapping to other closely related species)
• Homopolymer‐rich species were two orders of magnitude less abundant in MinION data than in Sequel data

@Puntseq:

Symptoms include “swimmer’s itch”, fever after swallowing water and wound infections

Looking at infections attributed to Cam water contact

Other considerations...

Sample requirements?

https://www.pacb.com/wp-content/uploads/2015/09/Guide-Pacific-Biosciences-Template-Preparation-and-Sequencing.pdf

Illumina:
• Flexible DNA requirements
• Low input kits available

PacBio SMRT sequencing:
• Requires ~20 µg of DNA
  • ~6 x 5 ml overnight broth cultures
  • ~6 x agar plates
  • Low input kits available
• Must be high molecular weight
  • Aim for ~20‐40 kb
• Must be high quality DNA

Nanopore MinION:
• Requires 1.5 µg DNA
  • Low input kits available
• Ideally high molecular weight

Other logistical problems to consider for the clinic

PacBio RSII

Illumina NextSeq

Nanopore MinION

The future is now!

Illumina MiniSeq

PacBio Sequel

PromethION

GridION

SmidgION

Flongle

Long reads from short reads?

• Morphoseq – specific mutations
• 10X Chromium – barcoded short reads

Acknowledgements

UQ SCMB
• A/Prof Scott Beatson
• Prof Mark Schembri
• Dr Brian Forde

Clinical collaborators
• Patrick Harris
• David Paterson
• Haakon Bergh

Children’s Health QLD
• A/Prof David Whiley
• Dr Adam Irwin
• Dr Julia Clark

Diphtheria project
• Andrew Henderson
• Geoffrey Playford
• David Looke
• Belinda Henderson
• Catherine Watson
• Gordon Laurie
• Hanna Sidjabat
• Graeme Nimmo
• Sharmini Muttiah
• Guy Lampe
• Helen Smith
• Brad McCall
• Heidi Carrol
• Matthew Cooper
• Jason Steen

Special acknowledgements: To the family of the patient for consenting to us communicating this research

@loolibear @beatsonlab
