the rise of whole genome sequencing as a subtyping … · 2019-04-29 · steak expert meeting:...

STEAK EXPERT MEETING: ANGERS, FRANCE, JUNE 2015

THE RISE OF WHOLE GENOME SEQUENCING AS A SUBTYPING TOOL FOR MICROBIAL SOURCE TRACKING: FROM FUNDAMENTALS TO APPLICATIONS

Kendra Nightingale, Ph.D.

International Center for Food Industry Excellence

Department of Animal & Food Sciences

Texas Tech University

SUBTYPING OF FOODBORNE PATHOGEN ISOLATES

• Ability to differentiate isolates of a foodborne pathogen beyond the species or subspecies level

• Considerations for interpretation of bacterial subtyping or “DNA fingerprint” typing data:

• Goal of human DNA subtyping = identify a single specific individual

• Human DNA subtypes are unique

• Bacterial subtypes are not

• Goal of bacterial DNA subtyping = determine if isolates share a recent common ancestor

BIOLOGY 101

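[Figure: flow of genetic information. DNA → mRNA → Protein/Enzymes → Toxins and Other Metabolites, with the steps DNA Replication, Transcription, and Translation; Molecular Methods and Phenotypic Methods are marked on the diagram.]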

MICROBIOLOGY 101

MICROBIAL GENETICS 101

• What types of DNA molecules are present in a bacterial cell?

• What’s the size of the genetic material for a typical bacterial pathogen?

• How many genes does a bacterial pathogen have?

• What’s the average size of a bacterial gene?


[Figure: clonal descent from an Ancestor Genotype through Generation 1, Generation 2, Generation 3, ... Generation N. Labels: Clones; Clones; Clones and Divergent Genotypes; Mutation?]

• Bacterial mutation rate?


• Mutations

• Point mutations

• Frameshift mutations

• Inversions

• Insertions/deletions (indels), including duplications

• Horizontal gene transfer

• Horizontal gene transfer of homologous gene sequences

• Horizontal gene transfer of non-homologous gene sequences

• Plasmid loss or acquisition

SUBTYPING OF FOODBORNE PATHOGEN ISOLATES

• Subtyping tools to detect disease outbreaks and identify food source

• Phenotypic subtyping

• Serotyping, phage typing, biotyping

• Molecular subtyping

• Band-based: PFGE, ribotyping, REP-PCR

• DNA sequence-based: allele, multilocus sequence typing (MLST), multiple-locus variable number tandem repeat analysis (MLVA), genome

• Molecular subtyping methods more discriminatory & reproducible

• Whole genome sequencing will facilitate disease outbreak detection and outbreak investigations to identify food source

SUBTYPING & OUTBREAK INVESTIGATIONS

Exposure to Pathogen in Food

Human infection

Isolation from sterile body fluid using non-selective blood plates

Food Storage

Isolation using selective enrichment and selective and differential agar media

Subtyping of Isolate

Subtyping of Isolates

ESTABLISHMENT OF SUBTYPING NETWORK

• PulseNet established in the U.S. in 1996: turning point for routine PFGE typing in bacterial foodborne disease surveillance from clinical cases

• Initially focused on Escherichia coli O157:H7 but expanded to other pathogens (i.e., Campylobacter jejuni, Clostridium botulinum, Cronobacter, Listeria monocytogenes, Salmonella, and Shigella)

• Expanded to food and environmental isolates

• Expanded internationally to “PulseNet International”

• Development and implementation of standardized protocols and rapid Web-based exchange of resultant pattern data

• Facilitated detection of temporally and spatially distributed foodborne disease outbreaks & outbreak investigation

• PulseNet adapted for whole genome sequences and analyses of resultant data

• Changing the shape of epidemic curves (i.e., reducing noise and expanding time frame of an outbreak)

Image provided by Peter Gerner-Smidt, Centers for Disease Control and Prevention

TEMPORALLY AND GEOGRAPHICALLY DISPERSED OUTBREAK DETECTION

• Outbreak: two or more cases of the same illness where investigation shows the illness came from the same food or drink

• In 2014, the Centers for Disease Control & Prevention monitored between 20 and 40 potential related clusters of illness weekly and investigated >220 multistate clusters

• Investigations led to identification of 68 confirmed or suspected vehicles and recall of a variety of foods

• Molecular subtyping and surveillance network to identify clusters of cases caused by the same strain
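
The cluster-detection idea above can be sketched in a few lines of Python. This is only an illustration, not PulseNet's actual algorithm: the case records, the `find_clusters` name, and the 60-day window are assumptions made for the example. It groups cases by subtype and reports any subtype with two or more cases whose onset dates fall within the window.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_clusters(cases, window_days=60, min_cases=2):
    """Group cases by subtype; report subtypes with >= min_cases inside one time window.

    cases: iterable of (case_id, subtype, onset_date). The window length is an
    assumed parameter for illustration, not a surveillance standard.
    """
    by_subtype = defaultdict(list)
    for case_id, subtype, onset in cases:
        by_subtype[subtype].append((onset, case_id))

    window = timedelta(days=window_days)
    clusters = {}
    for subtype, entries in by_subtype.items():
        entries.sort()
        best = []
        start = 0
        # Slide a window over the sorted onset dates and keep the largest group.
        for end in range(len(entries)):
            while entries[end][0] - entries[start][0] > window:
                start += 1
            if end - start + 1 > len(best):
                best = [cid for _, cid in entries[start:end + 1]]
        if len(best) >= min_cases:
            clusters[subtype] = best
    return clusters


# Invented case records; subtype labels are placeholders.
cases = [
    ("case-1", "PFGE-A", date(2006, 9, 1)),
    ("case-2", "PFGE-A", date(2006, 9, 10)),
    ("case-3", "PFGE-B", date(2006, 9, 5)),
    ("case-4", "PFGE-A", date(2007, 3, 1)),
]
clusters = find_clusters(cases)
print(clusters)
```

Because cases are matched on subtype rather than on location, the same logic flags clusters spread across many states, which is the point of a shared subtyping network.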

• Mid-September, 2006

• CDC alerted about clusters of E. coli O157:H7 illness in northwest

• CDC PulseNet confirmed cases caused by same PFGE type

• End of September, 2006

• 206 persons infected with outbreak strain in 26 states

• 52% hospitalized; 15% HUS; three deaths

• 95% reported eating fresh spinach within 10 d before illness onset

• Trace-back investigation

• E. coli O157:H7 matching outbreak PFGE type isolated from cattle on a ranch near spinach fields & from feral hogs

• November, 2006

• State Health Departments detected elevated incidence of illness due to Salmonella Tennessee

• Three closely related PFGE types rarely reported before October, 2006

• February, 2007

• Case-control study conducted

• Strong association with Peter Pan & Great Value peanut butter produced at the same plant

• Outbreak PFGE types isolated from opened & unopened peanut butter produced from August, 2006 to January, 2007

• > 600 people infected with outbreak PFGE types

PERSISTENCE OF L. MONOCYTOGENES IN FOOD PROCESSING PLANTS

• Listeria contamination patterns in six small and very small RTE meat processing plants over two years; 1,743 samples collected bi-monthly

• Year 1

• Non-food contact surfaces

• In-plant training for all employees

• General knowledge on Listeria ecology, transmission, and control

• Testing and molecular results from Year 1

• Year 2

• Non-food contact surfaces, food contact surfaces, finished product for some plants

DNA SEQUENCE-BASED STRAIN TYPING/IDENTIFICATION METHODS

• Allelic typing

• Multi-locus sequence typing (MLST)

• Multi-locus variable number tandem repeat analysis (MLVA)

• Single nucleotide polymorphism (SNP) typing

• Clustered regularly interspaced short palindromic repeats (CRISPR) typing

• Whole genome sequencing

• CDC, FDA and regulatory agencies in other countries routinely performing whole genome sequencing of foodborne pathogen isolates from human clinical cases of foodborne disease

HISTORY OF GENOME SEQUENCING

• Sequencing of DNA molecules began in the late 1970s

• Chemical degradation, followed by the Sanger chain termination method, known as the “gold standard” of DNA sequencing

• Shotgun sequencing, based on the Sanger method, cloning fragments into Escherichia coli for amplification

• First bacterial genome sequence (Haemophilus influenzae; 1.8 million bp) completed in 1995

• E. coli O157:H7/K-12 comparative genomes study; First Listeria monocytogenes genome published in 2001

• The first finished human genome sequence (3 billion bp) completed in 2003

• Project took 13 years to complete and cost $2.7 billion

THE RACE FOR THE $1,000 GENOME

• 2003

• The J. Craig Venter Science Foundation promised $500,000 to the first group to produce a technology capable of sequencing a human genome for $1,000

• The X Prize Foundation promised an additional $5-20 million to the winner

• 2004

• The National Institutes of Health (NIH) launched a $70 million program to support researchers working to sequence complete mammal-sized genomes initially for $100,000 and ultimately for $1,000

NEXT GENERATION SEQUENCING (NGS)

• Generate vast amounts of sequence data quickly and relatively inexpensively

• Unique chemistry and platforms (template preparation, sequencing/imaging & data analyses)

• Eliminate need for “shot-gun” cloning & amplification

• Template preparation

• Clonally amplified template originating from single molecule

• Single DNA molecule template

• Sequencing/imaging

• Cyclic reversible termination

• Single nucleotide addition

• Sequencing by ligation

• Real-time sequencing

(Metzker, 2010)

APPLICATIONS OF NEXT-GENERATION SEQUENCING

• ChIP-Sequencing

• Methylation patterns

• Whole genome sequencing

• Development of detection kits, better subtyping tools, detection of outbreaks, identification of food source, microbial ecology & evolution

• Expression tags

• Metagenomics & microbial diversity

• Microbiome, culture independent diagnostics

• Targeted resequencing

• Small RNA analysis

• Transcriptome sequencing

• Expression of genes under defined experimental conditions, niche adaptation

Solexa/Illumina

RAPID WHOLE GENOME SEQUENCING (WGS) BASED SUBTYPING

3 days

• DNA extraction

• Library prep

24 h

• Sequencing on benchtop sequencer (e.g., MiSeq, Ion Torrent)

12 h

• De novo assembly

• Rapid classification to subpopulation using pairwise distances based on average nucleotide identity values (BLAST)

• Inference of subpopulation structure based on SNP calling
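
The classification step can be illustrated with a small Python sketch. It assumes the average nucleotide identity (ANI) values between a query assembly and a set of reference genomes have already been computed (for example from BLAST alignments); the ANI numbers, the subpopulation labels, and the function names are hypothetical, and converting ANI to a distance as 100 minus ANI is one simple convention rather than the method used in the talk.

```python
def ani_to_distance(ani_percent):
    """Convert an ANI value (percent identity) into a simple distance."""
    return 100.0 - ani_percent


def classify_by_ani(ani_to_refs):
    """Return (closest reference label, distance) for a query.

    ani_to_refs: dict mapping a reference/subpopulation label to the ANI (%)
    between the query assembly and that reference.
    """
    distances = {ref: ani_to_distance(ani) for ref, ani in ani_to_refs.items()}
    closest = min(distances, key=distances.get)
    return closest, distances[closest]


# Hypothetical ANI values for one query genome against three references.
query_ani = {"subpop_I": 99.2, "subpop_II": 94.5, "subpop_III": 93.8}
best_ref, best_dist = classify_by_ani(query_ani)
print(best_ref, round(best_dist, 2))
```

ANI gives a quick, coarse placement; the SNP-based step that follows is what resolves structure within a subpopulation.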

APPLICATIONS OF WGS IN FOOD MICROBIOLOGY/FOOD SAFETY

• Facilitate development of improved detection assays

• Identification of genes/markers unique to pathogens or outbreak strains

• Allow for development of improved molecular subtyping methods

• Pathogens that are difficult to differentiate by PFGE (e.g., certain Salmonella serotypes)

• Provide new insight into biology of food-associated microorganisms (pathogens, spoilage organisms, beneficial microorganisms)

• Taxonomy (five new Listeria spp., new Salmonella serotype), transcriptional profile and niche adaptation

• Allow for large scale population based studies of food associated microorganisms, environmental microorganisms and intestinal microbes

• Metagenomics to probe whole microbial community (i.e., culturable and non-culturable) by sequencing total microbial DNA

WGS FOR OUTBREAK SPECIFIC DETECTION

• Background:

• Very large outbreak associated with non-O157:H7 strain

• Unusual clinical characteristics

• Serotype O104:H4

GENOME SEQUENCING

• Goals:

• Characterize strain to possibly gain insights into reasons behind unique epidemiological and clinical features of outbreak

• Use genome sequence data to develop PCR assays for detection

• Performed, using the Ion Torrent system, both in Europe (Life Technologies, Darmstadt Training Center in collaboration with Münster University) and in China (BGI-Shenzhen)

• Results:

• Strain lacks intimin and encodes for a number of genes that confer resistance to different antimicrobials

• Strain seems to be a “hybrid that has properties of EHEC and enteroaggregative E. coli”

Ion Torrent

• Instrument approx. $50,000

• Reagent costs per isolate about $100

• Initial sequencing of one isolate in < 1 week

FROM GENOME TO ASSAY

WGS FOR IMPROVED SUBTYPING

• Background

• Salmonella Montevideo is very clonal

• Large number of unrelated isolates have same PFGE type, e.g., isolates from “pistachio outbreak” and “salami outbreak”

• Similar issues with a number of other Salmonella serotypes (i.e., Newport and Enteritidis)

XbaI SpeI

Den Bakker et al. 2011. AEM.

WGS BIOLOGY OF FOOD-ASSOCIATED MICROBES

• Sanger method (Nelson et al., 2004)

• F6854 (food – 1988): 133 contigs (2.97 Mb combined length) – pseudomolecule; 8X coverage

• Roche/454 Pyrosequencer (Orsi et al., 2008)

• F6900 (human – 1988): 35 contigs (2.96 Mb combined length); 26X coverage

• J0161 (human – 2000): 49 contigs (2.97 Mb combined length); 2 contigs (82,678 bp combined length) from an extra-chromosomal plasmid (not used for analyses); 29X coverage

• J2818 (food – 2000): 38 contigs (2.97 Mb combined length); 24X coverage
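
As a reminder of what the "X coverage" figures mean, mean coverage is the total number of sequenced bases divided by the genome (or assembly) length. The Python sketch below works this through for a 2.96 Mb assembly; the read count and mean read length are invented so that the arithmetic lands on 26X, and are not taken from the study.

```python
def mean_coverage(n_reads, mean_read_length, genome_length):
    """Mean depth of coverage = total sequenced bases / genome length."""
    return n_reads * mean_read_length / genome_length


# 2.96 Mb is the combined contig length quoted for F6900.
genome_length = 2_960_000
# Hypothetical run: 307,840 reads of 250 bp on average = 76.96 Mb of sequence.
cov = mean_coverage(n_reads=307_840, mean_read_length=250, genome_length=genome_length)
print(f"{cov:.1f}X")
```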

RESULTS

• Full refined alignment: 2,922,773 bp (98.4%) of F6854 chromosome sequence (2,971,285 bp)

• 3 sub-alignments

• Backbone alignment

• Prophage 1 alignment (inserted into comK)

• Prophage 2 alignment (inserted into tRNA-Thr-4)

• SNPs (backbone and prophage 2):

• 44 SNPs

• 42 singletons (observed in a single isolate)

• 2 differentiate 1988 isolates from 2000 isolates

• Possible problems:

• Assembly

• Alignment

• Sequencing errors
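
The distinction between singleton SNPs and SNPs shared by several isolates can be sketched as follows. This is a minimal Python illustration on an invented toy alignment; the isolate labels, the sequences, and the `classify_snps` helper are assumptions for the example and do not reproduce the study's pipeline. Columns containing gaps or ambiguous bases are skipped, which is one simple way to limit the alignment and sequencing-error problems noted above.

```python
from collections import Counter


def classify_snps(alignment):
    """Split variable alignment columns into singletons and other SNPs.

    alignment: dict mapping isolate name -> aligned sequence (equal lengths).
    A singleton is a two-allele site whose minor allele occurs in one isolate.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    singletons, other = [], []
    for pos in range(length):
        column = [alignment[name][pos] for name in names]
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and ambiguous calls
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site
        if len(counts) == 2 and min(counts.values()) == 1:
            singletons.append(pos)
        else:
            other.append(pos)
    return singletons, other


# Toy alignment (invented sequences, 0-based positions).
toy = {
    "isolate_1988_a": "ACGTAC",
    "isolate_1988_b": "ACGTAC",
    "isolate_2000_a": "ATGTGC",
    "isolate_2000_b": "ACGTGC",
}
singletons, other = classify_snps(toy)
print(singletons, other)
```

In the toy data, position 1 varies in a single isolate, while position 4 separates the two 1988 isolates from the two 2000 isolates.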

DISTRIBUTION OF SNPS

• Re-sequencing (Sanger method) confirmed 12 SNPs

• 8 SNPs specific to J2818 (2000 – food): 3 in intergenic regions, 1 synonymous, 4 nonsynonymous (phosphoenolpyruvate synthase, putative; similar to ethanolamine utilization protein EutE; RDD family (transport?); AddB)

• 2 SNPs unique to J0161 (2000 – human): 2 in intergenic regions; 1 not found in the J0161 isolate in the FSL collection

• 1 SNP unique to F6900 (1988 – human): intergenic region

• 1 SNP differentiated 1988/2000 isolates: nonsynonymous, putative phage tail protein
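
For readers less familiar with the synonymous/nonsynonymous distinction used above, here is a minimal Python sketch. It builds the standard genetic code table and checks whether a single-base change within a codon alters the encoded amino acid. The function name and the example codons are illustrative assumptions; a real annotation pipeline would also need the gene's strand and reading frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base substitution at `position` (0-2) of `codon`.

    Changes to or from a stop codon are reported as nonsynonymous here.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if mutated == codon:
        raise ValueError("new_base must differ from the original base")
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


print(classify_coding_snp("GCT", 2, "C"))  # GCT (Ala) -> GCC (Ala)
print(classify_coding_snp("GCT", 0, "A"))  # GCT (Ala) -> ACT (Thr)
```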

RECOMBINATION

• Several recombination events involving a lineage I isolate, FSL J1-194, were identified by GENECONV

• All events fall within one of the prophages, inserted into comK

[Figure: alignment of the prophage inserted in comK across F6854, F6900, J2818, J0161 and FSL J1-194. Functional modules labeled: lysogeny control; cell lysis; tail and base structures; head structural components & assembly; DNA packaging; DNA replication, recombination, modification and gene expression modulation; lysogeny control. Regions marked R1, R2, R3, ER1, ER2.]

12 YEARS OF EVOLUTION

[Figure: inferred relationships among F6900 (human), F6854 (food), J2818 (food) and J0161 (human) via Ancestor A and Ancestor B. Events labeled: comK prophage recombination; tRNA prophage mutation or recombination. Branch labels: 1, 1, 8.]

WHOLE GENOME SEQUENCING OF PAIRED PERSISTENCE STRAINS FROM THE SAME SITE

• 6 paired isolates corresponding to 3 different strains (ribotype or sigB allelic types) isolated from three persistently colonized sites

• W1-215 19-Sep-07, B3-276 14-Nov-13, L. monocytogenes

• W1-527 19-Sep-08, B1-832 16-Sep-11, L. monocytogenes

• W1-179 8-Aug-07, B2-365 2-Jan-11, L. innocua

• 253 SNPs between W1-215 and B3-276, mostly in prophage recombination region

• 17 SNPs between W1-527 and B1-832

• 19 SNPs between W1-179 and B2-365
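
Because most of the differences in the first pair sit in a prophage recombination region, it can be informative to count SNPs with and without that region. The Python sketch below shows one way to do this on two aligned sequences; the sequences, the masked interval, and the `count_snps` helper are invented for illustration and do not reflect the actual coordinates or analysis.

```python
def count_snps(seq_a, seq_b, masked=()):
    """Count base differences between two aligned sequences.

    masked: iterable of half-open (start, end) intervals to ignore, e.g. a
    prophage region. Positions with gaps or ambiguous bases are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    masked_positions = set()
    for start, end in masked:
        masked_positions.update(range(start, end))

    total = 0
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if pos in masked_positions:
            continue
        if a in "ACGT" and b in "ACGT" and a != b:
            total += 1
    return total


# Toy pair of aligned sequences; positions 8-11 stand in for a prophage region.
early = "ACGTACGTACGT"
late = "ACGAACGTTTTT"
all_snps = count_snps(early, late)
backbone_snps = count_snps(early, late, masked=[(8, 12)])
print(all_snps, backbone_snps)
```

A large gap between the two counts suggests that recombination, rather than accumulated point mutation, explains most of the divergence within a pair.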

WGS AND POPULATION STUDIES

• Culture independent diagnostics

• Sequencing of all DNA found after enrichment is feasible and is being done by FDA

• Reduces bias that is inherent in traditional detection

• Creates data on all DNA found; huge potential for creating “incriminating” data in the absence of public health hazards

• WGS based characterization of microbial diversity found in a given sample

• Massively parallel 16S rDNA sequencing

• Potential of source tracking
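
One common way to summarize the diversity found by 16S rDNA sequencing is the Shannon index, computed from the number of reads assigned to each taxon. The talk does not name a specific diversity measure, so the choice of index here is an assumption, and the read counts below are invented.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    counts: dict mapping taxon label -> number of reads assigned to it.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one read is required")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Invented read counts for two samples.
even_sample = {"taxon_A": 50, "taxon_B": 50}
skewed_sample = {"taxon_A": 99, "taxon_B": 1}
print(round(shannon_index(even_sample), 3), round(shannon_index(skewed_sample), 3))
```

A sample dominated by one taxon scores close to zero, while an evenly mixed community scores higher.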

OVERALL SUMMARY AND CONCLUSIONS

• Next generation genome sequencing is making “real world” contributions to food safety

• Improved subtyping over PFGE

• Identification of better target genes for detection

• Translation of transcriptomics, metabolomics etc. findings to improved prevention and treatment is in the early stages

• Still a bottleneck with assembly, annotation and analyses of WGS data

• Bioinformatics pipelines are rapidly catching up with hardware, making application by public health labs feasible

• WGS will also impact pathogen detection approaches
