the rise of whole genome sequencing as a subtyping … · 2019-04-29 · steak expert meeting:...
TRANSCRIPT
STEAK EXPERT MEETING: ANGERS, FRANCE, JUNE 2015
THE RISE OF WHOLE GENOME SEQUENCING AS A SUBTYPING TOOL FOR MICROBIAL SOURCE
TRACKING: FROM FUNDAMENTALS TO APPLICATIONS
Kendra Nightingale, Ph.D.
International Center for Food Industry Excellence
Department of Animal & Food Sciences
Texas Tech University
SUBTYPING OF FOODBORNE PATHOGEN ISOLATES
• Ability to differentiate isolates of a foodborne pathogen beyond the species or subspecies level
• Considerations for interpretation of bacterial subtyping or “DNA fingerprint” typing data:
• Goal of human DNA subtyping = identify a single specific individual
• Human DNA subtypes are unique
• Bacterial subtypes are not
• Goal of bacterial DNA subtyping = determine if isolates share a recent common ancestor
BIOLOGY 101
[Diagram: DNA (DNA replication) → Transcription → mRNA → Translation → Protein/Enzymes → Toxins and Other Metabolites; additional labels: Molecular Methods, Phenotypic Methods]
MICROBIOLOGY 101
MICROBIAL GENETICS 101
• What types of DNA molecules are present in a bacterial cell?
• What’s the size of the genetic material for a typical bacterial pathogen?
• How many genes does a bacterial pathogen have?
• What’s the average size of a bacterial gene?
MICROBIAL GENETICS 101
[Diagram: an ancestor genotype in Generation 1 gives rise to clones in Generations 2 and 3; by Generation N the population contains clones and divergent genotypes arising by mutation. Question posed: what is the bacterial mutation rate?]
MICROBIAL GENETICS 101
• Mutations
• Point mutations
• Frameshift mutations
• Inversions
• Insertions/deletions (indels), including duplications
• Horizontal gene transfer
• Horizontal gene transfer of homologous gene sequences
• Horizontal gene transfer of non-homologous gene sequences
• Plasmid loss or acquisition
SUBTYPING OF FOODBORNE PATHOGEN ISOLATES
• Subtyping tools to detect disease outbreaks and identify food source
• Phenotypic subtyping
• Serotyping, phage typing, biotyping
• Molecular subtyping
• Band-based
• PFGE, ribotyping, REP-PCR
• DNA sequence-based
• Allele, multilocus sequence typing (MLST), multiple-locus variable number tandem repeat analysis (MLVA), genome
• Molecular subtyping methods more discriminatory & reproducible
• Whole genome sequencing will facilitate disease outbreak detection and outbreak investigations to identify food source
SUBTYPING & OUTBREAK INVESTIGATIONS
[Diagram: two routes to subtyping]
• Exposure to pathogen in food → human infection → isolation from sterile body fluid using non-selective blood plates → subtyping of isolate
• Food storage → isolation using selective enrichment and selective and differential agar media → subtyping of isolates
ESTABLISHMENT OF SUBTYPING NETWORK
• PulseNet established in the U.S. in 1996: turning point for routine PFGE typing of bacterial foodborne disease surveillance from clinical cases
• Initially focused on Escherichia coli O157:H7 but expanded to other pathogens (i.e., Campylobacter jejuni, Clostridium botulinum, Cronobacter, Listeria monocytogenes, Salmonella, and Shigella)
• Expanded to food and environmental isolates
• Expanded internationally to “PulseNet International”
• Development and implementation of standardized protocols and rapid Web-based exchange of resultant pattern data
• Facilitated detection of temporally and spatially distributed foodborne disease outbreaks & outbreak investigation
• PulseNet adapted for whole genome sequences and analyses of resultant data
• Changing the shape of epidemic curves (i.e., reducing noise and expanding time frame of an outbreak)
Image provided by Peter Gerner-Smidt, Centers for Disease Control and Prevention
TEMPORALLY AND GEOGRAPHICALLY DISPERSED OUTBREAK DETECTION
• Outbreak: two or more cases of the same illness where investigation shows the illness came from the same food or drink
• In 2014, the Centers for Disease Control & Prevention monitored between 20 and 40 potentially related clusters of illness weekly and investigated >220 multistate clusters
• Investigations led to identification of 68 confirmed or suspected vehicles and recall of a variety of foods
• Molecular subtyping and surveillance network to identify clusters of cases caused by the same strain
• Mid-September, 2006
• CDC alerted about clusters of E. coli O157:H7 illness in northwest
• CDC PulseNet confirmed cases caused by same PFGE type
• End of September, 2006
• 206 persons infected with outbreak strain in 26 states
• 52% hospitalized; 15% HUS; three deaths
• 95% reported eating fresh spinach within 10 d before illness onset
• Trace-back investigation
• E. coli O157:H7 matching outbreak PFGE type isolated from cattle on a ranch near the spinach fields & from feral hogs
• November, 2006
• State Health Departments detected elevated incidence of illness due to Salmonella Tennessee
• Three closely related PFGE types rarely reported before October, 2006
• February, 2007
• Case-control study conducted
• Strong association with Peter Pan & Great Value peanut butter produced at the same plant
• Outbreak PFGE types isolated from opened & unopened peanut butter produced from August, 2006 to January, 2007
• > 600 people infected with outbreak PFGE types
PERSISTENCE OF L. MONOCYTOGENES IN FOOD PROCESSING PLANTS
• Listeria contamination patterns in six small and very small RTE meat processing plants over two years; 1,743 samples collected bi-monthly
• Year 1
• Non-food contact surfaces
• In-plant training for all employees
• General knowledge on Listeria ecology, transmission, and control
• Testing and molecular results from Year 1
• Year 2
• Non-food contact surfaces, food contact surfaces, finished product for some plants
DNA SEQUENCE-BASED STRAIN TYPING/IDENTIFICATION METHODS
• Allelic typing
• Multi-locus sequence typing (MLST)
• Multi-locus variable number tandem repeat analysis (MLVA)
• Single nucleotide polymorphism (SNP) typing
• Clustered regularly interspaced short palindromic repeats (CRISPR) typing
• Whole genome sequencing
• CDC, FDA and regulatory agencies in other countries routinely performing whole genome sequencing of foodborne pathogen isolates from human clinical cases of foodborne disease
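To make the idea behind MLST concrete, here is a minimal sketch of how an isolate's sequences at a few loci are turned into an allelic profile and then a sequence type. The locus names, allele sequences and sequence type labels below are invented placeholders, not a real scheme; actual MLST schemes use curated public allele databases and longer gene fragments.

```python
from typing import Dict, Optional, Tuple

# Hypothetical loci and allele tables (illustrative only, not a real MLST scheme).
LOCI = ("geneA", "geneB", "geneC")

ALLELES: Dict[str, Dict[str, int]] = {
    "geneA": {"ATGGCA": 1, "ATGGCG": 2},
    "geneB": {"TTGACC": 1, "TTGACT": 2},
    "geneC": {"GGCATA": 1},
}

# Hypothetical mapping from allelic profile to sequence type label.
SEQUENCE_TYPES: Dict[Tuple[int, ...], str] = {
    (1, 1, 1): "ST-1",
    (2, 1, 1): "ST-2",
}


def allelic_profile(isolate: Dict[str, str]) -> Tuple[Optional[int], ...]:
    """Return the allele number at each locus (None if the sequence is unknown)."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in LOCI)


def sequence_type(isolate: Dict[str, str]) -> str:
    """Look up the sequence type; profiles not in the table are reported as 'novel'."""
    return SEQUENCE_TYPES.get(allelic_profile(isolate), "novel")


isolate_1 = {"geneA": "ATGGCA", "geneB": "TTGACC", "geneC": "GGCATA"}
isolate_2 = {"geneA": "ATGGCA", "geneB": "TTGACT", "geneC": "GGCATA"}
print(allelic_profile(isolate_1), sequence_type(isolate_1))
print(allelic_profile(isolate_2), sequence_type(isolate_2))
```

Two isolates with the same sequence type are indistinguishable by this method, which is why schemes with more loci, or the whole genome, give finer discrimination.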
HISTORY OF GENOME SEQUENCING
• Sequencing of DNA molecules began in the late 1970s
• Chemical degradation, followed by the Sanger chain termination method, known as the “gold standard” of DNA sequencing
• Shotgun sequencing, based on the Sanger method, cloning fragments into Escherichia coli for amplification
• First bacterial genome sequence (Haemophilus influenzae; 1.8 million bp) completed in 1995
• E. coli O157:H7/K-12 comparative genomes study; first Listeria monocytogenes genome published in 2001
• The first finished human genome sequence (3 billion bp) completed in 2003
• Project took 13 years to complete and cost $2.7 billion
THE RACE FOR THE $1,000 GENOME
• 2003
• The J. Craig Venter Science Foundation promised $500,000 to the first group to produce a technology capable of sequencing a human genome for $1,000
• The X Prize Foundation promised an additional $5-20 million to the winner
• 2004
• The National Institutes of Health (NIH) launched $70 million program to support researchers working to sequence complete mammal-sized genomes initially for $100,000 and ultimately for $1,000
NEXT GENERATION SEQUENCING (NGS)
• Generate vast amounts of sequence data quickly and relatively inexpensively
• Unique chemistry and platforms (template preparation, sequencing/imaging & data analyses)
• Eliminate need for “shot-gun” cloning & amplification
• Template preparation
• Clonally amplified template originating from single molecule
• Single DNA molecule template
• Sequencing/imaging
• Cyclic reversible termination
• Single nucleotide addition
• Sequencing by ligation
• Real-time sequencing
Metzker, 2010
APPLICATIONS OF NEXT-GENERATION SEQUENCING
• ChIP-Sequencing
• Methylation patterns
• Whole genome sequencing
• Development of detection kits, better subtyping tools, detection of outbreaks, identification of food source, microbial ecology & evolution
• Expression tags
• Metagenomics & microbial diversity
• Microbiome, culture independent diagnostics
• Targeted resequencing
• Small RNA analysis
• Transcriptome sequencing
• Expression of genes under defined experimental conditions, niche adaptation
Solexa/Illumina
RAPID WHOLE GENOME SEQUENCING (WGS) BASED SUBTYPING
3 days
• DNA extraction
• Library prep
24 h
• Sequencing on benchtop sequencer (e.g., MiSeq, Ion Torrent)
12 h
• De novo assembly
• Rapid classification to subpopulation using pairwise distances based on average nucleotide identity values (BLAST)
• Inference of subpopulation structure based on SNP calling
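As a rough illustration of the final analysis step, the sketch below computes pairwise SNP distances between isolates from sequences that are assumed to be already aligned to the same coordinates. The isolate names and sequences are toy data, and the function names are invented for this example; a real pipeline would work from assembled genomes or mapped reads and apply quality filters.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a non-ACGT character (N, gap) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x in VALID_BASES and y in VALID_BASES
    )


def pairwise_distances(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every pair of isolates."""
    return {
        (p, q): snp_distance(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }


# Toy alignment (invented data).
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGAACGNAT",
}

for pair, distance in pairwise_distances(toy_alignment).items():
    print(pair, distance)
```

Isolates separated by few SNPs are candidates for sharing a recent common ancestor; where to draw that line depends on the organism and the time span, and is not something this sketch decides.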
APPLICATIONS OF WGS IN FOOD MICROBIOLOGY/FOOD SAFETY
• Facilitate development of improved detection assays
• Identification of genes/markers unique to pathogens or outbreak strains
• Allow for development of improved molecular subtyping methods
• Pathogens that are difficult to differentiate by PFGE (e.g., certain Salmonella serotypes)
• Provide new insight into biology of food-associated microorganisms (pathogens, spoilage organisms, beneficial microorganisms)
• Taxonomy (five new Listeria spp., new Salmonella serotype), transcriptional profile and niche adaptation
• Allow for large scale population based studies of food associated microorganisms, environmental microorganisms and intestinal microbes
• Metagenomics to probe whole microbial community (i.e., culturable and non-culturable) by sequencing total microbial DNA
WGS FOR OUTBREAK SPECIFIC DETECTION
• Background:
• Very large outbreak associated with non-O157:H7 strain
• Unusual clinical characteristics
• Serotype O104:H4
GENOME SEQUENCING
• Goals:
• Characterize strain to possibly gain insights into reasons behind unique epidemiological and clinical features of outbreak
• Use genome sequence data to develop PCR assays for detection
• Performed, using the Ion Torrent system, both in Europe (Life Technologies, Darmstadt Training Center in collaboration with Münster University) and in China (BGI-Shenzhen)
• Results:
• Strain lacks intimin and encodes for a number of genes that confer resistance to different antimicrobials
• Strain seems to be a “hybrid that has properties of EHEC and enteroaggregative E. coli”
Ion Torrent
• Instrument approx. $50,000
• Reagents costs per isolate about $100
• Initial sequencing of one isolate in < 1 week
FROM GENOME TO ASSAY
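One simple way to picture the step from genome to assay is to look for short sequences (k-mers) that occur in the outbreak strain but in none of a set of background genomes; such regions are candidates for strain-specific PCR targets. The sketch below is only an illustration under that assumption. The sequences are toy data, the function names are invented, and it ignores reverse complements, primer design constraints and validation against large genome collections.

```python
from typing import Iterable, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all k-mers in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def strain_specific_kmers(target: str, background: Iterable[str], k: int) -> Set[str]:
    """Return k-mers present in the target genome and absent from every background genome."""
    unique = kmers(target, k)
    for genome in background:
        unique -= kmers(genome, k)
    return unique


# Toy sequences (invented data).
outbreak_strain = "ACGTTGCA"
background_genomes = ["ACGTAGCA"]

print(sorted(strain_specific_kmers(outbreak_strain, background_genomes, k=4)))
```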
WGS FOR IMPROVED SUBTYPING
• Background
• Salmonella Montevideo is very clonal
• Large number of unrelated isolates have same PFGE type, e.g., isolates from “pistachio outbreak” and “salami outbreak”
• Similar issues with a number of other Salmonella serotypes (i.e., Newport and Enteritidis)
[Figure: PFGE patterns generated with XbaI and SpeI]
Den Bakker et al. 2011. AEM.
WGS BIOLOGY OF FOOD-ASSOCIATED MICROBES
• Sanger method (Nelson et al., 2004)
• F6854 (food – 1988)
• 133 contigs (2.97 Mb combined length) – Pseudomolecule
• 8X coverage
• Roche/454 Pyrosequencer (Orsi et al., 2008)
• F6900 (human – 1988)
• 35 contigs (2.96 Mb combined length)
• 26X coverage
• J0161 (human – 2000)
• 49 contigs (2.97 Mb combined length)
• 2 contigs (82,678 bp combined length) extra-chromosomal plasmid (not used for analyses purposes)
• 29X coverage
• J2818 (food – 2000)
• 38 contigs (2.97 Mb combined length)
• 24X coverage
RESULTS
• Full refined alignment: 2,922,773 bp (98.4 %) of F6854 chromosome sequence (2,971,285 bp)
• 3 sub-alignments
• Backbone alignment
• Prophage 1 alignment (inserted into comK)
• Prophage 2 alignment (inserted into tRNA-Thr-4)
• SNPs (backbone and prophage 2):
• 44 SNPs
• 42 singletons (observed in a single isolate)
• 2 differentiate 1988 isolates from 2000 isolates
• Possible problems:
• Assembly
• Alignment
• Sequencing errors
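The distinction between singleton SNPs and SNPs shared by more than one isolate can be sketched as a scan over the columns of a multiple alignment. The isolate labels and sequences below are toy data invented for the example, not the real genomes; with four isolates, a two-allele site is either a singleton (3:1 split) or shared (2:2 split). Sites with more than two alleles are simply counted as shared here, which is a simplification.

```python
from collections import Counter
from typing import Dict, List


def classify_variable_sites(alignment: Dict[str, str]) -> Dict[str, List[int]]:
    """Split variable alignment columns into singleton and shared sites.

    A site is a singleton if it has two alleles and the rarer one occurs in
    exactly one isolate. All other variable sites are reported as shared.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")

    sites: Dict[str, List[int]] = {"singleton": [], "shared": []}
    for position in range(length):
        counts = Counter(alignment[name][position] for name in names)
        if len(counts) < 2:
            continue  # invariant column, not a SNP
        if len(counts) == 2 and min(counts.values()) == 1:
            sites["singleton"].append(position)
        else:
            sites["shared"].append(position)
    return sites


# Toy alignment (invented data): two 1988 isolates and two 2000 isolates.
toy = {
    "iso_1988_food": "AAGA",
    "iso_1988_human": "AAGC",
    "iso_2000_human": "ATCA",
    "iso_2000_food": "AACA",
}

print(classify_variable_sites(toy))
```

In this toy data the shared site at position 2 separates the 1988 isolates from the 2000 isolates, while positions 1 and 3 are each private to one isolate. Singletons are also where assembly, alignment and sequencing errors tend to show up, which is why re-sequencing is a sensible check.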
DISTRIBUTION OF SNPS
• Re-sequencing (Sanger method) confirmed 12 SNPs
• 8 SNPs specific to J2818 (2000 – food)
• 3 in intergenic regions
• 1 synonymous
• 4 nonsynonymous
• phosphoenolpyruvate synthase, putative
• similar to ethanolamine utilization protein EutE
• RDD family (transport?)
• AddB
• 2 SNPs unique to J0161 (2000 – human)
• 2 in intergenic regions
• 1 not found in the J0161 isolate in the FSL collection
• 1 SNP unique to F6900 (1988 – human)
• Intergenic region
• 1 SNP differentiated 1988/2000 isolates
• 1 nonsynonymous
• Putative phage tail protein
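For SNPs that fall inside a gene, the synonymous/nonsynonymous label comes from comparing the amino acid encoded before and after the change. A minimal sketch, assuming the coding sequence is given in frame on the forward strand and using the standard codon assignments; the example sequence and the function name are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base change in an in-frame coding sequence.

    position is a 0-based index into cds. A change to or from a stop codon
    counts as nonsynonymous in this sketch.
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if cds[position] == alt_base:
        raise ValueError("alternative base equals the reference base")
    start = position - position % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    offset = position - start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


toy_cds = "ATGGCTAAA"  # Met-Ala-Lys (invented example)
print(classify_snp(toy_cds, 5, "C"))  # GCT -> GCC
print(classify_snp(toy_cds, 3, "A"))  # GCT -> ACT
```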
RECOMBINATION
• Several recombination events involving a lineage I isolate, FSL J1-194, were identified by GENECONV
• All events fall within one of the prophages, inserted into comK
[Figure: prophage inserted in comK, compared across F6854, F6900, J2818, J0161 and FSL J1-194. Functional modules labeled: lysogeny control; cell lysis; tail and base structures; head structural components & assembly; DNA packaging; DNA replication, recombination, modification and gene expression modulation. Regions marked R1, R2, R3, ER1, ER2.]
12 YEARS OF EVOLUTION
[Diagram: relationships among F6900 (human), F6854 (food), J2818 (food) and J0161 (human) through Ancestor A and Ancestor B; events labeled “comK prophage recombination” and “tRNA prophage mutation or recombination”; branch labels 1, 1 and 8]
WHOLE GENOME SEQUENCING OF PAIRED PERSISTENCE STRAINS FROM THE SAME SITE
• 6 paired isolates corresponding to 3 different strains (ribotype or sigB allelic types) isolated from three persistently colonized sites
• W1-215 19-Sep-07, B3-276 14-Nov-13, L. monocytogenes
• W1-527 19-Sep-08, B1-832 16-Sep-11, L. monocytogenes
• W1-179 8-Aug-07, B2-365 2-Jan-11, L. innocua
• 253 SNPs between W1-215 and B3-276, mostly in prophage recombination region
• 17 SNPs between W1-527 and B1-832
• 19 SNPs between W1-179 and B2-365
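To put the SNP counts for the paired isolates on a common scale, the sketch below divides each count by the time between the two isolation dates listed above. This is plain arithmetic on the slide's numbers, not a mutation rate estimate: the 253 SNPs in the first pair lie mostly in a prophage recombination region, so that figure reflects recombination rather than point mutation. The two-digit years are read as 20xx, and a year is taken as 365.25 days.

```python
from datetime import date

# (earlier isolate, isolation date, later isolate, isolation date, SNPs between them)
PAIRS = [
    ("W1-215", date(2007, 9, 19), "B3-276", date(2013, 11, 14), 253),
    ("W1-527", date(2008, 9, 19), "B1-832", date(2011, 9, 16), 17),
    ("W1-179", date(2007, 8, 8), "B2-365", date(2011, 1, 2), 19),
]


def snps_per_year(first: date, second: date, snps: int) -> float:
    """Return SNP differences per year of elapsed time between two isolation dates."""
    years = (second - first).days / 365.25
    return snps / years


for early, first, late, second, snps in PAIRS:
    rate = snps_per_year(first, second, snps)
    print(f"{early} vs {late}: {snps} SNPs, about {rate:.1f} per year")
```

The second and third pairs come out at roughly 5 to 6 SNP differences per year, while the first pair is several times higher, consistent with the note that its differences are concentrated in the prophage region.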
WGS AND POPULATION STUDIES
• Culture independent diagnostics
• Sequencing of all DNA found after enrichment is feasible and is being done by FDA
• Reduces bias that is inherent in traditional detection
• Creates data on all DNA found; huge potential for creating “incriminating” data in the absence of public health hazards
• WGS based characterization of microbial diversity found in a given sample
• Massively parallel 16S rDNA sequencing
• Potential of source tracking
OVERALL SUMMARY AND CONCLUSIONS
• Next generation genome sequencing is making “real world” contributions to food safety
• Improved subtyping over PFGE
• Identification of better target genes for detection
• Translation of transcriptomics, metabolomics etc. findings to improved prevention and treatment is still in the early stages
• Still a bottleneck with assembly, annotation and analyses of WGS data
• Bioinformatics pipelines are rapidly catching up with hardware, making application by public health labs feasible
• WGS will also impact pathogen detection approaches