robertson immemxi final march 2016
TRANSCRIPT
Utility of the Salmonella in Silico Typing Resource (SISTR) to outbreak investigations
James Robertson1, Catherine Yoshida1, Peter Kruczkiewicz2, Eduardo N. Taboada2 and John H. E. Nash3
1 National Microbiology Laboratory @Guelph, Public Health Agency of Canada
2 National Microbiology Laboratory @Lethbridge, Public Health Agency of Canada
3 National Microbiology Laboratory @Toronto, Public Health Agency of Canada
2
Salmonella is a leading public health concern
Salmonella is a leading food-borne pathogen both in Canada and around the world
Globally, there are an estimated 94 million Salmonella infections every year
Human costs:
• acute illness
• loss of life (155,000 deaths)
Societal costs:
• health care costs
• lost productivity
• legal costs
• impact to food industry
3
Potential Sources
4
Challenges in Salmonella typing and epidemiology
• A small number of highly prevalent/globally distributed serovars account for most outbreaks (e.g. Enteritidis, Typhimurium)
– Epidemiologically unrelated isolates within the same serovar are difficult to investigate
– Additional subtyping resolution within a serovar is needed (e.g. phage typing)
• Increasing use of genotypic methods (i.e. molecular typing)
– Driven by the need for methods with higher discriminatory power
– A number of different approaches have been applied to molecular typing of Salmonella
5
[Figure: two aligned sequences differing at a single position: GATCGATCGATCG / GATCAATCGATCG]
Discriminatory power: Serotyping (Low), MLST (Low-Mid), cgMLST (Mid-High), wgSNPs (High)
Serotyping
• Based on reaction of antibodies to surface antigens
• Broad usage and common nomenclature in use since the 1930s
MLST and cgMLST
• Multi-Locus Sequence Typing: developed by Maiden et al. (1998)
• Indexes genetic variation in 7 core (i.e. “housekeeping”) genes
• cgMLST extends this principle to 100s to 1000s of loci
• Provides a portable naming scheme which correlates with historical serotypes
wgSNPs
• Utilizes individual SNPs and gives very high resolution
• Results are not portable to other public health professionals
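The difference between SNP-based and allele-based comparison can be sketched in a few lines. This is only an illustration: the two sequences are the ones shown on the slide, the locus names are the seven Salmonella MLST housekeeping genes, and the allele numbers are made up for the example. The helper names `snp_positions` and `allele_distance` are hypothetical.

```python
def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def allele_distance(profile_a, profile_b):
    """Count loci present in both profiles whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


# SNP view: every differing base is reported individually.
print(snp_positions("GATCGATCGATCG", "GATCAATCGATCG"))

# Allele view: each locus is reduced to one allele number (values are invented).
isolate_1 = {"aroC": 5, "dnaN": 2, "hemD": 3, "hisD": 7, "purE": 6, "sucA": 6, "thrA": 11}
isolate_2 = {"aroC": 5, "dnaN": 2, "hemD": 3, "hisD": 7, "purE": 6, "sucA": 6, "thrA": 12}
print(allele_distance(isolate_1, isolate_2))
```

Because a locus counts as one difference no matter how many bases changed inside it, allele profiles are coarser than SNP calls but easier to name and exchange.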
7
• Initial dataset of 4330 genomes
• 94.6% concordance between predicted and reported serovar
• in silico serovar predictions based on O and H antigens
• cgMLST refinement of serovar assignment and analysis
• Uses minimally processed genome assemblies
• Very fast: ~30 seconds to process a genome
What does SISTR do?
In silico analysis of WGS data
• assembly statistics
• serovar prediction
• in silico typing (MLST, cgMLST)
• AMR prediction
Comparative genomic analyses
• cgMLST
• accessory gene content
• core SNPs
Epidemiologic analysis
• geospatial distribution
• temporal distribution
• source association
https://lfz.corefacility.ca/sistr-app/
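Serovar prediction from O and H antigens amounts to looking up an antigenic formula. The sketch below is a deliberately simplified illustration, not SISTR's implementation: the table holds only two serovars, keyed by the group-defining O factor and the two H phases, and `ANTIGEN_TABLE` and `predict_serovar` are hypothetical names.

```python
# Simplified excerpt: (O group factor, H phase 1, H phase 2) -> serovar.
# The full antigenic scheme lists far more serovars and more complex formulae.
ANTIGEN_TABLE = {
    ("9", "g,m", "-"): "Enteritidis",
    ("4", "i", "1,2"): "Typhimurium",
}


def predict_serovar(o_antigen, h1, h2):
    """Return the serovar for an antigen combination, or None if it is not in the table."""
    return ANTIGEN_TABLE.get((o_antigen, h1, h2))


print(predict_serovar("9", "g,m", "-"))
print(predict_serovar("4", "i", "1,2"))
print(predict_serovar("4", "z", "1,5"))  # not in this tiny table
```

Where several serovars share one antigenic formula, a lookup like this cannot separate them, which is where the cgMLST refinement listed above would come in.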
9
SISTR cgMLST
• Current cgMLST scheme in SISTR based on 330 core genes with high “assignability” (i.e. very low levels of “missing” data)
• Will include international Salmonella cgMLST scheme (i.e. once it is developed!)
• cgMLST information is used to:
– Assess quality of WGS data (complete, partial, missing loci)
– Supplement genoserotyping predictions
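Using locus status as a quality signal can be sketched as a simple tally. This is an assumption-laden illustration: the per-locus status strings, the function names and the `min_complete` threshold of 300 are hypothetical and are not taken from SISTR.

```python
from collections import Counter

N_LOCI = 330  # size of the cgMLST scheme described on this slide


def summarize_loci(statuses):
    """Tally how many loci were called complete, partial or missing."""
    counts = Counter(statuses.values())
    return {s: counts.get(s, 0) for s in ("complete", "partial", "missing")}


def passes_qc(statuses, min_complete=300):
    """Flag an assembly as usable when enough loci are complete (threshold is an assumption)."""
    return summarize_loci(statuses)["complete"] >= min_complete


# Hypothetical assembly: 325 complete, 3 partial and 2 missing loci.
example = {f"locus_{i}": "complete" for i in range(325)}
example.update({f"locus_{i}": "partial" for i in range(325, 328)})
example.update({f"locus_{i}": "missing" for i in range(328, N_LOCI)})

print(summarize_loci(example))
print(passes_qc(example))
```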
10
Testing the accuracy of SISTR
• ~45,000 Salmonella genomes were downloaded from the SRA
• Raw reads were assembled using FLASH and SPAdes
• Assemblies were loaded into SISTR, and predicted serovars were compared with reported serovars (where available)
• Assemblies were checked for contamination using Kraken
• Quality was assessed using QUAST
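The comparison of predicted and reported serovars reduces to a concordance percentage over the genomes that have a reported serovar. A minimal sketch, assuming records are (predicted, reported) pairs and that a missing reported serovar is stored as None; the function name and the toy records are hypothetical.

```python
def serovar_concordance(records):
    """Percent of genomes whose predicted serovar matches the reported one.

    Genomes without a reported serovar are skipped; returns None if nothing can be compared.
    """
    compared = [(p, r) for p, r in records if r]
    if not compared:
        return None
    agree = sum(1 for p, r in compared if p.strip().lower() == r.strip().lower())
    return 100.0 * agree / len(compared)


# Toy data, not results from the study.
toy_records = [
    ("Enteritidis", "Enteritidis"),
    ("Typhimurium", "typhimurium "),  # case and whitespace differences are ignored
    ("Derby", "Enteritidis"),         # discordant
    ("Enteritidis", None),            # no reported serovar, so not counted
]
print(serovar_concordance(toy_records))
```

Exact string matching after normalizing case and whitespace is a simplification; real metadata would likely need synonym and spelling cleanup before comparison.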
11
Recovery rates of 330 cgMLST genes from assembled SRA genomes
[Chart, N=45,079. Categories: Number of Genomes with Complete 330; Number of Genomes with >300 Genes; Number of Genomes with <300 Genes. Values: 41,781; 1,393; 1,905]
12
SISTR Accuracy
• 93.7% Overall concordance with serovar specified
[Chart, N=32,321. Concordant: 29,884; Discordant: 2,347]
13
• Two outbreaks of Salmonella Enteritidis were retrospectively sequenced
• Examined the feasibility of applying WGS to outbreak investigations
• Compared results of traditional molecular and microbial tests to WGS
14
15
16
17
18
SISTR (cgMLST) PARSNP (core SNP)
SNP Tree (Wuyts et al 2015)
• All three methods produce concordant trees.
• cgMLST has a tendency to overgroup
Outbreak Clustering Categories
[Figure: example trees for outbreaks A, B and C illustrating four categories: Correct, Incorrectly Split, Over-grouped, Incorrectly Split and grouped]
20
Concordance between cgMLST and SNP trees
Study   Correct   Over-grouped   Split   Combination   Serovar(s)
1       1         1              0       0             Enteritidis
2       2         3              0       0             Enteritidis
3       5         1              0       0             Enteritidis, Typhimurium, Derby
4       2         7              0       0             Enteritidis
5       2         0              0       0             Enteritidis
6       5         2              0       0             Enteritidis
Total   18        13             0       0
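The four clustering categories can be expressed as a small classifier over cluster assignments. The definitions below are an interpretation, not the study's stated procedure: an outbreak is "Split" when its isolates fall into more than one cluster, "Over-grouped" when one of its clusters also holds isolates from outside the outbreak, and "Combination" when both apply. All names and the toy data are hypothetical, and every outbreak isolate is assumed to have a cluster assignment.

```python
from collections import defaultdict


def classify_outbreaks(outbreak_of, cluster_of):
    """Label each outbreak as Correct, Over-grouped, Split or Combination.

    outbreak_of: isolate -> known outbreak label (outbreak isolates only)
    cluster_of:  isolate -> cluster id from the typing method (may include other isolates)
    """
    members_by_cluster = defaultdict(set)
    for isolate, cluster in cluster_of.items():
        members_by_cluster[cluster].add(isolate)

    isolates_by_outbreak = defaultdict(set)
    for isolate, outbreak in outbreak_of.items():
        isolates_by_outbreak[outbreak].add(isolate)

    labels = {}
    for outbreak, isolates in isolates_by_outbreak.items():
        clusters = {cluster_of[i] for i in isolates}
        split = len(clusters) > 1
        # Any cluster member that does not belong to this outbreak counts as over-grouping.
        grouped = any(
            outbreak_of.get(member) != outbreak
            for cluster in clusters
            for member in members_by_cluster[cluster]
        )
        if split and grouped:
            labels[outbreak] = "Combination"
        elif split:
            labels[outbreak] = "Split"
        elif grouped:
            labels[outbreak] = "Over-grouped"
        else:
            labels[outbreak] = "Correct"
    return labels


# Toy example: A is clean, B is split, C shares a cluster with a sporadic isolate s1.
toy_outbreaks = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
toy_clusters = {"a1": 1, "a2": 1, "b1": 2, "b2": 3, "c1": 4, "c2": 4, "s1": 4}
print(classify_outbreaks(toy_outbreaks, toy_clusters))
```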
21
Conclusions
• SISTR is a robust and accurate platform for Salmonella in silico typing, with 93.7% concordance between specified serovar and predicted serovar
• The prototype 330-gene cgMLST scheme is readily retrievable from HTS assemblies of varying quality levels.
• The current scheme provides coarse-grained separation of Salmonella genetic lineages that will be useful in outbreak analysis
22
Acknowledgements
Team: Ed Taboada, Peter Kruczkiewicz, Catherine Yoshida, John Nash
Research partners:
Public Health Agency of Canada:
• OIE Laboratory for Salmonellosis – National Microbiology Lab (NML) @ Guelph
• Genomics Core and Bioinformatics Core – NML @ Winnipeg
• Public Health Genomics team – NML @ Winnipeg
IRIDA project team
Animal Health Veterinary Laboratory Agency – UK
Austrian Institute of Technology – Austria
Funding:
• Genomics Research and Development Initiative
• Genome Canada (IRIDA project)
• Public Health Agency of Canada