Robertson IMMEM XI final March 2016


Upload: iridacommunity

Post on 14-Apr-2017


TRANSCRIPT

Page 1

Utility of the Salmonella in Silico Typing Resource (SISTR) to outbreak investigations

James Robertson1, Catherine Yoshida1, Peter Kruczkiewicz2, Eduardo N. Taboada2 and John H. E. Nash3

1 National Microbiology Laboratory @Guelph, Public Health Agency of Canada
2 National Microbiology Laboratory @Lethbridge, Public Health Agency of Canada
3 National Microbiology Laboratory @Toronto, Public Health Agency of Canada

Page 2

Salmonella is a leading public health concern

• Salmonella is a leading food-borne pathogen both in Canada and around the world
• Globally, there are an estimated 94 million Salmonella infections every year

Human costs:
• acute illness
• loss of life (155,000 deaths)

Societal costs:
• health care costs
• lost productivity
• legal costs
• impact on the food industry

Page 3

Potential Sources

Page 4

Challenges in Salmonella typing and epidemiology

• A small number of highly prevalent, globally distributed serovars (e.g. Enteritidis, Typhimurium) account for most outbreaks
• Epidemiologically unrelated isolates within the same serovar are difficult to tell apart during investigations
• Additional subtyping resolution within a serovar is needed (e.g. phage typing)
• Increasing use of genotypic methods (i.e. molecular typing)
  – Driven by the need for methods with higher discriminatory power
  – A number of different approaches have been applied to molecular typing of Salmonella
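The slide does not say how discriminatory power is measured. One widely used measure is the Hunter-Gaston discriminatory index (Simpson's index of diversity applied to typing results): the probability that two isolates drawn at random are assigned to different types. The Python sketch below computes it; the isolate labels and the "E1/T1" subtypes are hypothetical, and this is an illustration of the general idea rather than an analysis from this talk.

```python
from collections import Counter


def hunter_gaston_di(type_labels):
    """Hunter-Gaston discriminatory index (Simpson's index of diversity).

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))
    where N is the number of isolates and n_j is the number of isolates
    assigned to type j. D is the probability that two isolates drawn at
    random (without replacement) are assigned to different types.
    """
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(n * (n - 1) for n in Counter(type_labels).values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical example: six isolates typed by a coarse method (serovar)
# and by a finer, made-up subtyping scheme.
serovars = ["Enteritidis"] * 4 + ["Typhimurium"] * 2
subtypes = ["E1", "E1", "E2", "E3", "T1", "T2"]

print(round(hunter_gaston_di(serovars), 3))  # 0.533
print(round(hunter_gaston_di(subtypes), 3))  # 0.933
```

A finer method splits the dominant serovar into more types, so fewer random pairs share a type and the index rises toward 1.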

Page 5

Typing methods by discriminatory power: Serotyping (low), MLST (low-mid), cgMLST (mid-high), wgSNPs (high)

[Figure: two aligned sequences, GATCGATCGATCG and GATCAATCGATCG, differing at a single nucleotide]

Serotyping
• Based on reaction of antibodies to surface antigens
• Broad usage and common nomenclature in use since the 1930s

MLST and cgMLST
• Multi-Locus Sequence Typing: developed by Maiden et al. (1998)
• Indexes genetic variation in 7 core (i.e. "housekeeping") genes
• cgMLST extends this principle to hundreds or thousands of loci
• Provides a portable naming scheme which correlates with historical serotypes

wgSNPs
• Uses individual SNPs and gives very high resolution
• Results are not portable to other public health professionals
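The two sequences on this slide differ at one position. A minimal Python sketch of how an allele-based method (MLST, cgMLST) and a SNP-based method each register that change, assuming a toy allele registry: the `AlleleRegistry` class is hypothetical and is not part of any real MLST database or tool. At the locus level any difference yields a new allele number, while SNP counting records how many positions differ.

```python
def count_snps(seq_a, seq_b):
    """Count positions that differ between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


class AlleleRegistry:
    """Hypothetical MLST-style allele numbering for a single locus.

    Every distinct sequence gets the next integer allele number, so one SNP
    and fifty SNPs both register as exactly one allele difference.
    """

    def __init__(self):
        self._alleles = {}

    def allele_number(self, seq):
        if seq not in self._alleles:
            self._alleles[seq] = len(self._alleles) + 1
        return self._alleles[seq]


seq1 = "GATCGATCGATCG"
seq2 = "GATCAATCGATCG"

registry = AlleleRegistry()
print(registry.allele_number(seq1), registry.allele_number(seq2))  # 1 2
print(count_snps(seq1, seq2))  # 1
```

Because an allele difference is the same whether two alleles differ by one SNP or many, allele-based distances are coarser than SNP counts over the same loci.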

Page 6

Page 7

• Initial dataset of 4,330 genomes
• 94.6% concordance between predicted and reported serovar
• In silico serovar predictions based on O and H antigens
• cgMLST refinement of serovar assignment and analysis
• Uses minimally processed genome assemblies
• Very fast: approximately 30 seconds to process a genome
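The slide describes two stages: a serovar prediction from O and H antigens, then cgMLST refinement. The Python sketch below illustrates that general idea under stated assumptions and is not SISTR's implementation. The antigen table, the serovar names `SerovarA`/`SerovarB`/`SerovarC`, and the nearest-reference rule are all hypothetical. When the antigen formula maps to more than one serovar, the sketch picks the serovar of the closest reference genome by cgMLST allele distance.

```python
from typing import Dict, List, Optional, Tuple

# Locus name -> allele number; None marks a locus missing from the assembly.
Profile = Dict[str, Optional[int]]

# Hypothetical antigen table: (O antigens, H1 antigen, H2 antigen) -> candidate
# serovars. Formulas and names are placeholders, not real White-Kauffmann-Le
# Minor entries, and not SISTR's actual data.
ANTIGEN_TABLE: Dict[Tuple[str, str, str], List[str]] = {
    ("O1", "h_a", "h_b"): ["SerovarA"],
    ("O2", "h_c", "h_d"): ["SerovarB", "SerovarC"],  # antigenically ambiguous
}


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    return sum(
        1
        for locus in p.keys() & q.keys()
        if p[locus] is not None and q[locus] is not None and p[locus] != q[locus]
    )


def predict_serovar(
    antigens: Tuple[str, str, str],
    profile: Profile,
    references: List[Tuple[str, Profile]],
) -> Optional[str]:
    """Predict a serovar from antigens, refining ambiguous calls with cgMLST.

    references holds (serovar, cgMLST profile) pairs for genomes of known serovar.
    """
    candidates = ANTIGEN_TABLE.get(antigens, [])
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous or unknown antigen formula: take the serovar of the closest
    # reference profile, restricted to the antigen candidates if there are any.
    pool = [ref for ref in references if not candidates or ref[0] in candidates]
    if not pool:
        return None
    closest_serovar, _ = min(pool, key=lambda ref: allele_distance(profile, ref[1]))
    return closest_serovar


references = [
    ("SerovarB", {"L1": 1, "L2": 1, "L3": 1}),
    ("SerovarC", {"L1": 2, "L2": 2, "L3": 1}),
]
query = {"L1": 2, "L2": 2, "L3": None}
print(predict_serovar(("O2", "h_c", "h_d"), query, references))  # SerovarC
```

Missing loci are skipped rather than counted as differences, so a partially recovered profile can still be matched.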

Page 8

What does SISTR do?

In silico analysis of WGS data
• assembly statistics
• serovar prediction
• in silico typing (MLST, cgMLST)
• AMR prediction

Comparative genomic analyses
• cgMLST
• accessory gene content
• core SNPs

Epidemiologic analysis
• geospatial distribution
• temporal distribution
• source association

https://lfz.corefacility.ca/sistr-app/
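"Assembly statistics" are listed without detail. A common set includes contig count, total length, largest contig and N50 (the contig length at which contigs of that length or longer contain at least half of the assembly). The Python sketch below computes these; which statistics SISTR actually reports is not stated on the slide, so the selection here is an assumption.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def assembly_stats(contig_lengths):
    """A few commonly reported assembly statistics."""
    return {
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "largest_contig": max(contig_lengths),
        "n50": n50(contig_lengths),
    }


print(assembly_stats([100, 200, 300, 400, 500]))
# {'contigs': 5, 'total_length': 1500, 'largest_contig': 500, 'n50': 400}
```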

Page 9

SISTR cgMLST

• The current cgMLST scheme in SISTR is based on 330 core genes with high "assignability" (i.e. very low levels of "missing" data)

• Will include the international Salmonella cgMLST scheme (i.e. once it is developed!)

• cgMLST information is used to:
  – Assess the quality of WGS data (complete, partial and missing loci)
  – Supplement genoserotyping predictions
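Using cgMLST to assess WGS quality amounts to tallying, for each genome, how many scheme loci are complete, partial or missing. The Python sketch below shows one way to do that, assuming each locus comes with the fraction of its length recovered from the assembly. The coverage input, the `locus_00N` names and the 30-locus cut-off in `passes_qc` are hypothetical; SISTR's own criteria are not given on the slide.

```python
from collections import Counter
from typing import Dict, Optional


def classify_locus(coverage: Optional[float]) -> str:
    """Classify one cgMLST locus by the fraction of it found in the assembly.

    coverage is the aligned fraction of the locus (0.0 to 1.0), or None if
    the locus was not found at all.
    """
    if coverage is None or coverage <= 0.0:
        return "missing"
    if coverage >= 1.0:
        return "complete"
    return "partial"


def summarize_loci(coverages: Dict[str, Optional[float]]) -> Counter:
    """Tally complete, partial and missing loci for one genome."""
    return Counter(classify_locus(c) for c in coverages.values())


def passes_qc(summary: Counter, max_incomplete: int = 30) -> bool:
    """Hypothetical QC rule: fail a genome with more than max_incomplete partial
    or missing loci (30 of 330 loci matches a '300 complete genes' cut-off)."""
    return summary["partial"] + summary["missing"] <= max_incomplete


coverages = {"locus_001": 1.0, "locus_002": 0.6, "locus_003": None, "locus_004": 0.0}
summary = summarize_loci(coverages)
print(summary["complete"], summary["partial"], summary["missing"])  # 1 1 2
print(passes_qc(summary))  # True
```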

Page 10

Testing the accuracy of SISTR

• ~45,000 Salmonella genomes were downloaded from the SRA

• Raw paired-end reads were merged with FLASH and assembled with SPAdes

• Assemblies were loaded into SISTR, and predicted serovars were compared with reported serovars (where available)

• Assemblies were checked for contamination using Kraken

• Quality was assessed using QUAST
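Comparing predicted with reported serovars means skipping genomes without a reported serovar and matching names that differ only in formatting. A minimal Python sketch of that comparison, assuming a simple normalisation (case and whitespace only); the example pairs are hypothetical, not data from this study. Real SRA metadata varies more than this (e.g. full "Salmonella enterica subsp. enterica serovar ..." strings), so a production comparison would need a fuller name mapping.

```python
from typing import Iterable, Optional, Tuple


def normalize_serovar(name: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences match."""
    return " ".join(name.split()).lower()


def concordance(pairs: Iterable[Tuple[str, Optional[str]]]) -> Tuple[int, int, float]:
    """Compare predicted with reported serovars.

    pairs holds (predicted, reported) names; reported is None when no serovar
    was recorded, and those genomes are left out of the comparison.
    Returns (number compared, number concordant, fraction concordant).
    """
    compared = [(p, r) for p, r in pairs if r]
    n_concordant = sum(normalize_serovar(p) == normalize_serovar(r) for p, r in compared)
    n_compared = len(compared)
    fraction = n_concordant / n_compared if n_compared else float("nan")
    return n_compared, n_concordant, fraction


# Hypothetical predictions and metadata, not data from this study.
pairs = [
    ("Enteritidis", "enteritidis"),
    ("Typhimurium", "Typhimurium "),
    ("Heidelberg", "Kentucky"),
    ("Enteritidis", None),
]
print(concordance(pairs))  # (3, 2, 0.6666666666666666)
```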

Page 11

Recovery rates of 330 cgMLST genes from assembled SRA genomes (N = 45,079)

• Number of genomes with all 330 genes complete: 41,781
• Number of genomes with >300 genes: 1,393
• Number of genomes with <300 genes: 1,905
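The three counts on this slide sum to the stated total, and converting them to percentages makes the recovery rate easier to read. The Python sketch below bins a genome by its number of fully recovered genes and computes the shares. It assumes the three printed counts follow the legend order, and because the slide does not say where a genome with exactly 300 genes falls, the sketch (as a labelled assumption) puts it in the lower bin, named "<=300" here.

```python
N_LOCI = 330


def recovery_category(n_complete_genes: int) -> str:
    """Bin a genome by how many of the 330 cgMLST genes were fully recovered.

    Assumption: a genome with exactly 300 genes goes in the lower bin.
    """
    if n_complete_genes == N_LOCI:
        return "complete (330)"
    if n_complete_genes > 300:
        return ">300"
    return "<=300"


# Counts as printed on the slide, assuming they follow the legend order.
reported = {"complete (330)": 41781, ">300": 1393, "<=300": 1905}
total = sum(reported.values())
print(total)  # 45079, matching N = 45,079
for category, count in reported.items():
    print(f"{category}: {count} ({100 * count / total:.1f}%)")
```

This gives roughly 92.7% complete, 3.1% with more than 300 genes and 4.2% in the lowest bin.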

Page 12

SISTR Accuracy (N = 32,321)

• Discordant: 2,347
• Concordant: 29,884

• 93.7% overall concordance with the specified serovar

Page 13

• Two outbreaks of Salmonella Enteritidis were retrospectively sequenced
• Examined the feasibility of applying WGS to outbreak investigations
• Compared results of traditional molecular and microbiological tests with WGS

Page 14

Page 15

Page 16

Page 17

Page 18

[Figure: trees built with SISTR (cgMLST), PARSNP (core SNP) and a SNP tree (Wuyts et al. 2015)]

• All three methods produce concordant trees
• cgMLST has a tendency to over-group isolates
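One way to see how an allele-based scheme can over-group is to cluster isolates by cgMLST allele distance: isolates that differ only outside the 330 scheme loci, or only at loci missing from one assembly, get identical distances and fall into one cluster. The Python sketch below uses single-linkage clustering at an allele-difference threshold with a union-find structure. This is a swapped-in illustration, not how the trees on this slide were built, and the profiles and `iso1`-`iso4` names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional

# Locus name -> allele number; None marks a missing locus.
Profile = Dict[str, Optional[int]]


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    return sum(
        1
        for locus in p.keys() & q.keys()
        if p[locus] is not None and q[locus] is not None and p[locus] != q[locus]
    )


def single_linkage_clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Group isolates whose cgMLST profiles are within threshold allele differences,
    linking transitively (single linkage) via a union-find structure."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical profiles: iso1-iso3 share alleles at every called locus (iso3 is
# missing L3), so they cluster together even if SNPs outside the scheme differ.
profiles = {
    "iso1": {"L1": 1, "L2": 1, "L3": 1},
    "iso2": {"L1": 1, "L2": 1, "L3": 1},
    "iso3": {"L1": 1, "L2": 1, "L3": None},
    "iso4": {"L1": 2, "L2": 3, "L3": 1},
}
print(single_linkage_clusters(profiles, threshold=0))
# [['iso1', 'iso2', 'iso3'], ['iso4']]
```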

Page 19

Outbreak Clustering Categories

[Figure: schematic of outbreak clusters A, B and C illustrating four categories: Correct, Incorrectly split, Over-grouped, and Incorrectly split and grouped]
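The four categories can be assigned mechanically by comparing each reference outbreak with the clusters produced by the method under test. The Python sketch below is an interpretation of the schematic, not the authors' stated procedure: an outbreak is "split" if its isolates land in more than one cluster, and "grouped" if any of those clusters also holds isolates from another outbreak. The isolate and cluster labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def classify_outbreaks(reference: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, str]:
    """Assign each reference outbreak to a clustering category.

    reference: isolate -> outbreak label from the reference (e.g. SNP) analysis.
    predicted: isolate -> cluster label from the method under test (e.g. cgMLST).
    """
    outbreak_members = defaultdict(set)
    cluster_members = defaultdict(set)
    for isolate, outbreak in reference.items():
        outbreak_members[outbreak].add(isolate)
        cluster_members[predicted[isolate]].add(isolate)

    categories = {}
    for outbreak, members in outbreak_members.items():
        clusters = {predicted[i] for i in members}
        split = len(clusters) > 1  # outbreak spread over several clusters
        grouped = any(  # a cluster also holds isolates from another outbreak
            reference[other] != outbreak
            for cluster in clusters
            for other in cluster_members[cluster]
        )
        if split and grouped:
            categories[outbreak] = "Incorrectly split and grouped"
        elif split:
            categories[outbreak] = "Incorrectly split"
        elif grouped:
            categories[outbreak] = "Over-grouped"
        else:
            categories[outbreak] = "Correct"
    return categories


# Hypothetical isolates: outbreaks B and C share a cluster; D is split in two.
reference = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C", "d1": "D", "d2": "D"}
predicted = {"a1": "cl1", "a2": "cl1", "b1": "cl2", "b2": "cl2",
             "c1": "cl2", "c2": "cl2", "d1": "cl3", "d2": "cl4"}
print(classify_outbreaks(reference, predicted))
# {'A': 'Correct', 'B': 'Over-grouped', 'C': 'Over-grouped', 'D': 'Incorrectly split'}
```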

Page 20

Concordance between cgMLST and SNP trees

Study  Correct  Over-grouped  Split  Combination  Serovar(s)
1      1        1             0      0            Enteritidis
2      2        3             0      0            Enteritidis
3      5        1             0      0            Enteritidis, Typhimurium, Derby
4      2        7             0      0            Enteritidis
5      2        0             0      0            Enteritidis
6      5        2             0      0            Enteritidis
Total  18       13            0      0

Page 21

Conclusions

• SISTR is a robust and accurate platform for Salmonella in silico typing, with 93.7% concordance between specified and predicted serovar

• The prototype 330-gene cgMLST scheme is readily recovered from HTS assemblies of varying quality

• The current scheme provides coarse-grained separation of Salmonella genetic lineages that will be useful in outbreak analysis

Page 22

Acknowledgements

Team: Ed Taboada, Peter Kruczkiewicz, Catherine Yoshida, John Nash

Research partners: Public Health Agency of Canada:

OIE Laboratory for Salmonellosis – National Microbiology Lab (NML) @ Guelph

Genomics Core and Bioinformatics Core – NML @ Winnipeg
Public Health Genomics team – NML @ Winnipeg
IRIDA project team

Animal Health Veterinary Laboratory Agency – UK
Austrian Institute of Technology – Austria

Funding: Genomics Research and Development Initiative; Genome Canada (IRIDA project); Public Health Agency of Canada