TRANSCRIPT
-
New York State’s experience with analyzing, interpreting, and sharing whole genome sequence data for surveillance of enteric organisms.
William Wolfgang, PhD, Wadsworth Center, [email protected]
InForm 11/18/15
-
We are engaged in 3 Major projects
May 2013: FDA GenomeTrakr - building a genomic database
• 1190 genomes - Mostly Salmonella from food and environment
November 2014: CDC-AMD - real-time surveillance
• 333 genomes - L. monocytogenes, E. coli, and Salmonella
October 2013: Wadsworth Center real-time surveillance of Salmonella Enteritidis
• 733 genomes - All Salmonella Enteritidis we receive from patients
-
Salmonella Enteritidis clusters are poorly resolved by PFGE
• Our most commonly received strain of Salmonella: 300-400 each year
a. 1/2 have the same PFGE DNA fingerprint.
b. And 2/3 have a very common endemic PFGE DNA fingerprint.
c. These endemic patterns are of limited use to our epidemiologists.
[Figure: PFGE types detected over a two-year period; 52 PFGE types]
-
Two years ago we started sequencing of all SE in real-time
Goals
1. Evaluate WGS efficacy compared to PFGE.
   i. Is cluster resolution better?
   ii. Is it faster and cheaper?
2. Develop analytical and interpretive tools.
3. Develop communication pipeline to epidemiologists.
4. Acquire a real data set to evaluate evolving informatic methods.
-
Analysis of WGS data using Pascal’s pipeline
-
We use a Reference based SNP analysis to compare isolates in a phylogenetic tree
Pluses
• Widely used method.
• In house pipeline was feasible.
• No large database is required.
• Very high resolution.
• Computationally fast.
Minuses
• Requires high quality reference genomes.
• Lack of standards.
[Figure: a SNP between Bacteria X and Bacteria Y]
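To make the idea of a reference-based SNP comparison concrete, here is a minimal sketch in Python. It is not Pascal's pipeline, whose internals are not described in these slides. It assumes each isolate has already been mapped to the same reference and reduced to an equal-length string of base calls. The function names, isolate names, and sequences are all hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two reference-aligned sequences differ.

    Positions where either isolate has an ambiguous call (anything other
    than A, C, G, T) are skipped. This is an assumption of this sketch,
    not a documented rule of the pipeline.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def distance_matrix(isolates):
    """Return {(name_1, name_2): SNP distance} for every pair of isolates."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical toy data: three isolates called against one reference.
isolates = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAC",
    "isolate_3": "ACGAACGTNC",
}
dist = distance_matrix(isolates)
for pair, d in dist.items():
    print(pair, d)
```

A pairwise matrix like this is the usual input for building a phylogenetic tree or for the threshold-based clustering discussed later in the talk.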
-
Every week Pascal’s Pipeline returns some useful data
GC-69
-
And every week we share these data with:
• Ourselves through our LIMS system
• NCBI - Sequence and metadata
  i. Upload two spreadsheets and the sequence files
• Epidemiologists - we identify
  i. New clusters of 2 isolates - 0 to 7 SNPs different
  ii. Additions to existing clusters - no time limit
  iii. Indicate if the event occurred within the past 60 days
Our report: Salmonella serotype Enteritidis GC-96: A new cluster of two isolates has been detected, containing: IDR1500052521 (JEGX01.0002) and IDR1500052977 (JEGX01.0002). These isolates have 1 single nucleotide differences (snps) out of 6918 snps and are considered closely related.
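The cluster rule above (isolates 0 to 7 SNPs apart) can be sketched as follows. The slides do not say how isolates are linked into clusters, so single-linkage grouping is an assumption here, as are the function names and the report wording. The two isolate IDs and their 1-SNP difference come from the example report. The third isolate and its distances are hypothetical.

```python
def find_clusters(distances, max_snps=7):
    """Group isolates by single linkage: any pair within max_snps is joined.

    Single linkage is an assumption of this sketch; the slides only give
    the 0 to 7 SNP threshold.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)
        if d <= max_snps and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for isolate in list(parent):
        groups.setdefault(find(isolate), []).append(isolate)
    # Only groups of two or more isolates count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def cluster_report(cluster, distances):
    """Build a short plain-text notice for one cluster (hypothetical wording)."""
    within = [
        d for (a, b), d in distances.items() if a in cluster and b in cluster
    ]
    return (
        f"A cluster of {len(cluster)} isolates has been detected, "
        f"containing: {', '.join(cluster)}. "
        f"Maximum pairwise difference: {max(within)} SNP(s)."
    )


distances = {
    ("IDR1500052521", "IDR1500052977"): 1,        # from the example report
    ("IDR1500052521", "HYPOTHETICAL_ISOLATE"): 250,  # made-up distance
    ("IDR1500052977", "HYPOTHETICAL_ISOLATE"): 251,  # made-up distance
}
clusters = find_clusters(distances)
messages = [cluster_report(c, distances) for c in clusters]
for m in messages:
    print(m)
```

With single linkage a chain of closely related isolates ends up in one cluster even if its two ends are more than 7 SNPs apart, so a real implementation would need to decide whether that behavior is wanted.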
-
• 502 isolates sequenced in real time
• Collected over 598 days (8/27/2013 to 4/16/2015)
• 32 PFGE types
  i. 438/502 (87%) isolates were part of endemic patterns
• Identified 86 Genomic Clusters
  i. About 60% of isolates reside in clusters
  ii. Most clusters contain 4 or fewer isolates; mean 3.4
  iii. 5 non-endemic PFGE clusters identified
[Figure: Frequency distribution of isolates in genomic clusters (GC); x-axis: isolates per cluster (2 to 18), y-axis: # of genomic clusters (0 to 50)]
Using our SE data to inform the road ahead
-
PFGE JEGX01.0021 contains 12 genomic clusters
PFGE JEGX01.0056 contains 4 genomic clusters
PFGE JEGX01.0004 contains 45 genomic clusters
PFGE JEGX01.0034 contains 5 genomic clusters
PFGE JEGX01.0002 contains 8 genomic clusters
PFGE JEGX01.0005 contains 13 genomic clusters
WGS detects many clusters
-
PFGE types are not monophyletic, and genomic clusters can contain multiple PFGE types
GC-20
GC-41 GC-35
-
NYC Red Rooster restaurant outbreak
• NYCDOHMH requested WGS.
• 6 isolates were JEGX01.0034
• 1 isolate was JEGX01.0004
• Q: Was the outlier related?
• A: Yes; there were 0 SNP differences.
[Figure: comparison of Enteritidis patterns JEGX01.0034 and JEGX01.0004]
-
JEGX01.0034
JEGX01.0021
JEGX01.0023
JEGX01.0005
JEGX01.0030
JEGX01.0004
17% of Genomic clusters harbor two PFGE types
• Common pairs have all lost the same plasmid
• SLA5
-
WGS is great!!!
It works really well to confirm or refute epi. findings.
• It provides the ultimate resolution.
• Subdivide Endemic PFGE types.
• It is more stable than PFGE.
• And data can be readily shared.
  • In house through our LIMS
  • With partners through BaseSpace and FTP
  • And with NCBI through their submission portal
  • And NOW with CDC through BioNumerics 7.5!!!!
-
But challenges still exist.
• Standardization and QA/QC
• Making the data useful to our epidemiologists.
  i. Prioritization of clusters
  ii. Better visualization
-
Prioritization and visualization of clusters in time
Cluster = 0 to 4 SNPs
Indicates appearance of 4 isolates in a 60-day window
[Figure x-axis: Date sample collected]
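One way to express the prioritization rule on this slide (flag a cluster when 4 isolates appear within a 60-day window) is a sliding window over sorted collection dates. This is only a sketch under stated assumptions: the function name, the inclusive treatment of the 60-day boundary, and the cluster names and dates are all hypothetical and are not taken from the Wadsworth data.

```python
from datetime import date, timedelta


def has_burst(collection_dates, min_isolates=4, window_days=60):
    """Return True if min_isolates samples fall within window_days of each other.

    The window is treated as inclusive (a span of exactly window_days
    counts), which is an assumption of this sketch.
    """
    ordered = sorted(collection_dates)
    window = timedelta(days=window_days)
    # Compare each date with the one (min_isolates - 1) positions later.
    for i in range(len(ordered) - min_isolates + 1):
        if ordered[i + min_isolates - 1] - ordered[i] <= window:
            return True
    return False


# Hypothetical clusters with made-up collection dates.
clusters = {
    "GC-A": [date(2014, 1, 5), date(2014, 1, 20),
             date(2014, 2, 10), date(2014, 2, 25)],   # 4 isolates in 51 days
    "GC-B": [date(2013, 9, 1), date(2014, 1, 15),
             date(2014, 6, 1), date(2014, 11, 20)],   # spread over a year
    "GC-C": [date(2014, 3, 1), date(2014, 3, 3)],     # too few isolates
}
priority = [name for name, dates in clusters.items() if has_burst(dates)]
print(priority)
```

Filtering on a rule like this is what reduces a long list of genomic clusters to a shorter list that epidemiologists can act on.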
-
Visualization of clusters in time
12 instead of 86 clusters of interest.
Date
-
Summary
WGS
• Vastly improved resolution
  i. many more clusters are detected
  ii. clusters are more stable
• It is great for supporting epidemiological investigations
  i. particularly for excluding samples
• It is not as useful at informing epidemiological investigations
  i. we need to develop prioritization schemes
-
Next steps
-
Stop doing both PFGE and WGS!!!
We may be ready soon for Listeria!!!
• All isolates are being sequenced and analyzed.
• BioNumerics 7.5 will let pilot labs analyze data locally.
How could WGS labs share data with PFGE labs?
• Infer PFGE types from genome data?
-
Acknowledgments
Wadsworth Center
Bacteriology: Kim Musser, Michelle Dickinson, Samantha Wirth, Kara Levinson, Kara Mitchell
Sequencing Core: Matt Shudt, Zhen Zhang, Charles MacGowan, Melissa Leisner, Danielle Loranger
Bioinformatics Core: Mike Palumbo, Pascal LaPierre
PFGE Lab: Dianna Bopp, Deb Baker, Lisa Thompson
Minnesota DOH: David Boxrud, Angie Taylor, Victoria Lappi
BCDC NYSDOH: Madhu Anand, Andie Newman
NCBI: Bill Klimke, Martin Shumway, Yuriy Skripchenko
FDA: Eric Brown, Peter Evans, Marc Allard, Errol Strain, Ruth Timme
CDC: Eija Trees, Heather Carleton, Steven Stroika