New York State’s experience with analyzing, interpreting, and sharing whole genome sequence data for surveillance of enteric organisms. William Wolfgang, PhD Wadsworth Center, NYSDOH [email protected] InForm 11/18/15


  • New York State’s experience with analyzing, interpreting, and sharing whole genome sequence data for surveillance of enteric organisms.

    William Wolfgang, PhD

    Wadsworth Center, [email protected]

    InForm 11/18/15

  • We are engaged in 3 major projects

    May 2013: FDA GenomeTrakr - building a genomic database

    • 1190 genomes - mostly Salmonella from food and environment

    November 2014: CDC-AMD - real-time surveillance

    • 333 genomes - L. monocytogenes, E. coli, and Salmonella

    October 2013: Wadsworth Center real-time surveillance of Salmonella Enteritidis

    • 733 genomes - all Salmonella Enteritidis we receive from patients

  • Salmonella Enteritidis clusters are poorly resolved by PFGE

    • Our most commonly received strain of Salmonella: 300-400 each year

      a. 1/2 have the same PFGE DNA fingerprint.

      b. And 2/3 have a very common endemic PFGE DNA fingerprint.

      c. These endemic patterns are of limited use to our epidemiologists.

    [Figure: PFGE types detected over a two-year period; 52 PFGE types]

  • Two years ago we started sequencing all SE in real time

    Goals

    1. Evaluate WGS efficacy compared to PFGE.
       i. Is cluster resolution better?
       ii. Is it faster and cheaper?

    2. Develop analytical and interpretive tools.

    3. Develop a communication pipeline to epidemiologists.

    4. Acquire a real data set to evaluate evolving informatic methods.

  • Analysis of WGS data using Pascal’s pipeline

  • We use a reference-based SNP analysis to compare isolates in a phylogenetic tree

    Pluses
    • Widely used method.
    • In-house pipeline was feasible.
    • No large database is required.
    • Very high resolution.
    • Computationally fast.

    Minuses
    • Requires high quality reference genomes.
    • Lack of standards.

    [Figure: Bacteria X and Bacteria Y, differing at a SNP]
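
    To make the comparison concrete, the sketch below counts SNP differences between isolates that have already been aligned to the same reference, so every sequence shares one coordinate system. The toy sequences, the isolate names, and the choice to skip ambiguous (`N`) or gapped (`-`) positions are illustrative assumptions; this is not a description of Pascal's pipeline.

```python
from itertools import combinations
from typing import Dict, Tuple

# Positions with no confident base call are skipped (an assumption of this sketch).
AMBIGUOUS = {"N", "-"}

def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two reference-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same reference")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )

def pairwise_snp_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates, keyed by sorted names."""
    return {
        (x, y): snp_distance(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }

# Hypothetical 10-base alignments standing in for whole-genome SNP positions.
toy = {
    "bacteria_x": "ACGTACGTAC",
    "bacteria_y": "ACGTACCTAC",
    "bacteria_z": "ACNTACCTAT",
}
distances = pairwise_snp_matrix(toy)
```

    A distance table of this kind is the usual input to both tree building and threshold-based clustering.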

  • Every week Pascal’s Pipeline returns some useful data

    GC-69

  • And every week we share these data with:

    • Ourselves, through our LIMS system

    • NCBI - sequence and metadata
      i. Upload two spreadsheets and the sequence files

    • Epidemiologists - we identify
      i. New clusters of 2 isolates - 0 to 7 SNPs different
      ii. Additions to existing clusters - no time limit
      iii. Indicate if the event occurred within the past 60 days

    Our report: Salmonella serotype Enteritidis GC-96: A new cluster of two isolates has been detected, containing: IDR1500052521 (JEGX01.0002) and IDR1500052977 (JEGX01.0002). These isolates have 1 single nucleotide differences (snps) out of 6918 snps and are considered closely related.
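
    The reporting rules above can be sketched in a few lines. The 7-SNP threshold and the 60-day flag come from the slide; everything else is an assumption made for illustration: the single-linkage grouping, the reading of "recent" as "the newest isolate was collected within 60 days", and the isolate names, distances, and dates.

```python
from datetime import date
from typing import Dict, List, Tuple

SNP_THRESHOLD = 7   # from the slide: new clusters are 0 to 7 SNPs different
RECENT_DAYS = 60    # from the slide: flag events within the past 60 days

def find_clusters(
    isolates: List[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int = SNP_THRESHOLD,
) -> List[List[str]]:
    """Single-linkage grouping (an assumption): link any pair within the threshold."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    # A cluster needs at least 2 isolates.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)

def is_recent(collected: List[date], today: date, window_days: int = RECENT_DAYS) -> bool:
    """True if the newest isolate in the cluster falls inside the window."""
    return 0 <= (today - max(collected)).days <= window_days

# Hypothetical isolates, SNP distances, and collection dates.
names = ["iso_A", "iso_B", "iso_C", "iso_D"]
dist = {
    ("iso_A", "iso_B"): 1, ("iso_A", "iso_C"): 30, ("iso_A", "iso_D"): 40,
    ("iso_B", "iso_C"): 29, ("iso_B", "iso_D"): 41, ("iso_C", "iso_D"): 55,
}
clusters = find_clusters(names, dist)
flag = is_recent([date(2015, 3, 1), date(2015, 3, 20)], today=date(2015, 4, 16))
```

    Single linkage lets a cluster grow by chaining, which fits "additions to existing clusters" but can also merge groups over time; a stricter rule would need a different grouping step.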

  • Using our SE data to inform the road ahead

    • 502 isolates sequenced in real time

    • Collected over 598 days (8/27/2013 to 4/16/2015)

    • 32 PFGE types
      i. 438/502 (87%) isolates were part of endemic patterns

    • Identified 86 genomic clusters
      i. About 60% of isolates reside in clusters
      ii. Most clusters contain 4 or fewer isolates; mean 3.4
      iii. 5 non-endemic PFGE clusters identified

    [Figure: Frequency distribution of isolates in genomic clusters (GC); x-axis: isolates per cluster, y-axis: # of genomic clusters]
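
    The headline numbers on this slide hang together, as a quick arithmetic check shows. The inputs are taken from the slide; the variable names are ours, and because the mean cluster size of 3.4 is rounded, the clustered share is only an estimate.

```python
from datetime import date

# Collection period from the slide, counted inclusively.
first, last = date(2013, 8, 27), date(2015, 4, 16)
days_inclusive = (last - first).days + 1

# Share of isolates with endemic PFGE patterns.
endemic_share = 438 / 502

# Estimated isolates in clusters: 86 clusters at a (rounded) mean of 3.4 each.
clustered_isolates_est = 86 * 3.4
clustered_share_est = clustered_isolates_est / 502

print(days_inclusive, round(endemic_share * 100), round(clustered_share_est * 100))
```

    The estimate lands a little under 60%, consistent with "about 60% of isolates reside in clusters".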

  • WGS detects many clusters

    • PFGE JEGX01.0021 contains 12 genomic clusters

    • PFGE JEGX01.0056 contains 4 genomic clusters

    • PFGE JEGX01.0004 contains 45 genomic clusters

    • PFGE JEGX01.0034 contains 5 genomic clusters

    • PFGE JEGX01.0002 contains 8 genomic clusters

    • PFGE JEGX01.0005 contains 13 genomic clusters

  • PFGE types are not monophyletic, and genomic clusters can contain multiple PFGE types

    [Figure labels: GC-20, GC-41, GC-35]

  • NYC Red Rooster restaurant outbreak

    • NYCDOHMH requested WGS.
    • 6 isolates were JEGX01.0034.
    • 1 isolate was JEGX01.0004.

    • Q: Was the outlier related?

    • A: Yes; there were 0 SNP differences.

    [Figure: PFGE patterns for Enteritidis JEGX01.0034 and JEGX01.0004, with a fragment-size scale running from 20.00 to 2000]

  • 17% of genomic clusters harbor two PFGE types

    [Figure labels: JEGX01.0034, JEGX01.0021, JEGX01.0023, JEGX01.0005, JEGX01.0030, JEGX01.0004]

    • Common pairs have all lost the same plasmid
    • SLA5
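
    A figure like "17% of genomic clusters harbor two PFGE types" comes from counting the distinct PFGE types inside each genomic cluster. The sketch below shows that count on invented data: the cluster labels (`GC-A` and so on) are hypothetical, and only the first cluster mirrors the Red Rooster case of six JEGX01.0034 isolates and one JEGX01.0004 isolate.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def mixed_pfge_clusters(isolates: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each genomic cluster holding more than one PFGE type to those types.

    `isolates` is a list of (genomic_cluster, pfge_type) pairs.
    """
    types_by_cluster: Dict[str, Set[str]] = defaultdict(set)
    for cluster, pfge in isolates:
        types_by_cluster[cluster].add(pfge)
    return {c: t for c, t in types_by_cluster.items() if len(t) > 1}

# Hypothetical cluster assignments; GC-A mirrors the 6 + 1 outbreak pattern.
toy = (
    [("GC-A", "JEGX01.0034")] * 6
    + [("GC-A", "JEGX01.0004")]
    + [("GC-B", "JEGX01.0004")] * 3
    + [("GC-C", "JEGX01.0021")] * 2
    + [("GC-D", "JEGX01.0005")] * 2
)
mixed = mixed_pfge_clusters(toy)
share = len(mixed) / len({cluster for cluster, _ in toy})
```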

  • WGS is great!!!

    It works really well to confirm or refute epi. findings.

    • It provides the ultimate resolution.

    • Subdivide endemic PFGE types.

    • It is more stable than PFGE.

    • And data can be readily shared.
      • In house through our LIMS
      • With partners through BaseSpace and FTP
      • And with NCBI through their submission portal
      • And NOW with CDC through BioNumerics 7.5!!!!

  • But challenges still exist.

    • Standardization and QA/QC

    • Making the data useful to our epidemiologists.
      i. Prioritization of clusters
      ii. Better visualization

  • Prioritization and visualization of clusters in time

    Cluster = 0 to 4 SNPs

    Indicates appearance of 4 isolates in a 60-day window

    [Figure x-axis: date sample collected]
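
    The prioritization rule on this slide, 4 isolates appearing in a 60-day window, can be checked with a sliding window over the collection dates of one cluster. This is a minimal sketch: reading the rule as "at least 4", treating the window bounds as inclusive, and the example dates are all assumptions.

```python
from datetime import date, timedelta
from typing import List

def has_burst(
    collection_dates: List[date],
    min_isolates: int = 4,   # from the slide; "at least" is our reading
    window_days: int = 60,   # from the slide
) -> bool:
    """True if any window of `window_days` holds `min_isolates` or more isolates."""
    dates = sorted(collection_dates)
    window = timedelta(days=window_days)
    start = 0
    for end in range(len(dates)):
        # Shrink the window from the left until it spans at most `window_days`.
        while dates[end] - dates[start] > window:
            start += 1
        if end - start + 1 >= min_isolates:
            return True
    return False

# Hypothetical clusters: one tight burst, one spread across most of a year.
burst = [date(2015, 1, 1), date(2015, 1, 20), date(2015, 2, 10), date(2015, 2, 25)]
slow = [date(2015, 1, 1), date(2015, 3, 15), date(2015, 6, 1), date(2015, 9, 1)]

priority = {"burst": has_burst(burst), "slow": has_burst(slow)}
```

    Running a filter like this over every cluster is one way to cut a long list of genomic clusters down to the few that are active now.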

  • Visualization of clusters in time

    12 instead of 86 clusters of interest.

    Date

  • Summary

    WGS

    • Vastly improved resolution
      i. many more clusters are detected
      ii. clusters are more stable

    • It is great for supporting epidemiological investigations
      i. particularly for excluding samples

    • It is not as useful at informing epidemiological investigations
      i. we need to develop prioritization schemes

  • Next steps

  • Stop doing both PFGE and WGS!!!

    We may be ready soon for Listeria!!!
    • All isolates are being sequenced and analyzed.
    • BioNumerics 7.5 will let pilot labs analyze data locally.

    How could WGS labs share data with PFGE labs?
    • Infer PFGE types from genome data?

  • Acknowledgments

    Wadsworth Center

    Bacteriology: Kim Musser, Michelle Dickinson, Samantha Wirth, Kara Levinson, Kara Mitchell

    Sequencing Core: Matt Shudt, Zhen Zhang, Charles MacGowan, Melissa Leisner, Danielle Loranger

    Bioinformatics Core: Mike Palumbo, Pascal LaPierre

    PFGE Lab: Dianna Bopp, Deb Baker, Lisa Thompson

    Minnesota DOH: David Boxrud, Angie Taylor, Victoria Lappi

    BCDC NYSDOH: Madhu Anand, Andie Newman

    NCBI: Bill Klimke, Martin Shumway, Yuriy Skripchenko

    FDA: Eric Brown, Peter Evans, Marc Allard, Errol Strain, Ruth Timme

    CDC: Eija Trees, Heather Carleton, Steven Stroika
