
Heather Carleton, MPH, Ph.D.
CDC, Enteric Diseases Laboratory Branch

Culture-Independent Public Health Subtyping Methods For Enteric Bacteria

Subtyping Methods: Isolate Dependency

Method    Isolates required?
PFGE      Yes
MLVA      Yes
WGS       Yes

Why are subtyping methods dependent on isolates? Stool is complicated!

Science. 336:8 1246-1247

• Microbial genomes
  o Bacteria
  o Both commensal and pathogenic bacteria of the same genus and species
  o Viruses
  o Parasites
  o Fungi

• Other genomes
  o Human
  o Food animals
  o Plants

Challenging Specimen - Human Feces

Strategy to Address Loss of Isolates for Public Health Activities

Surveillance by current methods (serotyping, AST, PFGE, MLVA, etc.)
Surveillance by whole genome sequencing (WGS)
Surveillance and diagnostics by metagenomics

1. Preserve cultures
• Device industry consult
• Regulatory / legal
• Reimbursement
• New isolate recovery methods
• Sentinel surveillance

2. Prepare for the future, working on pure cultures
• Sequence-based infrastructure
• Large genome databases

3. Metagenomics: no cultures
• Amplicon sequencing (short-term)
• Shotgun metagenomics (longer-term)

Public Health “Metagenomics”: From Specimen To Answer

[Figure: amplicon sequencing, “shotgun” metagenomics, and single-cell sorting and sequencing compared by cost ($ to $$$$) and data yield (MB to TB)]

Amplicon Sequencing

Identify pathogen-specific conserved PCR targets with sources of variation:
  o Flank heterogeneous regions that provide enough variation for subtyping
  o Presence/absence profiling

Linkage by quantitation (wet or dry lab methods)
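Presence/absence profiling can be pictured as turning per-target amplicon read counts into a binary profile. This is an illustrative sketch only: the target names, the read counts, and the 10-read detection cutoff are hypothetical assumptions, not values from this work.

```python
# Sketch: amplicon read counts -> presence/absence profile.
# Target names, counts, and the 10-read cutoff are hypothetical.

def presence_absence_profile(counts: dict[str, int], targets: list[str],
                             min_reads: int = 10) -> str:
    """Return '1' for each target with at least min_reads reads, else '0'."""
    return "".join("1" if counts.get(t, 0) >= min_reads else "0" for t in targets)


targets = ["target_A", "target_B", "target_C", "target_D"]
specimen = {"target_A": 523, "target_B": 3, "target_D": 88}  # target_C not seen
print(presence_absence_profile(specimen, targets))  # 1001
```

A fixed target order matters here: profiles from different specimens are only comparable position by position if every specimen is scored against the same target list.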

Presenter notes: What you get from a short-read metagenomic sequencing run isn’t as straightforward. There are a number of challenges, but I think they can be grouped into two general categories.

Amplicon Sequencing: Aligning with WGS

[Figure: whole genome sequence; culture-independent strain markers]

Challenge of Finding Subtyping Amplicon Targets

[Figure: Salmonella vs. commensals; STEC vs. commensal E. coli]

Salmonella: there should be plenty of targets that are not part of the normal flora. How do you choose a subset?

STEC: if the phage is the only difference between commensal and pathogenic E. coli, subtyping targets encoded on the phage have to be identified.

Lori Gladney

Identifying Heterogeneous Regions: Shiga toxin converting phages

Identify regions that are unique to stx1- versus stx2-converting phages

Within each phage, find regions with enough diversity for adequate subtyping
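One way to look for phage regions with enough diversity is to slide a window along a multiple alignment of phage sequences and count polymorphic columns per window. This is a minimal sketch under assumed inputs (an already-computed alignment, hypothetical toy sequences, arbitrary window and step sizes); it is not the pipeline used for the stx phage analysis.

```python
# Sketch: count polymorphic alignment columns in sliding windows.
# Toy sequences, window and step sizes are hypothetical.

def polymorphic_sites_per_window(alignment: list[str], window: int,
                                 step: int) -> list[tuple[int, int]]:
    """Return (window start, number of variable columns) for each window."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    # A column is variable if it holds more than one base, ignoring gaps and N.
    variable = [
        len({seq[col].upper() for seq in alignment} - {"-", "N"}) > 1
        for col in range(length)
    ]
    return [
        (start, sum(variable[start:start + window]))
        for start in range(0, length - window + 1, step)
    ]


phage_segments = ["ACGTACGTAC", "ACGTTCGTAA", "ACCTACGTAG"]
windows = polymorphic_sites_per_window(phage_segments, window=5, step=5)
print(windows)                           # [(0, 2), (5, 1)]
print(max(windows, key=lambda w: w[1]))  # (0, 2): most diverse window
```

In practice the most variable windows would still need conserved flanks for primer design, as noted on the Amplicon Sequencing slide.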

Salmonella Amplicon subtyping

[Figure: complete core gene tree and subset core gene tree, each showing the red, blue, green, and orange strains]

Identify a subset of targets to recreate the phylogeny

Work done by Jo Williams

Salmonella extended MLST: workflow

[enrichment?]

Salmonella extended MLST: ID core genes

1. Prokka annotations (152 genomes)
2. Extract AA sequences for all ORFs
3. Generate BLAST database
4. BLAST 1 random genome’s ORFs against the DB; write all hits with E-value ≤ 10 to file with qcov and pident values
5. Retain hits with ≥ 50% similarity and ≥ 50% coverage
6. Group by query ID and remove any without exactly 152 hits
7. Result: 1,863 core genes found

Legend (step type): Input, Python/Bio, BLAST, Muscle, R, EMBOSS Primer3
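Steps 4 to 6 can be sketched as a filter over BLAST tabular output. The sketch assumes BLAST+ was run with `-outfmt "6 qseqid sseqid pident qcovs evalue"` (so `pident` stands in for the slide's "similarity" and `qcovs` for "qcov") and that each distinct subject ORF counts as one hit; the toy rows and ORF names are hypothetical.

```python
# Sketch of steps 4-6: filter BLAST+ tabular hits, keep single-copy core genes.
# Assumes: blastp ... -outfmt "6 qseqid sseqid pident qcovs evalue"
import csv
import io
from collections import defaultdict


def core_gene_queries(blast_tsv: str, n_genomes: int,
                      min_pident: float = 50.0, min_qcov: float = 50.0) -> list[str]:
    """Return query ORFs with exactly n_genomes distinct subject hits after filtering."""
    subjects = defaultdict(set)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, pident, qcovs, _evalue = row
        if float(pident) >= min_pident and float(qcovs) >= min_qcov:
            subjects[qseqid].add(sseqid)
    return sorted(q for q, hits in subjects.items() if len(hits) == n_genomes)


# Hypothetical hits for a 3-genome toy database (g1 is the query genome).
rows = [
    ("orf1", "g1_orf1", "100.0", "100", "0.0"),
    ("orf1", "g2_orf7", "98.5", "100", "1e-150"),
    ("orf1", "g3_orf2", "97.0", "99", "1e-140"),
    ("orf2", "g1_orf2", "100.0", "100", "0.0"),
    ("orf2", "g2_orf9", "45.0", "80", "1e-20"),   # fails the identity filter
    ("orf2", "g3_orf4", "88.0", "95", "1e-90"),
    ("orf3", "g1_orf3", "100.0", "100", "0.0"),
    ("orf3", "g2_orf3", "90.0", "100", "1e-100"),
    ("orf3", "g2_orf5", "85.0", "70", "1e-60"),   # paralog: a 4th hit
    ("orf3", "g3_orf8", "92.0", "100", "1e-110"),
]
blast_tsv = "\n".join("\t".join(r) for r in rows)
print(core_gene_queries(blast_tsv, n_genomes=3))  # ['orf1']
```

Requiring exactly one hit per genome (152 in the slide, 3 here) drops both genes missing from some genomes and genes with extra paralogous copies.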

Salmonella extended MLST: amplicon picking

Legend (step type): Input, Python/Bio, BLAST, Muscle, R, EMBOSS Primer3

1. Core gene alignments (no end gaps)
2. Create a consensus for each, preserving all ambiguities
3. Design 10 primer pairs per consensus
4. Filter out significantly overlapping amplicons and those without variation
5. Trim core gene alignments to correspond to consensus amplicons
6. Choose a pseudorandom subset (e.g., 40)
7. Create an uncorrected pairwise distance matrix (# diffs / alignment length)
8. Cluster by furthest neighbor (complete linkage)
9. Compare cluster assignments to the all-core-gene classification via the Adjusted Wallace Coefficient
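Step 2 (a consensus that preserves all ambiguities) can be read as collapsing each alignment column into its IUPAC code, so that primers designed on the consensus account for every observed variant. This sketch is one such reading under stated assumptions: internal gaps are skipped, already-ambiguous input codes are expanded before merging, and the toy sequences are hypothetical.

```python
# Sketch: IUPAC consensus of an alignment that keeps every observed base.
# Toy sequences are hypothetical; internal gaps are ignored per column.

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
CODE_TO_BASES = {code: bases for bases, code in IUPAC.items()}


def iupac_consensus(alignment: list[str]) -> str:
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for col in range(length):
        bases: set[str] = set()
        for seq in alignment:
            ch = seq[col].upper()
            if ch != "-":
                bases |= CODE_TO_BASES[ch]  # expands codes such as Y -> {C, T}
        consensus.append(IUPAC[frozenset(bases)] if bases else "-")
    return "".join(consensus)


print(iupac_consensus(["ACGTA", "ACATA", "GCGTA"]))  # RCRTA
print(iupac_consensus(["ACG", "AYG"]))                # AYG
```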

[Figure: all core gene tree for 74 Salmonella genomes, with bootstrap support values and genome labels]

• All core gene tree for 74 Salmonella genomes (2,224 genes)
• Furthest neighbor (complete linkage) clustering
• 33 clusters at uncorrected distance of 0.001 (1,689 SNPs)
• Clusters of > 1 genome highlighted by line at bottom
• Bootstrap support from 100 replicates
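The uncorrected distance and furthest-neighbor clustering can be sketched in plain Python. Assumptions: sequences are already aligned with end gaps stripped, any mismatch (including base vs. internal gap) counts as one difference, and clusters are merged until the smallest complete-linkage distance exceeds the cutoff, which is equivalent to cutting a complete-linkage tree at that height. The cutoff is a proportion of alignment length, so 0.001 of the 35,785-bp subset alignment is about 36 differences. The toy sequences and the 0.1 cutoff are hypothetical.

```python
# Sketch: uncorrected p-distance + furthest-neighbor (complete linkage) clustering.
# Toy sequences and the 0.1 cutoff are hypothetical.
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Number of differing aligned positions divided by alignment length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_clusters(names: list[str], seqs: list[str],
                              cutoff: float) -> list[list[str]]:
    dist = [[p_distance(s, t) for t in seqs] for s in seqs]
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        # Complete linkage: cluster distance = largest member-to-member distance.
        best_d, best_a, best_b = min(
            (max(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_d > cutoff:
            break
        clusters[best_a] += clusters[best_b]
        del clusters[best_b]  # best_b > best_a, so best_a's index is unaffected
    return [sorted(names[i] for i in c) for c in clusters]


names = ["s1", "s2", "s3", "s4"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCAAAAAAAA", "CCAAAAAAAT"]
print(complete_linkage_clusters(names, seqs, cutoff=0.1))
# [['s1', 's2'], ['s3', 's4']]
```

For 74 genomes this brute-force loop is fast enough; a library routine for hierarchical clustering would be the natural choice at larger scale.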

[Figure: 40-core-gene subset tree for the same 74 Salmonella genomes, with bootstrap support values and genome labels]

• Subset of 40 core genes (full ORFs) with perfect bidirectional cluster agreement
• Alignment length 35,785 bp (end gaps stripped)
• Furthest neighbor (complete linkage) clustering
• 33 clusters at uncorrected distance of 0.001 (36 SNPs)
• Clusters of > 1 genome highlighted by line at bottom
• Bootstrap support from 100 replicates
• Note:
  o Relative relationships of clusters differ
  o Bootstrap values generally lower
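“Perfect bidirectional cluster agreement” can be checked by computing the Adjusted Wallace coefficient in both directions (subset to all core genes, and all core genes to subset). This sketch follows the commonly used definition AW(A→B) = (W(A→B) - W_indep) / (1 - W_indep), with W_indep = 1 - Simpson's index of diversity of B; the cluster labels are hypothetical.

```python
# Sketch: Adjusted Wallace coefficient between two partitions of the same genomes.
from collections import Counter
from itertools import combinations


def wallace(a: list, b: list) -> float:
    """W(A->B): share of pairs grouped together by A that B also groups together."""
    together_a = together_both = 0
    for i, j in combinations(range(len(a)), 2):
        if a[i] == a[j]:
            together_a += 1
            if b[i] == b[j]:
                together_both += 1
    if together_a == 0:
        raise ValueError("A has no multi-member clusters; W(A->B) is undefined")
    return together_both / together_a


def adjusted_wallace(a: list, b: list) -> float:
    """AW(A->B) = (W(A->B) - W_indep) / (1 - W_indep), W_indep = 1 - SID(B)."""
    n = len(b)
    w_indep = sum(c * (c - 1) for c in Counter(b).values()) / (n * (n - 1))
    if w_indep == 1:
        raise ValueError("B is a single cluster; AW(A->B) is undefined")
    return (wallace(a, b) - w_indep) / (1 - w_indep)


all_core = [1, 1, 2, 2, 3]          # hypothetical cluster IDs per genome
subset = ["x", "x", "y", "y", "z"]  # same grouping, different labels
print(adjusted_wallace(subset, all_core), adjusted_wallace(all_core, subset))  # 1.0 1.0
```

A subset that merges clusters would still score 1.0 in one direction but not the other, which is why agreement is checked both ways.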

STEC qPCR and extended MLST: subtyping using targets outside of the phage

• Find a subset of core genes to form an extended MLST scheme
• Use the same workflow as for Salmonella

[Figure: all MLST amplicons; loci 1 to 4, each with alleles A1 and A2]

Which alleles belong to STEC, A1 or A2?

STEC qPCR and extended MLST: role of qPCR

• Pan E. coli target: 150 copies
• STEC targets: 50 copies

Pan E. coli - STEC = commensal E. coli
150 - 50 = 100
Commensal : STEC = 100 : 50 = 2 : 1

[Figure: all MLST amplicons; loci 1 to 4, each with alleles A1 and A2]
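The qPCR arithmetic gives the expected STEC share of E. coli (50 / 150 = 1/3), which could then link alleles to STEC: at each locus, pick the allele whose read fraction is closest to that share. This is a hypothetical sketch of “linkage by quantitation” that assumes a single commensal population and a single STEC population with different alleles at every locus; the read counts are invented for illustration.

```python
# Sketch of "linkage by quantitation": qPCR copy numbers -> expected STEC share,
# then per-locus allele assignment. Read counts below are hypothetical.

def stec_share(pan_ecoli_copies: float, stec_copies: float) -> float:
    """Fraction of E. coli that is STEC (commensal = pan E. coli - STEC)."""
    if not 0 < stec_copies <= pan_ecoli_copies:
        raise ValueError("expect 0 < STEC copies <= pan E. coli copies")
    return stec_copies / pan_ecoli_copies


def assign_stec_alleles(locus_counts: dict[str, dict[str, int]],
                        share: float) -> dict[str, str]:
    """Pick, per locus, the allele whose read fraction is closest to the STEC share."""
    calls = {}
    for locus, alleles in locus_counts.items():
        total = sum(alleles.values())
        fractions = {allele: n / total for allele, n in alleles.items()}
        calls[locus] = min(fractions, key=lambda allele: abs(fractions[allele] - share))
    return calls


share = stec_share(150, 50)  # 50 / 150 = 1/3, i.e. commensal : STEC = 2 : 1
reads = {
    "Locus 1": {"A1": 660, "A2": 340},
    "Locus 2": {"A1": 310, "A2": 690},
    "Locus 3": {"A1": 350, "A2": 650},
    "Locus 4": {"A1": 700, "A2": 300},
}
print(assign_stec_alleles(reads, share))
# {'Locus 1': 'A2', 'Locus 2': 'A1', 'Locus 3': 'A1', 'Locus 4': 'A2'}
```

Mixed specimens with several E. coli strains, or loci where STEC and commensal share an allele, would break this simple nearest-fraction rule.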

Public Health Metagenomics: From Specimen To Answer

[Figure repeated: amplicon sequencing, “shotgun” metagenomics, and single-cell sorting and sequencing compared by cost ($ to $$$$) and data yield (MB to TB)]

In-situ Pathogen Characterization: Signal-to-Noise

Reference: http://www.illumina.com/systems/hiseq_2500_1500/performance_specifications.html; Lepage et al. Gut 2012.

8x10^9 reads (~1 Tbase)

1x10^11 organisms/mL

For a positive stool specimen containing 1x10^5 CFU/mL STEC, you might expect (at best) <0.15X genome coverage from a full HiSeq* run.

*For reference purposes only; does not imply endorsement.
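The <0.15X figure can be reproduced with back-of-envelope arithmetic. The sketch assumes 100-bp reads (8x10^9 reads × 100 bp = 0.8 Tbase, i.e. the “~1 Tbase” above), a STEC genome of about 5.5 Mb, and reads drawn in proportion to organism counts (no host DNA, equal genome sizes and lysis), which is why the result is a best case.

```python
# Back-of-envelope coverage estimate. Read length (100 bp) and STEC genome size
# (~5.5 Mb) are assumptions; all DNA is assumed to come from counted organisms.

def expected_coverage(total_reads: float, read_length_bp: float,
                      organisms_per_ml: float, target_cfu_per_ml: float,
                      genome_size_bp: float) -> float:
    total_bases = total_reads * read_length_bp              # whole-run output
    target_fraction = target_cfu_per_ml / organisms_per_ml  # 1e5 / 1e11 = 1e-6
    return total_bases * target_fraction / genome_size_bp   # fold coverage


coverage = expected_coverage(8e9, 100, 1e11, 1e5, 5.5e6)
print(f"{coverage:.3f}X")  # 0.145X
```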

“Clutter Mitigation” Strategies

NUCLEIC ACID EXTRACTION (DNA, RNA, TNA)
• Differential cell lysis
• Filtering, concentration
• Separation/pulldown
• Direct amplification
• Laser capture, microfluidics
• Hi-C approach to capture physically linked genomes

POST-EXTRACTION AND LIBRARY CONSTRUCTION
• Nucleases (RNase/DNase)
• cDNA conversion
• rRNA depletion
• Bind/degrade CpG-methylated DNA
• Preferential separation (mass, sequence, chemistry)
• Genome/transcriptome amplification
• Sequencing platform selection
• Library method and parameters
• Size selection

SEQUENCING
• New platforms/approaches
• Multiplexing and pooling
• Bioinformatic strategies

[Figure: phylogeny of 2012K/2013K reference isolates and NC 011083, 2013 State A isolates and metagenome assemblies, and State B isolates and metagenomic assemblies; bootstrap support values shown; scale bar 5 SNPs per 100k bp]

Developing new analysis pipelines to identify and subtype sequence reads associated with enteric pathogens

Metagenomic Pathogen Detection and Gut Microbiome Response to Foodborne Illness
M. R. Weigand, A. Huang, A. V. Pena-Gonzalez, K. T. Konstantinidis, C. L. Tarr

Metagenomic Pathogen Detection Pipeline

Sample               Mean identity (%)   Read depth, high identity (>95%)   Read depth, low identity (80-95%)
100_Kickstart        99.75               7.887                              0.097
130FXP_Acidic        99.25               15.588                             0.586
135B_May30           98.97               14.576                             0.784
159FXP_Resplendent   95.21               16.998                             10.046
124_Usual            94.18               26.523                             21.450
136FXP_Florid        99.81               120.440                            1.280
169_Loopy            99.91               18.269                             0.094
198A_Dizzy2          99.68               48.174                             0.916

Work by Andrew Huang
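Columns like those in the table could come from binning mapped reads by percent identity to a reference genome and converting aligned bases to depth. The sketch is an assumption about how such columns might be computed (depth = aligned bases / reference length; mean identity over reads at ≥80% identity), not a description of the actual pipeline; the read alignments are hypothetical.

```python
# Sketch: bin mapped reads by percent identity and convert aligned bases to depth.
# Bin edges follow the table headers; the depth and mean-identity definitions and
# the example alignments are assumptions.

def identity_binned_depth(alignments: list[tuple[int, float]],
                          reference_length: int) -> tuple[float, float, float]:
    """alignments: (aligned length in bp, percent identity) per read.
    Returns (mean identity of reads >= 80%, high-identity depth, low-identity depth)."""
    high_bp = sum(n for n, pid in alignments if pid > 95.0)
    low_bp = sum(n for n, pid in alignments if 80.0 <= pid <= 95.0)
    kept = [pid for _, pid in alignments if pid >= 80.0]
    mean_identity = sum(kept) / len(kept) if kept else float("nan")
    return mean_identity, high_bp / reference_length, low_bp / reference_length


reads = [(150, 99.5), (150, 100.0), (150, 97.0), (150, 90.0), (100, 70.0)]
mean_id, high_depth, low_depth = identity_binned_depth(reads, reference_length=1000)
print(mean_id, high_depth, low_depth)  # 96.625 0.45 0.15
```

Under this reading, a sample such as 124_Usual, with a large low-identity depth, would indicate many reads from a relative of the reference rather than the reference strain itself.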

Public Health Metagenomics – Future Plans

• Amplicon sequencing: pilot approaches at states within 2 years
• “Shotgun” metagenomics: dependent on costs of approaches (clutter mitigation, Hi-C, etc.); available in 3-5 years?
• Single-cell sorting and sequencing: dependent on new technological innovations

[Figure: approaches compared by cost ($ to $$$$) and data yield (MB to TB)]

Acknowledgments

• Enteric Diseases Bioinformatics Group
  o A. Jo Williams
  o Lori Gladney
  o Andrew Huang
  o Lee Katz
  o Darlene Wagner
  o Taylor Griswold
  o Sung Im
• Greater EDLB
  o Rebecca Lindsey
  o Nancy Strockbine
  o Eija Trees
  o John Besser
  o Peter Gerner-Smidt
  o Efrain Ribot
  o Alex Mercante
  o Cheryl Tarr

For more information please contact Centers for Disease Control and Prevention
1600 Clifton Road NE, Atlanta, GA 30333
Telephone: 1-800-CDC-INFO (232-4636)/TTY: 1-888-232-6348
Visit: www.cdc.gov | Contact CDC at: 1-800-CDC-INFO or www.cdc.gov/info

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Questions?
