Heather Carleton, MPH, Ph.D.
CDC, Enteric Diseases Laboratory Branch
Culture-Independent Public Health Subtyping Methods For Enteric Bacteria
Subtyping Methods: Isolate Dependency
Method   Isolates Required?
PFGE     Yes
MLVA     Yes
WGS      Yes
Why are subtyping methods dependent on isolates? Stool is complicated!
Science. 336:8 1246-1247
• Microbial genomes
  o Bacteria
    o Both commensal and pathogenic bacteria of the same genus and species
  o Viruses
  o Parasites
  o Fungi
• Other genomes
  o Human
  o Food animals
  o Plants
Challenging Specimen - Human Feces
Strategy to Address Loss of Isolates for Public Health Activities
1. Preserve cultures: surveillance by current methods (serotyping, AST, PFGE, MLVA, etc.)
• Device industry consult
• Regulatory / legal
• Reimbursement
• New isolate recovery methods
• Sentinel surveillance
2. Prepare for the future working on pure cultures: surveillance by whole genome sequencing (WGS)
• Sequence-based infrastructure
• Large genome databases
3. Metagenomics, no cultures: surveillance and diagnostics by metagenomics
• Amplicon sequencing (short-term)
• Shotgun metagenomics (longer-term)
Public Health “Metagenomics”: From Specimen To Answer
• Amplicon sequencing
• “Shotgun” metagenomics
• Single-cell sorting and sequencing
Cost: $ to $$$$
Data yield: MB to TB
Amplicon Sequencing
Identify pathogen-specific conserved PCR targets with sources of variation:
o Flank heterogeneous regions that provide enough variation for subtyping
o Presence/absence profiling
Linkage by quantitation (wet or dry lab methods)
Whole genome sequence
Culture independent strain markers
Amplicon Sequencing: Aligning with WGS
Challenge of Finding Subtyping Amplicon Targets
[Figure: Salmonella vs. commensals; STEC vs. commensal E. coli]
Salmonella: there should be plenty of targets that are not part of the normal flora. How do you choose a subset?
STEC: if phage is the only difference between commensal and pathogenic E. coli, subtyping targets encoded on the phage have to be identified.
Lori Gladney
Identifying Heterogeneous Regions: Shiga toxin converting phages
Identifying regions that are unique between stx1 and stx2
Within phage find regions that have enough diversity for adequate subtyping
Salmonella Amplicon subtyping
[Figure: Complete Core Gene Tree of the red, blue, green and orange strains, alongside a Subset Core Gene Tree of the same four strains]
Identify a subset of targets to recreate phylogeny
Work done by Jo Williams
Salmonella extended MLST: workflow
[enrichment?]
Salmonella extended MLST: ID core genes
Prokka annotations (152 genomes)
Extract AA sequences for all ORFs
Generate BLAST database
BLAST 1 random genome’s ORFs against DB; write all hits with E-value ≤ 10 to file with qcov and pident values
Retain hits ≥ 50% similarity and ≥ 50% coverage
Group by query ID and remove any without exactly 152 hits
1,863 core genes found
Legend: Input, Python/Bio, BLAST, Muscle, R, EMBOSS, Primer3
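The hit-filtering and grouping steps above can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual script: the four-column tab-separated layout (query ID, subject ID, percent identity, query coverage), the function name `core_gene_queries`, and the toy gene and genome IDs are all assumptions, and percent identity stands in here for the "similarity" cutoff named on the slide.

```python
import csv
import io
from collections import defaultdict


def core_gene_queries(blast_tsv, n_genomes, min_pident=50.0, min_qcov=50.0):
    """Return query IDs that keep exactly n_genomes hits after filtering.

    blast_tsv: file-like object with tab-separated columns
    qseqid, sseqid, pident, qcov (an assumed layout, not the author's).
    """
    hits = defaultdict(list)
    for qseqid, sseqid, pident, qcov in csv.reader(blast_tsv, delimiter="\t"):
        # Retain hits with >= 50% identity and >= 50% query coverage.
        if float(pident) >= min_pident and float(qcov) >= min_qcov:
            hits[qseqid].append(sseqid)
    # Group by query ID; drop any query without exactly n_genomes hits.
    return sorted(q for q, subjects in hits.items() if len(subjects) == n_genomes)


# Toy input for three genomes (hypothetical IDs and values).
toy = io.StringIO(
    "geneA\tg1_001\t98.0\t100\n"
    "geneA\tg2_001\t91.5\t97\n"
    "geneA\tg3_001\t88.0\t99\n"
    "geneB\tg1_002\t99.0\t100\n"
    "geneB\tg2_002\t45.0\t80\n"  # fails the identity cutoff
    "geneB\tg3_002\t90.0\t95\n"
)
print(core_gene_queries(toy, n_genomes=3))  # prints ['geneA']
```

With 152 genomes, `n_genomes=152` would mirror the "exactly 152 hits" rule. The sketch counts hits rather than distinct genomes, so a gene with a paralog in one genome and no hit in another could still pass; a stricter version would count one hit per genome.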
Salmonella extended MLST: amplicon picking
Core gene alignments (no end gaps)
Create consensus for each, preserving all ambiguities
Design 10 primer pairs / consensus
Filter out significantly overlapping amplicons & those without variation
Trim core gene alignments to correspond to consensus amplicons
Choose a pseudorandom subset (e.g. 40)
Create uncorrected pairwise distance matrix: # diffs / alignment length
Cluster by furthest neighbor (complete linkage)
Compare cluster assignments to all core gene classification via Adjusted Wallace Coefficient
Legend: Input, Python/Bio, BLAST, Muscle, R, EMBOSS, Primer3
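The distance and clustering steps above can be illustrated with a small pure-Python sketch. It is not the code behind the slides (the legend suggests other tools were used for these steps); the function names and the four toy sequences are hypothetical, and every mismatching character, including gaps and ambiguity codes, is counted as a difference, which a real analysis may treat differently.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected distance: number of differing sites / alignment length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def complete_linkage_clusters(seqs, threshold):
    """Merge clusters while the furthest pair between them is <= threshold.

    seqs: dict mapping a unique name to an aligned sequence string.
    """
    names = list(seqs)
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Furthest-neighbor (complete) linkage between two clusters.
        return max(dist[frozenset((i, j))] for i in c1 for j in c2)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy alignment (hypothetical 10-bp sequences named after the slide's strains).
toy = {
    "red": "ACGTACGTAC",
    "blue": "ACGTACGTAC",
    "green": "ACGTACGTAT",
    "orange": "TTGTACGGAT",
}
print(complete_linkage_clusters(toy, threshold=0.15))
# prints [['red', 'blue', 'green'], ['orange']]
```

With complete linkage, every pair of genomes inside a cluster is within the threshold of each other, which is the property a fixed cutoff such as 0.001 relies on.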
[Figure: phylogenetic tree with bootstrap values and Sal_* genome tip labels]
• All core gene tree for 74 Salmonella genomes (2,224 genes)
• Furthest neighbor (complete linkage) clustering
• 33 clusters at uncorrected distance of 0.001 (1,689 SNPs)
• Clusters of > 1 genome highlighted by line at bottom
• Bootstrap support from 100 replicates
[Figure: phylogenetic tree with bootstrap values and Sal_* genome tip labels]
• Subset of 40 core genes (full ORFs) with perfect bidirectional cluster agreement
• Alignment length 35,785 bp (end gaps stripped)
• Furthest neighbor (complete linkage) clustering
• 33 clusters at uncorrected distance of 0.001 (36 SNPs)
• Clusters of > 1 genome highlighted by line at bottom
• Bootstrap support from 100 replicates
• Note:
  o Relative relationships of clusters differ
  o Bootstrap values generally lower
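"Perfect bidirectional cluster agreement" can be checked with the Adjusted Wallace coefficient in both directions. The sketch below assumes the usual definition: the Wallace coefficient W(A→B) is the fraction of genome pairs clustered together in A that are also together in B, and the adjustment subtracts the agreement expected by chance, taken as one minus Simpson's index of diversity of B. The function name and the toy cluster labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient AW(A -> B) for two lists of cluster labels."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same genomes")
    n = len(a)
    together_a = 0     # pairs co-clustered in A
    together_both = 0  # pairs co-clustered in A and in B
    for i, j in combinations(range(n), 2):
        if a[i] == a[j]:
            together_a += 1
            if b[i] == b[j]:
                together_both += 1
    if together_a == 0:
        raise ValueError("A has no co-clustered pairs; Wallace is undefined")
    wallace = together_both / together_a
    # Chance agreement: probability that a random pair is together in B.
    expected = sum(c * (c - 1) for c in Counter(b).values()) / (n * (n - 1))
    if expected == 1:
        raise ValueError("B is a single cluster; adjustment is undefined")
    return (wallace - expected) / (1 - expected)


# Toy cluster assignments for five genomes (hypothetical labels).
all_core = [1, 1, 2, 2, 3]
subset = ["x", "x", "y", "y", "z"]
print(adjusted_wallace(all_core, subset), adjusted_wallace(subset, all_core))
# prints 1.0 1.0
```

A value of 1.0 in both directions means the subset reproduces the all-core-gene cluster assignments exactly, even if, as noted above, the relationships between clusters differ.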
STEC qPCR and extended MLST: subtyping using target outside of phage
Find subset of core genes to form extended MLST
Use the same workflow from Salmonella
[Figure: all MLST amplicons at Locus 1 to Locus 4, each showing alleles A1 and A2]
Which alleles belong to STEC, A1 or A2?
STEC qPCR and extended MLST: role of qPCR
Pan E. coli target: 150 copies
STEC targets: 50 copies
Pan E. coli - STEC = Commensal E. coli
150 - 50 = 100
Commensal : STEC = 100 : 50 = 2 : 1
[Figure: all MLST amplicons at Locus 1 to Locus 4, each showing alleles A1 and A2]
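The qPCR arithmetic above is simple enough to write down directly. The sketch below reproduces the subtraction and the ratio, and adds one illustrative assumption that is not stated on the slides: if amplicon read counts scale with template copies, the STEC allele at each locus would be expected at roughly STEC / pan E. coli of the reads. The function names are hypothetical, and real PCR bias may break that proportionality.

```python
def commensal_to_stec_ratio(pan_ecoli_copies, stec_copies):
    """Return (commensal copies, commensal:STEC ratio) from two qPCR targets."""
    if stec_copies <= 0:
        raise ValueError("STEC copies must be positive")
    if pan_ecoli_copies < stec_copies:
        raise ValueError("pan E. coli copies cannot be below STEC copies")
    commensal = pan_ecoli_copies - stec_copies
    return commensal, commensal / stec_copies


def expected_stec_read_fraction(pan_ecoli_copies, stec_copies):
    """Assumed share of reads at a locus carrying the STEC allele."""
    return stec_copies / pan_ecoli_copies


commensal, ratio = commensal_to_stec_ratio(150, 50)
print(commensal, ratio)  # prints 100 2.0
print(round(expected_stec_read_fraction(150, 50), 2))  # prints 0.33
```

Under that assumption, the allele seen in about one third of reads at each locus would be the STEC candidate, which is one way to answer "A1 or A2?".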
Public Health Metagenomics: From Specimen To Answer
• Amplicon sequencing
• “Shotgun” metagenomics
• Single-cell sorting and sequencing
Cost: $ to $$$$
Data yield: MB to TB
In-situ Pathogen Characterization: Signal-to-Noise
Reference: http://www.illumina.com/systems/hiseq_2500_1500/performance_specifications.html*; Lepage et al. Gut 2012.
8 x 10^9 reads (~1 Tbase)
1 x 10^11 organisms/mL
For a positive stool specimen containing 1 x 10^5 CFU/mL STEC, you might expect (at best) <0.15X genome coverage from a full HiSeq* run.
*For reference purposes only; does not imply endorsement.
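A back-of-the-envelope version of this signal-to-noise estimate is sketched below. The read count and organism densities come from the slide; the 100 bp read length and the 5.5 Mb STEC genome size are assumptions added here for illustration, as is the simplification that reads are sampled in proportion to cell counts with no host or food DNA. It is not necessarily how the slide's figure was derived.

```python
def expected_coverage(reads, read_length_bp, pathogen_per_ml, total_per_ml, genome_size_bp):
    """Expected pathogen genome coverage from an unbiased shotgun run."""
    total_bases = reads * read_length_bp
    # Share of sequenced bases expected to come from the pathogen.
    pathogen_fraction = pathogen_per_ml / total_per_ml
    return total_bases * pathogen_fraction / genome_size_bp


coverage = expected_coverage(
    reads=8e9,              # from the slide
    read_length_bp=100,     # assumed read length
    pathogen_per_ml=1e5,    # STEC CFU/mL, from the slide
    total_per_ml=1e11,      # organisms/mL, from the slide
    genome_size_bp=5.5e6,   # assumed STEC genome size
)
print(f"{coverage:.3f}X")  # prints 0.145X
```

Only about one base in a million is expected to come from the pathogen, so even a full run yields a small fraction of one genome. This is the motivation for the clutter mitigation strategies that follow.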
“Clutter Mitigation” Strategies
NUCLEIC ACID EXTRACTION (DNA, RNA, TNA)
- Differential cell lysis
- Filtering, Concentration
- Separation/Pulldown
- Direct amplification
- Laser capture, microfluidics
- Hi-C approach to capture physically linked genomes
POST-EXTRACTION AND LIBRARY CONSTRUCTION
- Nucleases (RNAse/DNAse)
- cDNA conversion
- rRNA depletion
- Bind/degrade CpG methylated DNA
- Preferential separation (mass, seq, chem)
- Genome/transcriptome amplification
- Sequencing platform selection
- Library method and parameters
- Size selection
SEQUENCING
- New platforms/approaches
- Multiplexing and pooling
- Bioinformatic strategies
[Figure: phylogenetic tree with bootstrap values comparing isolate genomes and metagenome assemblies. Tips include 2012K- and 2013K-numbered isolates, NC 011083, 2013 State A isolates and metagenome assemblies, and State B isolates 1 to 10 with State B metagenomic assemblies 1 to 3. Scale: 5 SNPs per 100k bp]
Developing new analysis pipelines to identify and subtype sequence reads associated with enteric pathogens
Metagenomic Pathogen Detection and Gut Microbiome Response to Foodborne Illness. M. R. Weigand, A. Huang, A. V. Pena-Gonzalez, K. T. Konstantinidis, C. L. Tarr
Metagenomic Pathogen Detection Pipeline
Sample               Mean Identity (%)   High identity (>95%) Read Depth   Low identity (80-95%) Read Depth
100_Kickstart        99.75               7.887                             0.097
130FXP_Acidic        99.25               15.588                            0.586
135B_May30           98.97               14.576                            0.784
159FXP_Resplendent   95.21               16.998                            10.046
124_Usual            94.18               26.523                            21.450
136FXP_Florid        99.81               120.440                           1.280
169_Loopy            99.91               18.269                            0.094
198A_Dizzy2          99.68               48.174                            0.916
Work by Andrew Huang
Public Health Metagenomics – Future Plans
• Amplicon sequencing: pilot approaches at states within 2 years
• “Shotgun” metagenomics: dependent on costs of approaches (clutter mitigation, Hi-C, etc.); will be available in 3-5 years?
• Single-cell sorting and sequencing: dependent on new technological innovations
Cost: $ to $$$$
Data yield: MB to TB
Acknowledgments
• Enteric Diseases Bioinformatics Group
• A. Jo Williams
• Lori Gladney
• Andrew Huang
• Lee Katz
• Darlene Wagner
• Taylor Griswold
• Sung Im
For more information please contact Centers for Disease Control and Prevention
1600 Clifton Road NE, Atlanta, GA 30333
Telephone: 1-800-CDC-INFO (232-4636) / TTY: 1-888-232-6348
Visit: www.cdc.gov | Contact CDC at: 1-800-CDC-INFO or www.cdc.gov/info
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Greater EDLB
• Rebecca Lindsey
• Nancy Strockbine
• Eija Trees
• John Besser
• Peter Gerner-Smidt
• Efrain Ribot
• Alex Mercante
• Cheryl Tarr
Questions?