Comparing typing methods: do's and don'ts
João André Carriço, PhD
Microbiology Institute / Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal
How to compare typing techniques: do’s and don’ts
http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info
WORKSHOP 24: NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL
CONFLICTS OF INTEREST
NOTHING TO DISCLOSE
MICROBIAL TYPING
“Crude classifications and false generalizations are the curse of organized life”
George Bernard Shaw (1856 – 1950)
Microbial Typing: discriminating strains within a species/subspecies
TYPING METHODS: TYPES / SUBTYPES
Street market, Florence, Italy
HOW TO COMPARE TYPING METHODS
Struelens, M.J. et al., 1996. Clinical Microbiology and Infection, 2(1), pp. 2–11.
Performance Criteria:
- Typeability
- Reproducibility
- Stability
- Discriminatory power
- Epidemiological concordance
- Typing system concordance
Convenience Criteria
TYPING METHODS: TYPES / SUBTYPES
PFGE:
- PFGE type (cut-off 80% Dice/UPGMA)
- PFGE subtype (cut-off 80% Dice/UPGMA)
MLST:
- Clonal complex (goeBURST)
- Sequence type
  ST239: 2-3-1-1-4-4-3
  ST8: 3-3-1-1-4-4-3
Serotype:
- Serogroup
- Serotype
emm typing:
- emm type
- emm subtype
cgMLST / wgMLST / SNP / k-mer:
- Any clustering done on a tree or graph
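Whatever the method, a type or subtype is a group read off a tree or graph at some cut-off. As a minimal sketch of that idea for MLST (a simplified stand-in, not goeBURST itself, which also applies tie-break rules when choosing which links to keep), the Python below links profiles that differ at no more than one locus and treats each connected group as a cluster. ST239 and ST8 from the slide differ only at the first locus; the third profile and the function names are hypothetical.

```python
from itertools import combinations

# Allelic profiles (seven MLST loci). ST239 and ST8 are from the slide;
# "hypothetical_ST" is an invented profile used only for illustration.
profiles = {
    "ST239": (2, 3, 1, 1, 4, 4, 3),
    "ST8": (3, 3, 1, 1, 4, 4, 3),
    "hypothetical_ST": (1, 1, 1, 1, 1, 1, 1),
}


def allele_differences(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for x, y in zip(p, q) if x != y)


def group_by_threshold(profiles, max_diff=1):
    """Link profiles differing at <= max_diff loci; return name -> group label.

    Groups are the connected components of the resulting graph (single linkage),
    so the output is a partition of the profiles.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent pointers to the group representative (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)  # merge the two groups

    return {name: find(name) for name in profiles}


print(allele_differences(profiles["ST239"], profiles["ST8"]))  # 1 locus differs
print(group_by_threshold(profiles))
```

Raising max_diff coarsens the partition, which is how a single tree or graph can yield both types and subtypes.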
TRADITIONAL TYPING AND NGS
Chronicle of a Death Foretold
(image: http://en.wikipedia.org/wiki/File:ChronicleOfADeathForetold.JPG)
Whole Genome Sequencing in typing:
- Gene-by-gene: wgMLST, cgMLST
- SNP comparison approaches: comparison with reference strains
- k-mer distances
- Ability to recover most of the existing sequence-based typing information in a single experimental procedure
COMPARING TYPING METHODS
Weissman S J et al. Appl. Environ. Microbiol. 2012;78:1353-1360
[Figure: concatenated MLST loci and fimH sequences]
The hard way…
NEED FOR QUANTIFICATION AND STATISTICS
When you can measure what you are talking about and express it in numbers, you know something about it. When you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.
- Lord Kelvin, 1883
POPULATION AND SAMPLE
[Figure: isolates per type in the population (9, 7, 6, 6) and in a sample drawn from it (3, 2, 2, 3)]
Sampling introduces an error… but this error can be quantified!
Confidence intervals quantify this sampling error and should be used instead of point estimates!
COMPARING PARTITIONS FRAMEWORK
Three coefficients:
1) Simpson’s Index of Diversity
2) Adjusted Rand
3) Adjusted Wallace
And the respective 95% confidence intervals
COMPARING PARTITIONS WEBSITE
http://www.comparingpartitions.info
Copy/Paste from Excel
MEASURING DIVERSITY: SID (Simpson’s Index of Diversity)
This index gives the probability that two strains sampled at random from a population belong to different types.
Since it is a probability, it varies between 0 and 1.
Highly discriminatory methods are desired…
…but are they always needed?
Confidence intervals were defined for SID and should be used.
NGS methods: increased discrimination, but what if every individual is a type?
Simpson, 1948; Hunter and Gaston, 1988; Grundmann et al., 2001
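The index and its interval can be computed directly from type counts. The sketch below uses the Hunter and Gaston (1988) form of the index and the variance approximation of Grundmann et al. (2001), CI = SID ± 2√((4/N)(Σπ³ − (Σπ²)²)), with π the type frequencies and N the number of isolates. The function name and the example labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def simpsons_index(labels):
    """Simpson's Index of Diversity with an approximate 95% confidence interval.

    labels: one type label per isolate.
    Returns (sid, ci_low, ci_high).
    """
    counts = list(Counter(labels).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two isolates are needed")
    # Probability that two isolates drawn without replacement belong to different types
    sid = 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    # Variance approximation of Grundmann et al. (2001); freqs are the type frequencies
    freqs = [c / n for c in counts]
    variance = (4 / n) * (sum(p ** 3 for p in freqs) - sum(p ** 2 for p in freqs) ** 2)
    half_width = 2 * sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    return sid, sid - half_width, sid + half_width


# Hypothetical sample: 10 isolates in four types (3, 2, 2 and 3 isolates)
sample = ["A"] * 3 + ["B"] * 2 + ["C"] * 2 + ["D"] * 3
print(simpsons_index(sample))
# A method giving every isolate its own type: SID = 1, zero-width interval (up to rounding)
print(simpsons_index([f"isolate{i}" for i in range(10)]))
```

The last call illustrates the NGS question on the slide: maximal discrimination with no apparent uncertainty, yet no grouping left to interpret. The interval is not clipped to [0, 1], so with small samples a bound can fall outside that range.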
Comparing SID’s 95% CIs
Null Hypothesis: The values under comparison are the same
COMPARING METHODS RESULTS
[Figure: PFGE clusters and sequence types of isolates s1 to s7]
For each pair of isolates:
                          Same sequence type?
                            Y      N
Same PFGE cluster?    Y     a      b
                      N     c      d
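Counting the four cells of this table only requires comparing every pair of isolates. A minimal sketch in Python, assuming each method's result is given as a list of type labels in the same isolate order; the labels for s1 to s7 below are hypothetical, not the data in the figure.

```python
from itertools import combinations


def pair_counts(method_a, method_b):
    """Count isolate pairs in the 2x2 table comparing two partitions.

    a: same type in A and in B     b: same in A, different in B
    c: different in A, same in B   d: different in both
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must type the same isolates")
    a = b = c = d = 0
    for i, j in combinations(range(len(method_a)), 2):
        same_a = method_a[i] == method_a[j]
        same_b = method_b[i] == method_b[j]
        if same_a and same_b:
            a += 1
        elif same_a:
            b += 1
        elif same_b:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical PFGE clusters (method A) and sequence types (method B) for s1..s7
pfge = ["P1", "P1", "P1", "P2", "P2", "P3", "P3"]
st = ["ST1", "ST1", "ST2", "ST2", "ST2", "ST3", "ST3"]
print(pair_counts(pfge, st))  # (3, 2, 2, 14)
```

With n isolates there are n(n − 1)/2 pairs, so a + b + c + d = 21 for seven isolates.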
ADJUSTED RAND
Overall concordance of two methods, taking into account that agreement between their results could arise by chance alone.
Bi-directional agreement measure
Confidence intervals by the jackknife pseudo-values method
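The Adjusted Rand compares the observed number of pairs grouped together by both methods (a) with the number expected by chance. The sketch below assumes the Hubert and Arabie formulation and obtains the pair counts from the contingency table rather than by enumerating pairs. The confidence interval is a generic leave-one-out jackknife with a normal approximation, which may differ in detail from the procedure implemented on the Comparing Partitions website. Function names and data are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def adjusted_rand(method_a, method_b):
    """Adjusted Rand (Hubert and Arabie) between two partitions of the same isolates."""
    total = comb(len(method_a), 2)
    if total == 0:
        raise ValueError("at least two isolates are needed")
    both = sum(comb(k, 2) for k in Counter(zip(method_a, method_b)).values())  # a
    in_a = sum(comb(k, 2) for k in Counter(method_a).values())  # a + b
    in_b = sum(comb(k, 2) for k in Counter(method_b).values())  # a + c
    expected = in_a * in_b / total  # value of a expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return float("nan")  # undefined, e.g. both methods put every isolate in its own type
    return (both - expected) / (maximum - expected)


def jackknife_ci(method_a, method_b, z=1.96):
    """Jackknife pseudo-values (leave one isolate out) with a normal-approximation CI."""
    n = len(method_a)
    full = adjusted_rand(method_a, method_b)
    pseudo = []
    for i in range(n):
        loo = adjusted_rand(method_a[:i] + method_a[i + 1:], method_b[:i] + method_b[i + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    mean = sum(pseudo) / n
    variance = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    half_width = z * sqrt(variance / n)
    return mean, mean - half_width, mean + half_width


pfge = ["P1", "P1", "P1", "P2", "P2", "P3", "P3"]
st = ["ST1", "ST1", "ST2", "ST2", "ST2", "ST3", "ST3"]
print(round(adjusted_rand(pfge, st), 3))  # 0.475
print(jackknife_ci(pfge, st))
```

The Adjusted Rand is symmetric, so swapping the two methods gives the same value, which is what "bi-directional" means here.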
CHANCE AGREEMENT ILLUSTRATION
Two possible random rearrangements…
CHANCE AGREEMENT: RAND VS ADJUSTED RAND
Min: 0.629, Max: 0.826
Test created by generating 6 to 11 random classifications for 237 cases (strains)
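The chance-agreement effect can be reproduced with a small simulation. The sketch below reads "6 to 11 random classifications" as random classifications with 6 to 11 classes each, applied to 237 strains; that reading, the seed and the number of trials are assumptions, so it is not expected to reproduce the exact minimum and maximum on the slide. Two independent random classifications should give an Adjusted Rand close to 0, while the Rand index stays high.

```python
import random
from collections import Counter
from math import comb


def rand_and_adjusted_rand(method_a, method_b):
    """Return (Rand, Adjusted Rand) for two partitions of the same isolates."""
    total = comb(len(method_a), 2)
    both = sum(comb(k, 2) for k in Counter(zip(method_a, method_b)).values())  # a
    in_a = sum(comb(k, 2) for k in Counter(method_a).values())  # a + b
    in_b = sum(comb(k, 2) for k in Counter(method_b).values())  # a + c
    rand = (total + 2 * both - in_a - in_b) / total  # (a + d) / all pairs
    expected = in_a * in_b / total
    adjusted = (both - expected) / ((in_a + in_b) / 2 - expected)
    return rand, adjusted


def simulate(n_trials=10, n_strains=237, seed=1):
    """Compare pairs of independent random classifications with 6 to 11 classes."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        k_a, k_b = rng.randint(6, 11), rng.randint(6, 11)
        labels_a = [rng.randrange(k_a) for _ in range(n_strains)]
        labels_b = [rng.randrange(k_b) for _ in range(n_strains)]
        results.append(rand_and_adjusted_rand(labels_a, labels_b))
    return results


for rand, adjusted in simulate():
    print(f"Rand = {rand:.3f}   Adjusted Rand = {adjusted:+.3f}")
```

The Rand index counts pairs separated by both methods (d) as agreement, and with many classes most pairs are separated by chance alone, which is why it stays high even for unrelated classifications.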
ADJUSTED WALLACE
Probability that, if two strains are classified together by method A, they are also classified together by method B, corrected for chance agreement
Analytical confidence intervals
Jackknife pseudo-values confidence intervals
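The Adjusted Wallace can be sketched from the same pair counts: W(A→B) = a / (a + b), and AW(A→B) = (W − Wi) / (1 − Wi), where Wi = 1 − SID(B) is the Wallace value expected if the two methods were independent. This minimal sketch computes the point estimate only, without the confidence intervals listed on the slide; function names and data are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_wallace(method_a, method_b):
    """Adjusted Wallace AW(A -> B): chance-corrected probability that two isolates
    of the same type in method A also share a type in method B."""
    total = comb(len(method_a), 2)
    both = sum(comb(k, 2) for k in Counter(zip(method_a, method_b)).values())  # a
    in_a = sum(comb(k, 2) for k in Counter(method_a).values())  # a + b
    in_b = sum(comb(k, 2) for k in Counter(method_b).values())  # a + c
    if in_a == 0:
        raise ValueError("undefined: method A puts every isolate in its own type")
    if in_b == total:
        raise ValueError("undefined: method B puts all isolates in one type")
    wallace = both / in_a    # W(A -> B) = a / (a + b)
    expected = in_b / total  # 1 - SID(B): chance that two isolates share a B type
    return (wallace - expected) / (1 - expected)


# Hypothetical data in which every sequence type falls within a single PFGE cluster
pfge = ["P1", "P1", "P1", "P2", "P2"]
st = ["ST1", "ST1", "ST2", "ST3", "ST3"]
print(round(adjusted_wallace(st, pfge), 3))  # 1.0: same ST implies same PFGE cluster
print(round(adjusted_wallace(pfge, st), 3))  # 0.375: same PFGE cluster predicts ST less well
```

Unlike the Adjusted Rand, the coefficient is directional: the two calls above give different values for the same data.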
COMPARING AR AND AW 95% CI
Null Hypothesis: The values under comparison are the same
COMPARING PARTITIONS WEBSITE
Scripts
BioNumerics™ Partition Mapping module (http://www.applied-maths.com/features/partition-mapping)
OTHER APPLICATIONS FOR SID, AR AND AW
• Determining the best set of markers for typing: given dozens, hundreds or thousands of candidate loci or SNPs, is there a subset with enough discrimination to reproduce the results of another typing method?
http://www.cidmpublichealth.org/pages/ausetts.html.
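One simple way to explore this question is greedy forward selection: repeatedly add the marker that most increases discrimination until a target is reached. The sketch below is a hypothetical illustration, not the method behind the linked site; it scores subsets with SID, and scoring against a reference method with Adjusted Wallace or Adjusted Rand would follow the same pattern. Greedy selection is fast but is not guaranteed to find the smallest adequate subset.

```python
from collections import Counter


def simpsons_index(labels):
    """Simpson's Index of Diversity (Hunter and Gaston form)."""
    counts = Counter(labels).values()
    n = sum(counts)
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def greedy_marker_selection(profiles, target_sid, max_markers=None):
    """Hypothetical greedy forward selection of markers (loci or SNP positions).

    profiles: one equal-length tuple per isolate, one entry per marker.
    Returns (indices of selected markers, SID achieved with them).
    """
    n_markers = len(profiles[0])
    limit = n_markers if max_markers is None else min(max_markers, n_markers)
    selected, best = [], 0.0
    while len(selected) < limit and best < target_sid:
        scored = []
        for m in range(n_markers):
            if m in selected:
                continue
            # Type each isolate by its combined alleles at the selected markers plus m
            combined = [tuple(p[k] for k in selected + [m]) for p in profiles]
            scored.append((simpsons_index(combined), m))
        # Highest SID wins; ties go to the lowest marker index
        best, chosen = max(scored, key=lambda s: (s[0], -s[1]))
        selected.append(chosen)
    return selected, best


# Hypothetical data: four isolates, three markers (the third is uninformative)
profiles = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
print(greedy_marker_selection(profiles, target_sid=1.0))  # ([0, 1], 1.0)
```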
• Determining the best set of markers or typing methods for predicting a specific outcome or any other associated metadata. Examples:
  • Using AW to determine which typing method better predicts a clinical outcome or prognosis
  • Using AW to determine the association between alleles and clonal complexes (Weissman S J et al. Appl. Environ. Microbiol. 2012;78:1353-1360)
  • Determining the association between alleles or types and the geographical location of sampling
Conclusions: Do’s and Don’ts
DO’s
• The larger the sample size, the more accurate the conclusions can be
• Always use SID, Adjusted Rand and Adjusted Wallace
• Confidence intervals give more information than point estimates because they intrinsically take sample size into account
• Understand the algorithm before drawing conclusions from the results
• Assess the biological meaning of the results
DON’Ts
• Make comparisons using a small number of isolates. Usually >50 isolates are enough, but >100 is better for obtaining statistically significant results
• Use coefficients that are not corrected for chance agreement when comparing typing methods
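The effect of sample size on the interval can be made concrete with the SID variance approximation of Grundmann et al. (2001). Holding a set of type frequencies fixed, the half-width of the 95% CI shrinks in proportion to 1/√n, so quadrupling the number of isolates halves it. The frequencies and the function name below are hypothetical, chosen only for illustration.

```python
from math import sqrt


def sid_ci_half_width(type_frequencies, n):
    """Half-width of the approximate 95% CI for SID (Grundmann et al., 2001)
    for a sample of n isolates with the given type frequencies."""
    s2 = sum(p ** 2 for p in type_frequencies)
    s3 = sum(p ** 3 for p in type_frequencies)
    return 2 * sqrt(max((4 / n) * (s3 - s2 ** 2), 0.0))


# Hypothetical type frequencies, held fixed while the sample size grows
freqs = [0.4, 0.3, 0.2, 0.1]
for n in (20, 50, 100, 200):
    print(f"n = {n:3d}: SID 95% CI = point estimate ± {sid_ci_half_width(freqs, n):.3f}")
```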
TO KNOW MORE:
For examples of usage, see the list of references at: http://darwin.phyloviz.net/ComparingPartitions/index.php?link=References
ACKNOWLEDGEMENTS
Mário Ramirez, Francisco Pinto, Ana Severiano
UMMI Members
Funding from Fundação para a Ciência e Tecnologia and the EU 7th Framework Programme
Dag Harmsen, for the invitation to participate in the workshop
www.comparingpartitions.info