DNA Barcoding Statistics
Rasmus Nielsen
University of Copenhagen
Statistical Approaches
Hypothesis testing problem: test membership in a specific species.
Decision-theoretic/Bayesian problem: choose an assignment by weighing how desirable or undesirable false positives and false negatives are.
Species assignment and higher taxonomic assignment without population genetics.
Approach 1: Hypothesis testing
Test H0: X ∈ S_i. In the divergence model, X ∈ S_i corresponds to T = 0, where T is the divergence time between the query sequence and species i. Likelihood ratio test based on

$$\Lambda = 2\log\frac{\max_{T} L(T)}{L(0)}$$
Distribution of LR
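Because H0 puts T = 0 on the boundary of the parameter space (T ≥ 0), the usual χ²₁ reference distribution for Λ does not apply directly. A minimal sketch, assuming the standard large-sample result for a single boundary parameter (a 50:50 mixture of a point mass at 0 and χ²₁); the null distribution used on this slide may have been obtained differently, for example by simulation. The toy log-likelihood and the grid maximization are illustrative assumptions, not the divergence model itself.

```python
import math
from typing import Callable, Sequence


def lrt_boundary(loglik: Callable[[float], float],
                 t_grid: Sequence[float]) -> tuple[float, float]:
    """Likelihood ratio test of H0: T = 0 against T >= 0.

    loglik: log L(T), supplied by the caller (hypothetical model)
    t_grid: candidate T >= 0 values; a grid search stands in for an optimizer
    Returns (Lambda, p_value), Lambda = 2 * [log max_T L(T) - log L(0)].
    """
    ll_null = loglik(0.0)
    ll_max = max(max(loglik(t) for t in t_grid), ll_null)
    lam = max(0.0, 2.0 * (ll_max - ll_null))
    if lam == 0.0:
        return lam, 1.0
    # Assumed null: 0.5 * chi2_0 + 0.5 * chi2_1,
    # using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(lam / 2.0))
    return lam, p_value


# Toy log-likelihood peaking at T = 0.3 (purely illustrative).
def toy_loglik(t: float) -> float:
    return -50.0 * (t - 0.3) ** 2


grid = [i * 0.001 for i in range(2001)]
lam, p = lrt_boundary(toy_loglik, grid)
print(f"Lambda = {lam:.3f}, p = {p:.5f}")
```

Here Λ = 9, and the mixture p-value is half the χ²₁ tail probability, which is smaller than a naive χ²₁ test would give.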
Approach 2: Classical (decision-theoretic) assignment approach
Base the assignment on Pr(X ∈ S_i | D, X).
X: query sequence; S_i: set of (mostly unobserved) sequences from species i; D: all the available DNA sequence data.
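Given the posterior probabilities Pr(X ∈ S_i | D, X), a decision-theoretic assignment picks the action with the smallest posterior expected loss, so the relative costs of false positives and false negatives drive the call. A minimal sketch, assuming a hypothetical "unassigned" action and a simple illustrative loss (0 if correct, a fixed cost for a wrong species, a smaller fixed cost for declining to assign); the actual losses would be chosen by the user.

```python
from typing import Callable, Mapping

NO_ASSIGNMENT = "unassigned"  # hypothetical label for declining to assign


def bayes_assignment(posterior: Mapping[str, float],
                     loss: Callable[[str, str], float]) -> str:
    """Return the action with the smallest posterior expected loss.

    posterior: Pr(X in S_i | D, X) for each candidate species i
    loss: loss(action, truth); action is a species or NO_ASSIGNMENT
    """
    actions = list(posterior) + [NO_ASSIGNMENT]

    def expected_loss(action: str) -> float:
        return sum(p * loss(action, truth) for truth, p in posterior.items())

    return min(actions, key=expected_loss)


def make_loss(false_positive: float, no_call: float) -> Callable[[str, str], float]:
    """Illustrative loss (an assumption): 0 if correct, `false_positive`
    for a wrong species, `no_call` for declining to assign."""
    def loss(action: str, truth: str) -> float:
        if action == NO_ASSIGNMENT:
            return no_call
        return 0.0 if action == truth else false_positive
    return loss


loss = make_loss(false_positive=10.0, no_call=1.0)
print(bayes_assignment({"A": 0.95, "B": 0.05}, loss))  # A
print(bayes_assignment({"A": 0.80, "B": 0.20}, loss))  # unassigned
```

With a false positive ten times as costly as a no-call, a posterior of 0.8 is not enough to assign: the expected loss of assigning to A is 0.2 × 10 = 2, versus 1 for declining.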
Computation
Use MCMC under a coalescent model that includes species divergence and other parameters.
Calculate Pr(X ∈ S_i | D, X) from the MCMC output.
Currently implemented only for two species.
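Pr(X ∈ S_i | D, X) can be estimated from MCMC output as the fraction of retained samples in which the query is assigned to species i. A minimal sketch, assuming the sampler records one species label for the query per iteration (an assumed output format); burn-in and thinning are the usual MCMC post-processing steps.

```python
from collections import Counter
from typing import Hashable, Sequence


def assignment_probabilities(samples: Sequence[Hashable],
                             burn_in: int = 0,
                             thin: int = 1) -> dict:
    """Monte Carlo estimate of Pr(X in S_i | D, X) from MCMC output.

    samples: species label of the query at each recorded iteration (assumed)
    burn_in: number of initial samples to discard
    thin: keep every `thin`-th sample after burn-in
    """
    kept = samples[burn_in::thin]
    if not kept:
        raise ValueError("no samples left after burn-in and thinning")
    counts = Counter(kept)
    return {label: n / len(kept) for label, n in counts.items()}


# Hypothetical chain: 100 burn-in samples, then 300 "A" and 100 "B".
chain = ["B"] * 100 + ["A"] * 300 + ["B"] * 100
print(assignment_probabilities(chain, burn_in=100))
```

Although the method is implemented only for two species, nothing in this estimator depends on the number of labels.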
Skipper butterfly Astraptes fulgerator
Why not use assignment based on marginal probabilities?
What if we used

$$\Pr(X \in S_i \mid D, X) = \frac{\Pr(X \mid D, X \in S_i)\, p(X \in S_i)}{\sum_j \Pr(X \mid D, X \in S_j)\, p(X \in S_j)}$$

i.e., we can calculate posterior probabilities by assuming independence, i.e., ignoring phylogeny.
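The marginal-probability approach applies Bayes' rule species by species: each species' likelihood Pr(X | D, X ∈ S_j) is multiplied by its prior p(X ∈ S_j) and normalized over all species j. A minimal sketch, assuming the per-species log-likelihoods are already available (how they are computed is not specified here, so the inputs are hypothetical numbers); it works on the log scale because sequence likelihoods easily underflow.

```python
import math
from typing import Mapping


def marginal_posterior(log_lik: Mapping[str, float],
                       prior: Mapping[str, float]) -> dict:
    """Pr(X in S_i | D, X) from log Pr(X | D, X in S_i) and p(X in S_i).

    Normalizes over all species j with the log-sum-exp trick.
    Priors must be strictly positive.
    """
    log_num = {s: log_lik[s] + math.log(prior[s]) for s in log_lik}
    m = max(log_num.values())
    log_den = m + math.log(sum(math.exp(v - m) for v in log_num.values()))
    return {s: math.exp(v - log_den) for s, v in log_num.items()}


# Hypothetical log-likelihoods far too small to exponentiate directly.
post = marginal_posterior({"A": -1000.0, "B": -1002.0},
                          {"A": 0.5, "B": 0.5})
print(post)
```

A log-likelihood difference of 2 with equal priors gives species A a posterior of 1 / (1 + e^-2), about 0.88. The independence assumption is what allows each species to be scored separately; the assignment-error comparison addresses what this ignores.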
Assignment error
Approach 3: Coalescence-Shmoalescence
Assign based on monophyly with other members of the species (phylogenetic criterion).
Do not estimate the phylogeny itself, only the placement of the query sequence on it.
Calculate the posterior probability of assignment.
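Under the phylogenetic criterion, the posterior probability of assigning the query to a species can be estimated as the fraction of posterior tree samples in which the query and that species' sequences form a clade. A minimal sketch, assuming rooted trees already parsed into nested tuples (a hypothetical representation; real MCMC output would first need to be read from Newick strings) and reading "monophyly with other members of the species" as that exact clade existing.

```python
from typing import Iterable, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees (rooted)


def _collect(tree: Tree, out: list) -> frozenset:
    """Return the leaf set below `tree`; append each internal node's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect(child, out) for child in tree))
    out.append(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if `taxa` are exactly the leaves of some clade of the rooted tree."""
    clades: list = []
    _collect(tree, clades)
    return frozenset(taxa) in clades


def assignment_posterior(trees: list, query: str,
                         species_members: Iterable[str]) -> float:
    """Fraction of sampled trees in which the query plus one species' sequences form a clade."""
    group = frozenset(species_members) | {query}
    return sum(is_monophyletic(t, group) for t in trees) / len(trees)


# Three hypothetical posterior tree samples with query "q".
t1 = ((("q", "a1"), "a2"), ("b1", "b2"))
t2 = (("a1", "a2"), ("q", ("b1", "b2")))
t3 = (("q", ("a1", "a2")), ("b1", "b2"))
print(assignment_posterior([t1, t2, t3], "q", ["a1", "a2"]))
```

In this toy sample, the query groups with species A in two of three trees and with species B in the third. Only the query's position varies between samples, which matches placing the query on the phylogeny without re-estimating it.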
Algorithms
BLAST to identify a candidate set of species.
Possibly iterate to ensure a phylogenetically diverse sample.
Align and pipe to a special version of MrBayes (by J. Huelsenbeck) that maintains phylogenetic constraints.
Calculate the assignment probability from the MrBayes output.
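The pipeline does not specify how the iteration ensures a phylogenetically diverse sample. One possible approach, which is an assumption and not the method used here, is greedy max-min (farthest-point) selection: starting from the best BLAST hit, repeatedly add the candidate whose nearest already-chosen sequence is farthest away. The distance function is a hypothetical input, such as pairwise sequence divergence.

```python
from typing import Callable, Sequence


def diverse_subset(names: Sequence[str],
                   dist: Callable[[str, str], float],
                   k: int,
                   start: str) -> list:
    """Greedy max-min (farthest-point) selection of up to k sequences.

    names: candidate sequence IDs (e.g. BLAST hits)
    dist: symmetric pairwise distance (hypothetical, e.g. p-distance)
    start: sequence used to seed the sample (e.g. the best BLAST hit)
    """
    chosen = [start]
    remaining = [n for n in names if n != start]
    while remaining and len(chosen) < k:
        # Candidate whose closest already-chosen sequence is farthest away.
        best = max(remaining, key=lambda n: min(dist(n, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical one-dimensional "positions" standing in for divergences.
pos = {"s1": 0.0, "s2": 0.1, "s3": 0.5, "s4": 1.0}
print(diverse_subset(list(pos), lambda a, b: abs(pos[a] - pos[b]), k=3, start="s1"))
```

In this toy example, the near-duplicate s2 is skipped in favour of s4 and s3, which spread the sample across the range.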
Example taxonomy summary
[Figure 2]
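Species-level assignment probabilities can be summarized at a higher taxonomic rank by summation. Species are mutually exclusive, so the probability that the query belongs to a genus is the sum over that genus's candidate species. A minimal sketch, assuming a hypothetical species-to-genus lookup; whether the summary on this slide was produced this way is not stated.

```python
from collections import defaultdict
from typing import Mapping


def roll_up(species_post: Mapping[str, float],
            parent: Mapping[str, str]) -> dict:
    """Sum species-level assignment probabilities into a higher rank.

    species_post: Pr(query belongs to species s) for each candidate species
    parent: maps each species to its higher taxon (hypothetical lookup)
    Probabilities of disjoint events add, so each taxon gets the sum
    over its species.
    """
    out: dict = defaultdict(float)
    for sp, p in species_post.items():
        out[parent[sp]] += p
    return dict(out)


# Hypothetical species and genera.
post = {"sp_a": 0.40, "sp_b": 0.35, "sp_c": 0.25}
genus_of = {"sp_a": "genus_X", "sp_b": "genus_X", "sp_c": "genus_Y"}
print(roll_up(post, genus_of))
```

This shows how a query can be confidently placed in genus_X (0.75) even when no single species within it is strongly supported.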
Greenland Ice Cores Example
Neanderthal Example
Acknowledgments
Misha Matz (coalescence-based methods).
Wouter Boomsma and Kasper Munch (phylogenetic methods).
John Huelsenbeck (MrBayes).
Eske Willerslev (ice and DNA examples).
Jody Hey (discussion and inspiration).