DNA Barcoding Statistics, Rasmus Nielsen, University of Copenhagen
![Page 1: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/1.jpg)
DNA Barcoding Statistics
Rasmus Nielsen
University of Copenhagen
![Page 2: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/2.jpg)
Statistical Approaches
Hypothesis testing problem. Test membership of specific species.
Decision theoretic/Bayesian problem. Choose assignment by weighing how desirable/undesirable false positives and false negatives are.
Species assignment and higher taxonomic assignment without population genetics.
![Page 3: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/3.jpg)
Approach 1: Hypothesis testing
Test $H_0$: $X \in S_i$
In divergence model, $X \in S_i \Leftrightarrow T = 0$
Likelihood ratio test based on
$$-2 \log \frac{L(T = 0)}{\max_{T} L(T)}$$
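As a rough illustration of a likelihood ratio test of T = 0 against T > 0, the sketch below uses a deliberately simple stand-in for the divergence model, not the coalescent-based model of the talk. It assumes the data are summarised as `k` differences between the query and a reference sequence of the candidate species over `n` sites, that sites evolve under Jukes-Cantor, and that `t0` is a known within-species distance. All names (`jc69_diff_prob`, `lrt_statistic`, `t0`) are hypothetical.

```python
import math


def jc69_diff_prob(t):
    """Probability that a site differs after branch length t under Jukes-Cantor."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def log_lik(k, n, p):
    """Binomial log-likelihood of k differences in n sites (constant term dropped)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def lrt_statistic(k, n, t0):
    """Return -2 log [ L(T=0) / max_{T>=0} L(T) ] for the toy model.

    T is the extra divergence beyond the assumed within-species distance t0,
    so the per-site difference probability is jc69_diff_prob(t0 + T).
    """
    p0 = jc69_diff_prob(t0)
    # As T runs over [0, inf), p runs over [p0, 0.75). The binomial likelihood
    # is unimodal in p with its peak at k/n, so the constrained maximum is
    # obtained by clipping k/n to that interval.
    p_hat = min(max(k / n, p0), 0.75 - 1e-9)
    return max(0.0, -2.0 * (log_lik(k, n, p0) - log_lik(k, n, p_hat)))


if __name__ == "__main__":
    # Few differences: compatible with membership, statistic is zero.
    print(lrt_statistic(2, 600, 0.01))
    # Many differences: strong evidence for T > 0.
    print(lrt_statistic(30, 600, 0.01))
```

Because T = 0 lies on the boundary of the parameter space, the statistic is exactly zero whenever the observed distance is no larger than the within-species expectation.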
![Page 4: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/4.jpg)
Distribution of LR
![Page 5: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/5.jpg)
Statistical Approaches
Hypothesis testing problem. Test membership of specific species.
Decision theoretic/Bayesian problem. Choose assignment by weighing how desirable/undesirable false positives and false negatives are.
Species assignment and higher taxonomic assignment without population genetics.
![Page 6: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/6.jpg)
Approach 2: Classical (decision theoretic) assignment approach
Base assignment on $\Pr(X \in S_i \mid D, X)$
$X$: query sequence
$S_i$: set of (mostly unobserved) sequences from species $i$
$D$: all the available DNA sequence data
![Page 7: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/7.jpg)
Computation
Use MCMC under coalescence model with divergence between species and other parameters.
Calculate $\Pr(X \in S_i \mid D, X)$ from MCMC output.
Currently only implemented for two species.
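A minimal sketch of the last step, assuming the MCMC output has already been reduced to one sampled species label for the query per iteration (the sampler itself is not shown, and the function names and the `burn_in` parameter are hypothetical). The posterior probability is estimated as a sample frequency, and a simple two-action decision rule weighs the cost of a false positive against the cost of a false negative.

```python
from collections import Counter


def assignment_probabilities(sampled_species, burn_in=0):
    """Estimate Pr(X in S_i | D, X) as the fraction of retained MCMC samples
    in which the query was placed in species i."""
    kept = sampled_species[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return {species: count / len(kept) for species, count in counts.items()}


def should_assign(posterior, cost_fp, cost_fn):
    """Assign when the expected loss of assigning is lower than that of abstaining.

    Expected loss of assigning:  (1 - posterior) * cost_fp
    Expected loss of abstaining: posterior * cost_fn
    which gives the threshold cost_fp / (cost_fp + cost_fn).
    """
    return posterior > cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    samples = ["A", "A", "B", "A"]  # made-up labels, for illustration only
    probs = assignment_probabilities(samples)
    print(probs)
    print(should_assign(probs["A"], cost_fp=1.0, cost_fn=1.0))
    print(should_assign(probs["A"], cost_fp=9.0, cost_fn=1.0))
```

Making false positives more costly raises the posterior probability required before the query is assigned.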
![Page 8: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/8.jpg)
Skipper butterfly Astraptes fulgerator
![Page 9: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/9.jpg)
Skipper butterfly Astraptes fulgerator
![Page 10: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/10.jpg)
Why not use assignment based on marginal probabilities?
What if we used
$$\Pr(X \in S_i \mid D, X) = \frac{\Pr(X \mid X \in S_i, D)\, p(X \in S_i)}{\sum_j \Pr(X \mid X \in S_j, D)\, p(X \in S_j)}$$
i.e. we can calculate posterior probabilities by assuming independence, i.e. ignoring phylogeny.
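The marginal calculation is Bayes' rule over candidate species. The sketch below assumes that a per-species log-likelihood of the query, log Pr(X | X in S_j, D), and a prior p(X in S_j) are already available; how those likelihoods are obtained is not specified here, and the function name and the example numbers are hypothetical. The normalisation is done on the log scale to avoid underflow with long sequences.

```python
import math


def marginal_posteriors(log_liks, priors):
    """Posterior Pr(X in S_i | D, X) from per-species log-likelihoods and priors.

    log_liks: dict mapping species -> log Pr(X | X in S_i, D)
    priors:   dict mapping species -> p(X in S_i), each > 0
    """
    log_terms = {s: log_liks[s] + math.log(priors[s]) for s in log_liks}
    # Subtract the largest term before exponentiating (log-sum-exp trick).
    largest = max(log_terms.values())
    weights = {s: math.exp(v - largest) for s, v in log_terms.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    # Made-up values: the query is twice as likely under species A as under B.
    log_liks = {"A": math.log(0.2), "B": math.log(0.1)}
    priors = {"A": 0.5, "B": 0.5}
    print(marginal_posteriors(log_liks, priors))
```

Each species contributes independently to the denominator, which is exactly the sense in which this calculation ignores phylogeny.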
![Page 11: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/11.jpg)
Assignment error
![Page 12: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/12.jpg)
Approach 3: Coalescence-Shmoalescence
Assign based on monophyly with other members of species (phylogenetic criterion).
Do not estimate phylogeny but only placement of query sequence within the phylogeny.
Calculate posterior probability of assignment.
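One way to picture the phylogenetic criterion: given a set of trees sampled from the posterior, the assignment probability for a species can be estimated as the fraction of trees in which the query forms a clade with exactly the members of that species. The sketch below assumes rooted trees encoded as nested tuples of leaf names, a simplification chosen for illustration rather than the format produced by any particular program; all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def has_clade(tree, group):
    """True if some node of the rooted tree has exactly `group` as its leaves."""
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(has_clade(child, group) for child in tree)


def monophyly_posterior(trees, query, members):
    """Fraction of sampled trees in which the query is monophyletic with `members`."""
    if not trees:
        raise ValueError("no sampled trees")
    group = frozenset(members) | {query}
    hits = sum(has_clade(tree, group) for tree in trees)
    return hits / len(trees)


if __name__ == "__main__":
    # Three made-up sampled trees with query "q" and two species, A and B.
    trees = [
        ((("q", "a1"), "a2"), ("b1", "b2")),
        (("a1", "a2"), (("q", "b1"), "b2")),
        (("q", ("a1", "a2")), ("b1", "b2")),
    ]
    print(monophyly_posterior(trees, "q", ["a1", "a2"]))
    print(monophyly_posterior(trees, "q", ["b1", "b2"]))
```

The recursion recomputes leaf sets at every node, which is fine for small candidate sets but would need caching for large trees.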
![Page 13: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/13.jpg)
Algorithms
BLAST to identify candidate set of species.
Possible iteration to ensure a phylogenetically diverse sample.
Align and pipe to special version of MrBayes (by J. Huelsenbeck) which maintains phylogenetic constraints.
Calculate assignment probability based on MrBayes output.
![Page 14: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/14.jpg)
Example taxonomy summary
![Page 15: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/15.jpg)
fig2
![Page 16: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/16.jpg)
![Page 17: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/17.jpg)
![Page 18: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/18.jpg)
Greenland Ice Cores Example
![Page 19: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/19.jpg)
Greenland Ice Cores Example
![Page 20: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/20.jpg)
Neanderthal Example
![Page 21: DNA Barcoding Statistics Rasmus Nielsen University of Copenhagen](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649efc5503460f94c0fda2/html5/thumbnails/21.jpg)
Acknowledgments
Misha Matz (Coalescence-based methods).
Wouter Boomsma and Kasper Munch (Phylogenetic methods).
John Huelsenbeck (MrBayes).
Eske Willerslev (Ice and DNA examples).
Jody Hey (discussion and inspiration).