journal club @ uvigo 2011.07.22
DESCRIPTION
Discussion of the article "Bayes Estimators for Phylogenetic Reconstruction", presented by Leo Martins to the Phylogenomics Lab of the University of Vigo. Syst. Biol. 60(4), 528-540, 2011, doi 10.1093/sysbio/syr021
TRANSCRIPT
Journal Club – Bayes Estimators for Phylogenetic Reconstruction
Syst. Biol. 60(4), 528 – 540, 2011 doi 10.1093/sysbio/syr021
Leonardo de O. Martins
University of Vigo
July 22, 2011
Leo Martins (Univ. Vigo) Journal Club 22/7 1 / 12
Outline
1 Distance as a penalty
2 Distances, everywhere
3 No phylogenetics, yet...
4 Trees as points in space
5 To the paper, then
Statistical Risk
The risk ρ associated with a decision θ̂ is the expected loss of this decision θ̂ (which can be, for instance, an estimate of θ).
ρ(θ̂) = ∫ L(θ, θ̂) P(θ | data) dθ
(properly called the posterior expected loss)
The loss function L(θ, θ̂) is a penalty we give for "deciding" away from the parameter. Examples are the squared loss and the absolute loss.
For some loss functions, we can calculate what the best decision is (i.e. the one that minimizes the risk, for any data).
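As a rough illustration of the risk integral above, the sketch below approximates ρ(θ̂) by averaging the loss over draws from the posterior and then minimizes it numerically. The gamma-distributed sample and the names `theta`, `risk`, `squared` and `absolute` are assumptions made up for this example, not anything from the article.

```r
# Hypothetical posterior sample: draws of theta from P(theta | data).
set.seed(1)
theta <- rgamma(5000, shape = 2, rate = 1)

# Monte Carlo estimate of the risk rho(theta_hat) under a given loss.
risk <- function(theta_hat, loss) mean(loss(theta, theta_hat))

squared  <- function(theta, theta_hat) (theta - theta_hat)^2
absolute <- function(theta, theta_hat) abs(theta - theta_hat)

# Decision that minimizes the estimated risk, found by a 1-D search.
best_sq  <- optimize(risk, interval = range(theta), loss = squared)$minimum
best_abs <- optimize(risk, interval = range(theta), loss = absolute)$minimum

print(c(squared = best_sq, absolute = best_abs))
```

The two losses generally lead to different best decisions for a skewed posterior such as this one.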
How to summarise a collection of objects?
scattered points
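library(MASS)  # provides mvrnorm()
# 1000 draws from a bivariate normal; the covariance matrix must be symmetric
x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow = TRUE))
plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")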
centroid: minimizes a distance to all points
regression line: minimizes a distance to all points
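The two summaries named on these slides can be sketched in base R as follows. This is only an illustration: the data are simulated here with `rnorm` rather than `mvrnorm`, and the helpers `sq_to_point` and `sq_to_line` are hypothetical names introduced to make the "minimizes a distance" statement explicit (squared distances in both cases).

```r
# Simulated correlated points (an assumption, standing in for the slide's data).
set.seed(42)
x <- rnorm(1000)
y <- 0.9 * x + rnorm(1000, sd = sqrt(1 - 0.9^2))

# Sum of squared distances from all points to a candidate point p = (px, py).
sq_to_point <- function(p) sum((x - p[1])^2 + (y - p[2])^2)
# Sum of squared vertical distances to a candidate line y = b[1] + b[2] * x.
sq_to_line <- function(b) sum((y - b[1] - b[2] * x)^2)

centroid <- c(mean(x), mean(y))  # minimizes sq_to_point
fit <- lm(y ~ x)                 # least squares: minimizes sq_to_line

plot(x, y, pch = ".", cex = 2, xlab = "x", ylab = "y")
points(centroid[1], centroid[2], pch = 19, col = "red")
abline(fit, col = "blue")
```

Any other point or line can be passed to the two helpers to check that it gives a larger total.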
How to summarise the posterior distribution P(X)?
Posterior mean
Minimizes the expected loss under a squared loss function
L(θ, θ̂) = (θ − θ̂)²
(squared Euclidean distance)
Posterior median
Minimizes the expected loss under an absolute (linear) loss function
L(θ, θ̂) = |θ − θ̂|
(Manhattan distance)
Posterior mode
a.k.a. Maximum A Posteriori (MAP) estimate.
Minimizes the expected loss under a delta (0-1) loss function
L(θ, θ̂) = 0 for θ = θ̂, and 1 for θ ≠ θ̂
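A small discrete example may make the three summaries concrete. The posterior below (`values`, `prob`) is invented for illustration; the sketch evaluates the posterior expected loss of each candidate decision and picks the smallest, with the squared loss minimized over the real line instead of over the five candidates.

```r
# Hypothetical discrete posterior over five possible values of theta.
values <- c(0, 1, 2, 3, 10)
prob   <- c(0.30, 0.25, 0.20, 0.15, 0.10)

# Posterior expected loss of every candidate decision under a given loss.
risk <- function(loss) sapply(values, function(d) sum(loss(values, d) * prob))

zero_one <- function(theta, d) as.numeric(theta != d)
absolute <- function(theta, d) abs(theta - d)
squared  <- function(theta, d) (theta - d)^2

map_estimate    <- values[which.min(risk(zero_one))]  # most probable value
median_estimate <- values[which.min(risk(absolute))]  # posterior median
mean_estimate   <- optimize(function(d) sum(squared(values, d) * prob),
                            interval = range(values))$minimum

print(c(mode = map_estimate, median = median_estimate, mean = mean_estimate))
```

With this skewed posterior the three estimates differ, which is the reason the choice of loss function matters.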
Distances between trees
[Figure: two example trees on the taxa A, B, C, D and E]
Trees from the article
RF distance
DE|ABC and CD|ABE
total 2 branches
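The RF count above can be reproduced with a few lines of R, treating each tree as the set of its internal branches (splits). This is a sketch under one assumption: that both trees also share the split AB|CDE, which is the only split compatible with the two branches listed. The string encoding of splits and the name `rf_distance` are choices made for this example.

```r
# Each tree as the set of its nontrivial splits, written in a fixed order
# so that identical splits give identical strings.
tree1 <- c("AB|CDE", "DE|ABC")
tree2 <- c("AB|CDE", "CD|ABE")

# Robinson-Foulds distance: number of splits found in only one of the trees.
rf_distance <- function(s1, s2) {
  length(setdiff(s1, s2)) + length(setdiff(s2, s1))
}

print(rf_distance(tree1, tree2))
```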
Quartet distance
AC|DE and AE|CD
BC|DE and BE|CD
4 quartets are different
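One way to arrive at the four differing quartets is to list every quartet each tree displays and compare the two lists. The sketch below assumes the two trees have the splits AB|CDE, DE|ABC and AB|CDE, CD|ABE respectively (the shared split is inferred, not stated on the slide); `quartets` is a hypothetical helper that relies on the fact that a tree displays ab|cd whenever one of its splits separates {a, b} from {c, d}.

```r
tree1 <- c("AB|CDE", "DE|ABC")
tree2 <- c("AB|CDE", "CD|ABE")

# All quartets ab|cd induced by a set of splits written like "AB|CDE".
quartets <- function(splits) {
  out <- character(0)
  for (s in splits) {
    sides <- strsplit(strsplit(s, "|", fixed = TRUE)[[1]], "")
    left  <- combn(sort(sides[[1]]), 2, paste, collapse = "")
    right <- combn(sort(sides[[2]]), 2, paste, collapse = "")
    for (l in left) for (r in right) {
      # Sort the two pairs so each quartet has a single spelling.
      out <- c(out, paste(sort(c(l, r)), collapse = "|"))
    }
  }
  unique(out)
}

q1 <- quartets(tree1)
q2 <- quartets(tree2)
differing <- c(setdiff(q1, q2), setdiff(q2, q1))
print(differing)
```

Counted this way the distance is the size of the symmetric difference, so the two taxon sets {A, C, D, E} and {B, C, D, E} each contribute two quartets.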
Path difference (number of speciations between each pair of taxa, compared between trees)
path from A to E is one edge longer in one tree than the other
(...)
the overall difference is 6
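The total of 6 can be checked by counting the edges between every pair of taxa in each tree and summing the absolute differences. The sketch assumes the two trees are ((A,B),C,(D,E)) and ((A,B),E,(C,D)); the internal node labels `u`, `v`, `w` and the helper `path_lengths` are hypothetical, and summing absolute differences is one possible reading of "overall difference".

```r
tips <- c("A", "B", "C", "D", "E")

# Edge lists; u, v, w are made-up labels for the internal nodes.
edges1 <- matrix(c("A","u", "B","u", "u","v", "C","v", "v","w", "D","w", "E","w"),
                 ncol = 2, byrow = TRUE)
edges2 <- matrix(c("A","u", "B","u", "u","v", "E","v", "v","w", "C","w", "D","w"),
                 ncol = 2, byrow = TRUE)

# Number of edges on the path between every pair of tips.
path_lengths <- function(edges) {
  nodes <- unique(as.vector(edges))
  d <- matrix(Inf, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  diag(d) <- 0
  i <- match(edges[, 1], nodes)
  j <- match(edges[, 2], nodes)
  d[cbind(i, j)] <- 1
  d[cbind(j, i)] <- 1
  # Floyd-Warshall: allow each node in turn as an intermediate stop.
  for (k in seq_along(nodes)) d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d[tips, tips]
}

d1 <- path_lengths(edges1)
d2 <- path_lengths(edges2)
path_difference <- sum(abs(d1 - d2)[upper.tri(d1)])
print(path_difference)
```

Because every pairwise difference here is 0 or 1, the sum of squared differences gives the same value.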
If there is a distance, there is a Bayes estimator
For points in Rⁿ, we know that the mean minimizes the (squared) Euclidean distance, etc.
For phylogenies:
there are several Euclidean distances
the mean does not work since a tree has restrictions
But some distances between trees also lead to "analytical" solutions:
the majority-rule consensus tree minimizes the total Robinson-Foulds distance to the samples
the quartet puzzling minimizes the quartet distance
the Buneman tree minimizes (I think) the dissimilarity map distance
some of these are hard to solve as well
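The consensus case can be illustrated on a toy sample. The five trees below are invented, each written as its set of splits with a fixed spelling; the sketch keeps the splits found in more than half of the samples (majority rule) and compares the total RF distance of that tree with the totals of the sampled trees themselves.

```r
# Hypothetical posterior sample of five-taxon trees, one split set per tree.
sample_trees <- list(
  c("AB|CDE", "DE|ABC"),
  c("AB|CDE", "DE|ABC"),
  c("AB|CDE", "CD|ABE"),
  c("AC|BDE", "DE|ABC"),
  c("AB|CDE", "CE|ABD")
)

rf <- function(s1, s2) length(setdiff(s1, s2)) + length(setdiff(s2, s1))
# Total RF distance from one candidate tree to every tree in the sample.
total_rf <- function(tree) sum(sapply(sample_trees, rf, s2 = tree))

# Majority rule: keep the splits present in more than half of the samples.
freq <- table(unlist(sample_trees)) / length(sample_trees)
consensus <- names(freq)[as.vector(freq) > 0.5]

print(consensus)
print(c(consensus = total_rf(consensus),
        sampled = sapply(sample_trees, total_rf)))
```

In this sample the consensus happens to be fully resolved; in general it may contain fewer splits than any sampled tree.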
How do they find, then, the Bayes estimates?
like many other programs: hill-climbing on the space of possible topologies
their input data is the posterior sample of trees from MrBayes
starting tree can be NJ, MAP tree, ML...
apply branch swapping (NNI) to the current best tree, then check its distance to all samples
the distance used is the path difference (matrix subtraction)
no need to recalculate the distance to all samples, just to the matrix with average values
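The averaging shortcut in the last point can be checked numerically. The sketch below is not the authors' program: it hard-codes path-length vectors for three five-taxon topologies (a stand-in for the neighbours an NNI step would propose), invents a posterior sample from them, and assumes the path difference is measured as a squared Euclidean distance, which is the case where the shortcut holds exactly.

```r
# Path lengths (in edges) for the pairs AB, AC, AD, AE, BC, BD, BE, CD, CE, DE.
candidates <- rbind(
  t1 = c(2, 3, 4, 4, 3, 4, 4, 3, 3, 2),  # ((A,B),C,(D,E))
  t2 = c(2, 4, 4, 3, 4, 4, 3, 2, 3, 3),  # ((A,B),E,(C,D))
  t3 = c(3, 2, 4, 4, 3, 3, 3, 4, 4, 2)   # ((A,C),B,(D,E))
)

# Hypothetical posterior sample: t1 three times, t2 twice, t3 once.
posterior_sample <- candidates[c("t1", "t1", "t1", "t2", "t2", "t3"), ]

sq_dist <- function(v, w) sum((v - w)^2)

# Slow way: total squared distance from each candidate to every sample.
total_to_samples <- apply(candidates, 1, function(v)
  sum(apply(posterior_sample, 1, sq_dist, w = v)))

# Fast way: one squared distance from each candidate to the average vector.
average    <- colMeans(posterior_sample)
to_average <- apply(candidates, 1, sq_dist, w = average)

print(rbind(total_to_samples, to_average))
print(names(which.min(to_average)))
```

The two scores differ only by a factor equal to the sample size and an additive constant, so they rank the candidates identically and the hill-climbing step only needs the average.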