![Page 1: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/1.jpg)
Distance-Based Genome Rearrangement Phylogeny
Li-San Wang
University of Texas at Austin and
University of Pennsylvania
![Page 2: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/2.jpg)
Genomes As Signed Permutations
1 -5 3 4 -2 -6 or 5 -1 6 2 -4 -3, etc.
[Figure: circular genome with genes 1, 5, 3, 4, 2, 6 in order around the circle]
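The two orderings above describe the same circular genome: one is obtained from the other by reading the circle in the opposite direction (which reverses the order and flips every sign) and starting at a different gene. A minimal sketch of that equivalence check follows; the helper name `canonical` and the convention of starting at gene +1 are assumptions made for illustration, not part of the talk.

```python
def canonical(genome):
    """Return one fixed representative of a signed circular genome.

    Hypothetical helper: among all rotations of both reading directions,
    pick the one that starts with gene +1 (assumes gene 1 is present).
    """
    n = len(genome)
    # Reading the circle the other way reverses the order and flips signs.
    flipped = [-g for g in reversed(genome)]
    for reading in (genome, flipped):
        for start in range(n):
            rotation = reading[start:] + reading[:start]
            if rotation[0] == 1:
                return tuple(rotation)
    raise ValueError("gene 1 not found")

a = [1, -5, 3, 4, -2, -6]
b = [5, -1, 6, 2, -4, -3]
print(canonical(a) == canonical(b))
```

Exactly one of the 2n readings starts with +1, so the representative is unique and two lists compare equal under `canonical` only when they describe the same circular genome.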
![Page 3: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/3.jpg)
Genomes Evolve by Rearrangements
1 2 3 4 5 6 7 8 9 10 (original genome)
Inversion: 1 2 -6 -5 -4 -3 7 8 9 10
Transposition: 1 2 7 8 3 4 5 6 9 10
Inverted Transposition: 1 2 7 8 -6 -5 -4 -3 9 10
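The three operations can be sketched as list manipulations. This is an illustrative sketch only: the function names and the index convention (half-open slices, segment `g[i:j]` moved to just before position `k`) are assumptions, and the genome is treated as a linear list, so segments that wrap around the circle are not handled.

```python
def inversion(g, i, j):
    """Reverse the segment g[i:j] and flip the sign of each gene in it."""
    return g[:i] + [-x for x in reversed(g[i:j])] + g[j:]

def transposition(g, i, j, k):
    """Move the segment g[i:j] to just before position k (requires j <= k)."""
    return g[:i] + g[j:k] + g[i:j] + g[k:]

def inverted_transposition(g, i, j, k):
    """Like a transposition, but the moved segment is also inverted."""
    return g[:i] + g[j:k] + [-x for x in reversed(g[i:j])] + g[k:]

g = list(range(1, 11))
print(inversion(g, 2, 6))                   # segment 3 4 5 6 inverted in place
print(transposition(g, 2, 6, 8))            # segment 3 4 5 6 moved after 8
print(inverted_transposition(g, 2, 6, 8))   # moved and inverted
```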
![Page 4: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/4.jpg)
Phylogeny Reconstruction
FN: false negative (missing edge)
FP: false positive (incorrect edge)
[Figure: trees on taxa S1 to S5 with one FN edge and one FP edge marked; 50% error rate]
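One common way to count missing and incorrect edges is to compare the bipartitions (splits) of the taxa induced by the internal edges of each tree. The sketch below assumes that representation; the two example trees are hypothetical and are not read off the figure, and both trees are assumed to have at least one internal edge.

```python
def split(side, taxa, ref):
    """Canonical form of a bipartition: the side that does not contain ref."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side

def error_rates(true_splits, inferred_splits):
    """Return (FN rate, FP rate) for two sets of canonical splits."""
    true_splits, inferred_splits = set(true_splits), set(inferred_splits)
    fn = len(true_splits - inferred_splits) / len(true_splits)       # missing edges
    fp = len(inferred_splits - true_splits) / len(inferred_splits)   # incorrect edges
    return fn, fp

taxa = ["S1", "S2", "S3", "S4", "S5"]
# Hypothetical trees: each has two internal edges, one of which is shared.
true_tree = {split({"S1", "S2"}, taxa, "S1"), split({"S4", "S5"}, taxa, "S1")}
inferred = {split({"S1", "S2"}, taxa, "S1"), split({"S3", "S4"}, taxa, "S1")}
fn, fp = error_rates(true_tree, inferred)
print(fn, fp)
```

With one of two true edges missing and one of two inferred edges wrong, both rates come out to one half, which matches the 50% figure quoted on the slide.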
![Page 5: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/5.jpg)
Results
5% Error
![Page 6: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/6.jpg)
Outline
• Genome rearrangement evolution
• True evolutionary distance estimators
• Distance-based phylogeny reconstruction
• Simulation study: accuracy of tree reconstruction
![Page 7: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/7.jpg)
Our Model: the Generalized Nadeau-Taylor Model [STOC’01]
• Three types of events:
− Inversions (INV)
− Transpositions (TRP)
− Inverted Transpositions (ITP)
• Events of the same type are equiprobable
• Probabilities of the three types have a fixed ratio
• We focus on signed circular genomes in this talk.
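A rough sketch of drawing one random event under such a model is given below. It is an illustration under stated assumptions, not the author's simulator: the function names and the `weights` parameter are hypothetical, an inversion is identified with a pair of cut points on the circle, and a transposition or inverted transposition with a triple of cut points (plus, for the inverted case, a choice of which of the three segments moves).

```python
import random

def invert(seg):
    """Reverse a segment and flip the sign of every gene in it."""
    return [-x for x in reversed(seg)]

def random_event(genome, weights, rng):
    """Apply one random event to a signed circular genome (list of ints, n >= 3).

    weights = (w_inv, w_trp, w_itp): relative probabilities of the three
    event types (hypothetical parameter; any fixed ratio can be used).
    The result may be returned as a rotation of the circle.
    """
    n = len(genome)
    kind = rng.choices(("INV", "TRP", "ITP"), weights=weights)[0]
    if kind == "INV":
        # Two distinct cut points on the circle define one inversion.
        i, j = sorted(rng.sample(range(n), 2))
        return genome[:i] + invert(genome[i:j]) + genome[j:]
    # Three distinct cut points split the circle into segments a, b, c.
    i, j, k = sorted(rng.sample(range(n), 3))
    a, b, c = genome[i:j], genome[j:k], genome[k:] + genome[:i]
    if kind == "TRP":
        # On a circle, swapping any two adjacent segments gives the same genome.
        return b + a + c
    # ITP: choose which segment is moved; it is inverted and placed
    # between the other two.
    moved, x, y = rng.choice([(a, b, c), (b, c, a), (c, a, b)])
    return x + invert(moved) + y

rng = random.Random(0)
genome = list(range(1, 11))
for _ in range(5):
    genome = random_event(genome, (1, 1, 1), rng)
print(genome)
```

Setting `weights=(1, 0, 0)` gives inversion-only evolution, and `(0, 1, 1)` gives transpositions and inverted transpositions only.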
![Page 8: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/8.jpg)
Additive Distance Matrix and True Evolutionary Distance (T.E.D.)
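|    | S1 | S2 | S3 | S4 | S5 |
|----|----|----|----|----|----|
| S1 | 0 | 9 | 15 | 14 | 17 |
| S2 |   | 0 | 14 | 13 | 16 |
| S3 |   |   | 0 | 13 | 16 |
| S4 |   |   |   | 0 | 13 |
| S5 |   |   |   |   | 0 |

[Figure: tree on leaves S1 to S5 realizing these distances, with edge lengths 7, 5, 4, 5, 8, 1, 3]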
Theorem [Waterman et al. 1977] Given an m×m additive distance matrix, we can reconstruct a tree realizing the distance in O(m^2) time.
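A distance matrix is additive exactly when it satisfies the four-point condition: for every four taxa, the two largest of the three pairwise sums are equal. The sketch below checks that condition for the 5×5 matrix on this slide. It is a brute-force O(m^4) check written for illustration, not the O(m^2) reconstruction algorithm of the theorem, and the function name is an assumption.

```python
from itertools import combinations

def is_additive(D):
    """Check the four-point condition on a symmetric distance matrix."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        # The two largest sums must coincide.
        if sums[1] != sums[2]:
            return False
    return True

# Full symmetric form of the matrix for S1..S5.
D = [
    [0, 9, 15, 14, 17],
    [9, 0, 14, 13, 16],
    [15, 14, 0, 13, 16],
    [14, 13, 13, 0, 13],
    [17, 16, 16, 13, 0],
]
print(is_additive(D))
```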
![Page 9: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/9.jpg)
Error Tolerance of Neighbor Joining
Theorem [Atteson 1999] Let $\{D_{ij}\}$ be the true evolutionary distances, and $\{d_{ij}\}$ be the estimated distances for T. Let $\varepsilon$ be the length of the shortest edge in T. If for all taxa i,j, we have
$$|D_{ij} - d_{ij}| < \frac{\varepsilon}{2}$$
then neighbor joining returns T.
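The condition in the theorem is a simple entrywise check: the largest estimation error must be below half the length of the shortest edge. A small sketch follows; the function name and the three-taxon example (a star tree with edge lengths 2, 3, 4) are made up for illustration. The condition is sufficient, not necessary, so a `False` result does not mean neighbor joining fails.

```python
def satisfies_atteson_bound(D, d, shortest_edge):
    """True if every |D_ij - d_ij| is below half the shortest edge length."""
    m = len(D)
    worst = max(abs(D[i][j] - d[i][j]) for i in range(m) for j in range(i + 1, m))
    return worst < shortest_edge / 2

# Hypothetical example: star tree with edges 2, 3, 4, so the shortest edge is 2.
D_true = [[0, 5, 6], [5, 0, 7], [6, 7, 0]]
d_good = [[0, 5.5, 6], [5.5, 0, 6.6], [6, 6.6, 0]]   # max error 0.5
d_bad = [[0, 6, 6], [6, 0, 7], [6, 7, 0]]            # max error 1.0
print(satisfies_atteson_bound(D_true, d_good, 2))
print(satisfies_atteson_bound(D_true, d_bad, 2))
```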
![Page 10: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/10.jpg)
Edit Distances Between Genomes
• (INV) Inversion distance [Hannenhalli & Pevzner 1995]
− Computable in linear time [Moret et al. 2001]
• (BP) Breakpoint distance [Watterson et al. 1982]
− Computable in linear time
− NJ(BP): [Blanchette, Kunisawa, Sankoff, 1999]
A = 1 2 3 4 5 6 7 8 9 10
B = 1 2 3 -8 -7 -6 4 5 9 10
BP(A,B) = 3
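The breakpoint count for the example genomes A and B can be reproduced with a short sketch. It assumes signed circular genomes and the usual convention that an adjacency (a, b) is conserved if the other genome contains a followed by b, or -b followed by -a; the function names are illustrative.

```python
def adjacencies(genome):
    """All adjacencies of a signed circular genome, in both reading directions."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read in the opposite direction
    return adj

def breakpoint_distance(A, B):
    """Number of adjacencies of A that are not conserved in B."""
    adj_b = adjacencies(B)
    n = len(A)
    return sum((A[i], A[(i + 1) % n]) not in adj_b for i in range(n))

A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
B = [1, 2, 3, -8, -7, -6, 4, 5, 9, 10]
print(breakpoint_distance(A, B))  # → 3
```

The three breakpoints are the adjacencies (3, 4), (5, 6) and (8, 9) of A. Building the adjacency set and scanning A each take one pass, which is consistent with the linear running time noted on the slide.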
![Page 11: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/11.jpg)
NJ(BP) and NJ(INV)
120 genes, 160 leaves
Uniformly random trees
[Figure panels: Transpositions/inverted transpositions only; Inversion only]
![Page 12: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/12.jpg)
BP and INV
INV vs K (120 genes)
(K: Actual number of inversions) (Inversion-only evolution)
BP/2 vs K
![Page 13: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/13.jpg)
Objectives
• Better techniques for estimating true evolutionary distances
− Mathematical guarantees whenever possible
− Good accuracy in simulation
− Robust to model violations
• Main goal: improve topological accuracy of tree reconstructions using Neighbor Joining and Weighbor
![Page 14: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/14.jpg)
New Estimators
• Exact-IEBP
• Approx-IEBP
• EDE
IEBP: Inverting Expected BreakPoint distance
EDE: Empirically Derived Estimator (inverts the expected inversion distance)
![Page 15: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/15.jpg)
Distance-Based Methods
[Figure: tree reconstruction methods (NJ, Weighbor) paired with distance estimates (BP, INV, EDE, IEBP (Exact-, Approx-))]
![Page 16: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/16.jpg)
Estimate True Evolutionary Distances Using BP
[Figure: BP/2 vs K (120 genes), with lookup steps (1) and (2) marked on the plot]
(K: Actual number of inversions) (Inversion-only evolution)
To use the scatter plot to estimate the actual number of events (K):
1. Compute BP/2
2. From the curve, look up the corresponding value of K
![Page 17: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/17.jpg)
Using Breakpoints to Estimate True Evolutionary Distances
• Compute f_n(k) = E[BP(G_0, G_k)]
(i.e., the expected number of breakpoints after k random events; n is the number of genes)
• Given two genomes G and G’:
− Compute breakpoint distance d = BP(G, G’)
− Find k so that f_n(k) is closest to d
• Challenge: finding f_n(k)
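The inversion step can be illustrated with a Monte Carlo stand-in for f_n(k). This is not the analytical Exact-IEBP or Approx-IEBP derivation from the cited papers: the sketch simply simulates inversion-only evolution on a signed circular genome, averages the breakpoint counts, and picks the k whose average is closest to the observed distance. All function names and the parameter values are assumptions.

```python
import random

def identity_adjacencies(n):
    """Adjacencies of the identity genome 1..n on a circle, both directions."""
    adj = set()
    for x in range(1, n + 1):
        y = x % n + 1
        adj.add((x, y))
        adj.add((-y, -x))
    return adj

def expected_bp_curve(n, k_max, trials, rng):
    """Estimate f_n(k) for k = 0..k_max by simulating random inversions."""
    adj = identity_adjacencies(n)
    totals = [0] * (k_max + 1)
    for _ in range(trials):
        g = list(range(1, n + 1))
        for k in range(1, k_max + 1):
            # One random inversion: two cut points, segment reversed and negated.
            i, j = sorted(rng.sample(range(n), 2))
            g[i:j] = [-x for x in reversed(g[i:j])]
            totals[k] += sum((g[p], g[(p + 1) % n]) not in adj for p in range(n))
    return [t / trials for t in totals]

def estimate_k(d, curve):
    """Return the k whose estimated expected breakpoint count is closest to d."""
    return min(range(len(curve)), key=lambda k: abs(curve[k] - d))

rng = random.Random(0)
curve = expected_bp_curve(n=120, k_max=60, trials=30, rng=rng)
print(estimate_k(40, curve))
```

The simulated curve is noisy and only covers k up to `k_max`; an analytical formula for f_n(k) avoids both limitations, which is the challenge the slide points to.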
![Page 18: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/18.jpg)
True Evolutionary Distance (t.e.d.) Estimators for Gene Order Data
| T.E.D. Estimator | Exact-IEBP [WABI’01] | Approx-IEBP [STOC’01] | EDE [ISMB’01] |
|---|---|---|---|
| Based on the expectation of | Breakpoint distance (exact) | Breakpoint distance (approx.) | Inversion distance (approx.) |
| Derivation | Analytical | Analytical | Empirical |
| Model knowledge | Required | Required | Inversion-only |
| Running time | O(n^3) | O(log n) | O(1) |

IEBP: Inverting the Expected BreakPoint distance
EDE: Empirically Derived Estimator
![Page 19: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/19.jpg)
Approx-IEBP [Wang & Warnow, STOC’01]
![Page 20: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/20.jpg)
True Evolutionary Distance Estimators
IEBP vs K (120 genes)
(K: Actual number of inversions) (Inversion-only evolution)
BP vs K
![Page 21: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/21.jpg)
True Evolutionary Distance Estimators
[Figure panels: BP, INV, IEBP, EDE]
![Page 22: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/22.jpg)
Regression Formula for E(INV)
• Let n be the number of genes, and k be the number of inversions
• We use nonlinear regression to obtain easily computable formulas for E(INV) and Var(INV):
$$\frac{E[\mathrm{INV}(G_0, G_k)]}{n} \approx f(x) = \min\left\{x,\ \frac{x^2 + bx}{x^2 + cx + b}\right\}, \quad x = \frac{k}{n}$$
-> b=0.5956, c=0.4577
1. $f(0) = 0$
2. $f'(0) = 1$
3. $0 \le f(x) \le x$
4. $f^{-1}(y)$ exists for all $y$: $0 \le y \le 1$
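Taking the regression form as f(x) = min{x, (x^2 + bx)/(x^2 + cx + b)} with x = k/n and the fitted b and c, the estimator can be inverted in closed form. Setting y = (x^2 + bx)/(x^2 + cx + b) gives the quadratic (1 − y)x^2 + (b − cy)x − by = 0, whose positive root inverts the rational part; because f is the minimum of two increasing functions, its inverse is the maximum of their inverses. The sketch below follows that derivation. The function names, and the clamping of inputs with y ≥ 1, are assumptions for illustration.

```python
import math

B, C = 0.5956, 0.4577  # fitted constants b and c

def f(x):
    """Regression estimate of E[INV]/n as a function of x = k/n."""
    return min(x, (x * x + B * x) / (x * x + C * x + B))

def f_inverse(y):
    """Invert f on 0 <= y <= 1 (values above 1 are clamped: an assumption)."""
    if y <= 0:
        return 0.0
    if y >= 1:
        return B / (B - C)  # solution of the rational part equal to 1
    p, q = B - C * y, 1 - y
    # Positive root of q*x^2 + p*x - B*y = 0.
    root = (-p + math.sqrt(p * p + 4 * q * B * y)) / (2 * q)
    return max(y, root)

def ede(inv_distance, n):
    """Estimated number of inversions from an inversion distance on n genes."""
    return n * f_inverse(inv_distance / n)

print(ede(60, 120))
```

Each estimate costs a constant number of arithmetic operations.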
![Page 23: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/23.jpg)
Absolute Difference Plot
• Motivation: Recall the criterion of Atteson’s Theorem. For all i,j:
$$|D_{ij} - d_{ij}| < \frac{\varepsilon}{2}$$
where $\varepsilon$ is the length of the shortest edge in T.
![Page 24: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/24.jpg)
Error Tolerance of Neighbor Joining
Theorem [Atteson 1999] Let $\{D_{ij}\}$ be the true evolutionary distances, and $\{d_{ij}\}$ be the estimated distances for T. Let $\varepsilon$ be the length of the shortest edge in T. If for all taxa i,j, we have
$$|D_{ij} - d_{ij}| < \frac{\varepsilon}{2}$$
then neighbor joining returns T.
![Page 25: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/25.jpg)
Absolute Difference Plot
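• Motivation: Recall the criterion of Atteson’s Theorem. For all i,j:
$$|D_{ij} - d_{ij}| < \frac{\varepsilon}{2}$$
where $\varepsilon$ is the length of the shortest edge in T.
• Absolute difference plot:
− Generate G’ from G under the NT model with k rearrangement events
− Compute d = d(G,G’)
− Plot |d−k| (y-axis) vs k (x-axis)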
![Page 26: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/26.jpg)
120 Genes, Inversion-only Model
![Page 27: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/27.jpg)
120 Genes, Inv:Transp=1:1
![Page 28: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/28.jpg)
120 Genes, Transp Only
![Page 29: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/29.jpg)
Using True Evolutionary Distance Helps
120 genes, 160 taxa
Uniformly random trees
Transpositions/inverted transpositions only
(180 runs per figure)
5%
![Page 30: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/30.jpg)
Variance of True Evolutionary Distance Estimators
• There are new distance-based phylogeny reconstruction methods (though designed for DNA sequences)
− Weighbor [Bruno et al. 2000] uses the variance of good t.e.d. estimators and yields more accurate trees than NJ.
• Variance estimates for the t.e.d.s [Wang WABI’02]
− Weighbor(IEBP), Weighbor(EDE)
[Figure: K vs Exact-IEBP (120 genes)]
![Page 31: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/31.jpg)
Using True Evolutionary Distance Helps
120 genes, 160 taxa
Uniformly random trees
Transpositions/inverted transpositions only
(180 runs per figure)
5%
![Page 32: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/32.jpg)
Robustness
• EDE
− Assumes inversion-only
− When used with NJ or Weighbor, gives very accurate results even under transposition-only evolution
• IEBP?
![Page 33: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/33.jpg)
IEBP is Robust to Model Violations
120 genes, 160 taxa
Uniformly random trees
(alpha,beta)=(0,0) (inversion only)
NJ(Exact-IEBP)
![Page 34: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/34.jpg)
Beta Splitting Model [Aldous 1995]
[Figure: beta axis with marks -2, -1.5, -1, 0, 20; tree shapes labeled, left to right: Comb, Uniformly Random, Aldous’ suggestion, Yule, Completely Balanced]
![Page 35: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/35.jpg)
Effect of Beta on Accuracy (120 Genes, 160 Taxa)
(Beta= -1.5: Uniform, -1: Aldous, 0: Yule)
![Page 36: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/36.jpg)
Number of Genes
![Page 37: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/37.jpg)
Number of Taxa
![Page 38: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/38.jpg)
Observations
• The number of taxa has little effect on the accuracy of trees (at least under our method of generating model trees)
• Datasets with more genes give more accurate trees
• Tree topology affects the accuracy of trees for NJ(BP) and NJ(INV); using corrected distances reduces this effect.
![Page 39: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/39.jpg)
Summary
• True evolutionary distance estimators for gene order data
• When used with Neighbor Joining and Weighbor, they produce highly accurate trees
• How model trees are generated affects the accuracy of tree reconstruction methods
![Page 40: Distance-Based Genome Rearrangement Phylogeny](https://reader034.vdocuments.us/reader034/viewer/2022051218/56815864550346895dc5c304/html5/thumbnails/40.jpg)
Acknowledgements
• University of Texas: Tandy Warnow, Robert K. Jansen, Randy Linder, Stacia Wyman
• University of New Mexico: Bernard M.E. Moret, David Bader, Jijun Tang, Mi Yan
• Central Washington University: Linda Raubeson
• University of Canterbury: Mike Steel
• NSF
• Packard Foundation