![Page 1: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/1.jpg)
Introduction to evolution and phylogeny
Nomenclature of trees
Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment [3] tree-building [4] tree evaluation
Practical approaches to making trees
Molecular Phylogenetics
![Page 2: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/2.jpg)
At the molecular level, evolution is a process of mutation with selection.
Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.
Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are also used for phylogenetic analyses.
Introduction
![Page 3: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/3.jpg)
[Figure: corrected amino acid changes per 100 residues (m) plotted against millions of years since divergence. Dickerson (1971)]
![Page 4: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/4.jpg)
Molecular clock for proteins: rate of substitutions per aa site per $10^{9}$ years
Fibrinopeptides 9.0
Kappa casein 3.3
Lactalbumin 2.7
Serum albumin 1.9
Lysozyme 0.98
Trypsin 0.59
Insulin 0.44
Cytochrome c 0.22
Histone H2B 0.09
Ubiquitin 0.010
Histone H4 0.010
![Page 5: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/5.jpg)
If protein sequences evolve at constant rates, they can be used to estimate the times that sequences diverged. This is analogous to dating geological specimens by radioactive decay.
Molecular clock hypothesis: implications
![Page 6: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/6.jpg)
N = total number of substitutions
L = number of nucleotide sites compared between two sequences
$K = N / L$ = number of substitutions per nucleotide site
![Page 7: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/7.jpg)
Rate of nucleotide substitution r and time of divergence T
r = rate of substitution = $0.56 \times 10^{-9}$ per site per year for hemoglobin alpha
K = 0.093 = number of substitutions per nucleotide site (rat versus human)
$r = K / 2T$, so $T = K / 2r = 0.093 / (2 \times 0.56 \times 10^{-9}) \approx 8 \times 10^{7}$ years, or about 80 million years
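The divergence-time arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `divergence_time` is an illustrative choice, and the two input values are the ones quoted on the slide.

```python
def divergence_time(k, r):
    """Return the divergence time T in years.

    k: substitutions per nucleotide site between the two sequences (K).
    r: substitution rate per site per year.
    """
    # Both lineages accumulate substitutions after the split, hence 2 * r.
    return k / (2.0 * r)


K_RAT_HUMAN = 0.093   # substitutions per site, rat versus human (from the slide)
R_HB_ALPHA = 0.56e-9  # substitutions per site per year, hemoglobin alpha (from the slide)

t_years = divergence_time(K_RAT_HUMAN, R_HB_ALPHA)
print(f"T = {t_years / 1e6:.0f} million years")
```

The unrounded result is close to 83 million years, which the slide reports as roughly 80 million.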
![Page 8: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/8.jpg)
An often-held view of evolution is that just as organisms propagate through natural selection, so also DNA and protein molecules are selected for.
According to Motoo Kimura's 1968 neutral theory of molecular evolution, the vast majority of DNA changes are not selected for in a Darwinian sense. The main cause of evolutionary change is random drift of mutant alleles that are selectively neutral (or nearly neutral). Positive Darwinian selection does occur, but it has a limited role.
Neutral theory of evolution
![Page 9: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/9.jpg)
Phylogeny can answer questions such as:
Goals of molecular phylogeny
• How many genes are related to my favorite gene?
• Was the extinct quagga more like a zebra or a horse?
• Was Darwin correct that humans are closest to chimps and gorillas?
• How related are whales, dolphins & porpoises to cows?
• Where and when did HIV originate?
• What is the history of life on earth?
![Page 10: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/10.jpg)
Woese PNAS
![Page 11: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/11.jpg)
There are two main kinds of information inherent to any tree: topology and branch lengths.
We will now describe the parts of a tree.
Molecular phylogeny: nomenclature of trees
![Page 12: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/12.jpg)
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA, RNA, and protein sequence data.
[Figure: the same taxa (A-E) drawn as two trees, one labeled chronogram (with a time axis) and one labeled phylogram (with a scale bar of one unit).]
![Page 13: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/13.jpg)
Tree nomenclature
taxon
taxon
![Page 14: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/14.jpg)
Tree nomenclature
taxon
operational taxonomic unit (OTU) such as a protein sequence
![Page 15: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/15.jpg)
Tree nomenclature
branch (edge)
Node (intersection or terminating point of two or more branches)
![Page 16: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/16.jpg)
Tree nomenclature
Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.
Branches are scaled: branch lengths are proportional to the number of amino acid changes.
![Page 17: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/17.jpg)
Tree nomenclature
bifurcating internal node
multifurcating internal node
![Page 18: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/18.jpg)
Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes
Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933, 23 December 2005, Fig. 1.
![Page 19: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/19.jpg)
Tree nomenclature: clades
Clade ABF (monophyletic group)
A group is monophyletic (Greek: "of one race") if it consists of a common ancestor and all its descendants. (http://en.wikipedia.org/wiki/)
![Page 20: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/20.jpg)
![Page 21: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/21.jpg)
The root of a phylogenetic tree represents the common ancestor of the sequences. Some trees are unrooted, and thus do not specify the common ancestor.
A tree can be rooted using an outgroup (that is, a taxon known to be distantly related to all other OTUs).
Tree roots
![Page 22: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/22.jpg)
Tree nomenclature: roots
[Figure: a rooted tree, drawn from past to present, which specifies an evolutionary path, shown beside an unrooted tree.]
![Page 23: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/23.jpg)
Tree nomenclature: outgroup rooting
[Figure: a rooted tree, drawn from past to present, shown beside a tree in which an outgroup is used to place the root.]
![Page 24: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/24.jpg)
Enumerating trees
Cavalli-Sforza and Edwards (1967) derived the number of possible unrooted trees ($N_U$) for n OTUs (n > 3):
$N_U = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
The number of bifurcating rooted trees ($N_R$):
$N_R = \frac{(2n-3)!}{2^{n-2}(n-2)!}$
For 10 OTUs (e.g. 10 DNA or protein sequences), the number of possible rooted trees is about 34 million, and the number of unrooted trees is about 2 million. Many tree-making algorithms can exhaustively examine every possible tree for up to ten to twelve sequences.
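The two counting formulas are easy to evaluate directly. The sketch below uses only the Python standard library; the function names are illustrative choices, not taken from the slides.

```python
from math import factorial


def num_unrooted_trees(n):
    """Number of bifurcating unrooted trees for n OTUs: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 OTUs for an unrooted tree")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n):
    """Number of bifurcating rooted trees for n OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 OTUs for a rooted tree")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (4, 10, 12):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For n = 10 this gives 2,027,025 unrooted and 34,459,425 rooted trees, matching the rounded figures of 2 million and 34 million. The rapid growth beyond that point is why exhaustive search stops being practical at around a dozen sequences.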
![Page 25: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/25.jpg)
Molecular evolutionary studies can be complicated by the fact that both species and genes evolve. Speciation usually occurs when a species becomes reproductively isolated. In a species tree, each internal node represents a speciation event.
Genes (and proteins) may duplicate or otherwise evolve before or after any given speciation event. The topology of a gene (or protein) based tree may differ from the topology of a species tree.
Species trees versus gene/protein trees
![Page 26: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/26.jpg)
Species trees versus gene/protein trees
[Figure: a tree drawn from past to present in which a speciation event separates species 1 and species 2.]
![Page 27: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/27.jpg)
Species trees versus gene/protein trees
[Figure: the tree separating species 1 and species 2 at a speciation event, with gene duplication events marked.]
![Page 28: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/28.jpg)
Species trees versus gene/protein trees
[Figure: the tree separating species 1 and species 2 at a speciation event, with gene duplication events and the resulting OTUs marked.]
![Page 29: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/29.jpg)
Orthology/paralogy
Orthologous genes are homologous (corresponding) genes in different species (genomes)
Paralogous genes are homologous genes within the same species (genome)
![Page 30: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/30.jpg)
Molecular phylogenetic analysis may be described in four stages:
[1] Selection of sequences for analysis
[2] Multiple sequence alignment
[3] Tree building
[4] Tree evaluation
Four stages of phylogenetic analysis
![Page 31: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/31.jpg)
The fundamental basis of a phylogenetic tree is a multiple sequence alignment.
(If there is a misalignment, or if a nonhomologous sequence is included in the alignment, it will still be possible to generate a tree.)
Stage 2: Multiple sequence alignment
![Page 32: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/32.jpg)
Two Major Approaches to Phylogeny Inference
1) Distance Matrix Methods
Calculate a matrix of pairwise distances from all data, then infer the tree using a clustering algorithm.
2) Character Based Methods (maximum parsimony)
Inspect columns of characters, infer trees from columns that contain "informative" characters, and use these to infer the most likely tree given the data.
![Page 33: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/33.jpg)
Reality: Not all sites are free to change, and the same sites change multiple times
Distance Matrix Methods (matrix calculation)
![Page 34: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/34.jpg)
The simplest model is that of Jukes & Cantor
![Page 35: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/35.jpg)
Jukes & Cantor: $d_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right)$
• dxy = distance between sequence x and sequence y expressed as the number of changes per site
• (note dxy = r/n where r is number of replacements and n is the total number of sites. This assumes all sites can vary and when unvaried sites are present in two sequences it will underestimate the amount of change which has occurred at variable sites) (i.e., previous reality check)
• D is the observed proportion of nucleotides that differ between two sequences (fractional dissimilarity)
• ln = natural log function to correct for superimposed substitutions (in general logging tends to convert exponential trends to linear trends)
• The 3/4 and 4/3 terms reflect that there are four types of nucleotides and three ways in which a second nucleotide may not match a first - with all types of change being equally likely (i.e. unrelated sequences should be 25% identical by chance alone)
![Page 36: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/36.jpg)
The natural logarithm ln is used to correct for superimposed changes at the same site
• If two sequences are 95% identical, they differ at 5% of sites (D = 0.05), thus:
– $d_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} \times 0.05\right) = 0.0517$
• Note that the observed dissimilarity 0.05 increases only slightly to an estimated 0.0517. This makes sense because in two very similar sequences one would expect very few changes to have been superimposed at the same site in the short time since the sequences diverged
• However, if two sequences are only 50% identical, they differ at 50% of sites (D = 0.50), thus:
– $d_{xy} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} \times 0.5\right) = 0.824$
• For dissimilar sequences, which may have diverged a long time ago, the use of ln infers that a much larger number of superimposed changes have occurred at the same site
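The two worked values above can be reproduced with a short Python sketch of the Jukes & Cantor correction. The function names and the toy sequences at the end are illustrative assumptions. The guard on D reflects the fact that the logarithm is undefined once D reaches 0.75, the dissimilarity expected between unrelated sequences.

```python
import math


def observed_dissimilarity(seq_x, seq_y):
    """Fraction of aligned sites at which two equal-length sequences differ (D)."""
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = sum(1 for a, b in zip(seq_x, seq_y) if a != b)
    return differences / len(seq_x)


def jukes_cantor_distance(d_obs):
    """Jukes & Cantor corrected distance: -(3/4) ln(1 - (4/3) D)."""
    if not 0.0 <= d_obs < 0.75:
        raise ValueError("D must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * d_obs)


print(round(jukes_cantor_distance(0.05), 4))  # the 95% identical case
print(round(jukes_cantor_distance(0.50), 3))  # the 50% identical case

# Hypothetical toy sequences, used only to show the two functions together.
d = observed_dissimilarity("ACGTACGT", "ACGTACGA")
print(d, jukes_cantor_distance(d))
```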
![Page 37: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/37.jpg)
![Page 38: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/38.jpg)
![Page 39: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/39.jpg)
![Page 40: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/40.jpg)
![Page 41: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/41.jpg)
![Page 42: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/42.jpg)
UPGMA is the unweighted pair group method using arithmetic mean
[Figure: five OTUs, numbered 1-5, to be clustered.]
Distance Matrix Methods (tree construction)
![Page 43: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/43.jpg)
Tree-building methods: UPGMA
Step 1: Compute the pairwise distances of all the proteins. Get ready to put the numbers 1-5 at the bottom of your new tree.
[Figure: five OTUs, numbered 1-5.]
![Page 44: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/44.jpg)
Tree-building methods: UPGMA
Step 2: Find the two proteins with the smallest pairwise distance. Cluster them.
[Figure: OTUs 1 and 2 are joined at a new node, 6.]
![Page 45: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/45.jpg)
Tree-building methods: UPGMA
Step 3: Do it again. Find the next two proteins with the smallest pairwise distance. Cluster them.
[Figure: OTUs 4 and 5 are joined at a new node, 7.]
![Page 46: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/46.jpg)
Tree-building methods: UPGMA
Step 4: Keep going. Cluster.
[Figure: OTU 3 is joined to the growing tree at a new node, 8.]
![Page 47: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/47.jpg)
Tree-building methods: UPGMA
Step 5: Last cluster! This is your tree.
[Figure: the last join, at node 9, completes the tree.]
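The clustering steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name, the nested-tuple tree format, and the distance values are all assumptions made for the example. The distances are hypothetical and were chosen so that 1 and 2 join first and 4 and 5 join second, as in the slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels (strings).
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns the tree as nested tuples (left, right, node_height).
    """
    # Each active cluster label maps to (subtree, number of OTUs inside it).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of active clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0  # clock assumption: split the distance evenly
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance from the merged cluster to every other cluster is the
        # average over all member OTUs (a size-weighted mean).
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b, height), size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical pairwise distances for five OTUs.
pairs = {
    ("1", "2"): 2, ("4", "5"): 3, ("3", "4"): 5, ("3", "5"): 5,
    ("1", "3"): 8, ("2", "3"): 8, ("1", "4"): 8,
    ("1", "5"): 8, ("2", "4"): 8, ("2", "5"): 8,
}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
tree = upgma(["1", "2", "3", "4", "5"], dist)
print(tree)
```

Because every node height is set to half the distance between the clusters it joins, the result is always rooted and ultrametric, which is where the constant molecular clock assumption enters.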
![Page 48: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/48.jpg)
![Page 49: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/49.jpg)
![Page 50: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/50.jpg)
![Page 51: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/51.jpg)
UPGMA is a simple approach for making trees.
• A UPGMA tree is always rooted.
• An assumption of the algorithm is that the molecular clock is constant for sequences in the tree. If there are unequal substitution rates, the tree may be wrong.
• While UPGMA is simple, it is less accurate than the neighbor-joining approach (described next).
Distance-based methods: UPGMA trees
![Page 52: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/52.jpg)
• Fast - suitable for analysing data sets which are too large for other more computationally intensive methods such as maximum likelihood
• A large number of models are available with many parameters -improves estimation of distances
Distance method: Advantages
![Page 53: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/53.jpg)
• Information is lost - given only the distances, it is impossible to derive the original sequences
• Only through character based analyses can the history of sites be investigated; e.g., most informative positions be inferred
Distance method: Disadvantages
![Page 54: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/54.jpg)
Character Based Methods: Maximum Parsimony
The best tree should be the one that requires the smallest number of substitutions to explain the differences among the sequences being studied.
Occam's razor: Among his statements (translated from his Latin) are: "Plurality is not to be assumed without necessity" and "What can be done with fewer [assumptions] is done in vain with more." One consequence of this methodology is the idea that the simplest or most obvious explanation of several competing ones is the one that should be preferred until it is proven wrong.
![Page 55: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/55.jpg)
• informative sites - nucleotide (or amino acid) columns with at least two different character states, each found in at least two different sequences; these sites allow the distinction between alternative trees.
• uninformative sites - nucleotide (or amino acid) columns that do not allow the distinction between two trees (e.g., constant)
Not all Characters are Used in Parsimony Analysis
![Page 56: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/56.jpg)
Maximum Parsimony (4-taxon case)
    1 2 3 4 5 6 7 8 9 10
1 - A G G G T A A C T G
2 - A C G A T T A T T A
3 - A T A A T T G T C T
4 - A A T G T T G T C G
[Figure: the three possible unrooted trees for taxa 1-4.]
How many informative sites are there in this data set?
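One way to answer the question is to count, for each column, how many character states occur in at least two sequences. The sketch below applies that test to the four sequences from the slide; the function and variable names are illustrative.

```python
from collections import Counter

# The four aligned sequences from the slide, keyed by taxon number.
ALIGNMENT = {
    "1": "AGGGTAACTG",
    "2": "ACGATTATTA",
    "3": "ATAATTGTCT",
    "4": "AATGTTGTCG",
}


def informative_sites(alignment):
    """Return 1-based positions of parsimony-informative columns.

    A column is informative when at least two different character states
    are each present in at least two sequences.
    """
    sites = []
    for position, column in enumerate(zip(*alignment.values()), start=1):
        counts = Counter(column)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            sites.append(position)
    return sites


print(informative_sites(ALIGNMENT))  # [4, 7, 9]
```

Only sites 4, 7, and 9 pass the test. Every other column is either constant or requires the same number of changes on all three trees, so it cannot favor one tree over another.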
![Page 57: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/57.jpg)
Maximum Parsimony (4-taxon case)
[Figure: the alignment with site 2 highlighted, and the three possible trees for taxa 1-4.]
Running tree lengths after sites 1-2, one row per tree:
0 3
0 3
0 3
![Page 58: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/58.jpg)
Maximum Parsimony
Site 2:
1 - G
2 - C
3 - T
4 - A
[Figure: the states at site 2 mapped onto each of the three possible trees; every tree requires 3 changes.]
![Page 59: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/59.jpg)
Maximum Parsimony
[Figure: the alignment with site 3 highlighted, and the three possible trees for taxa 1-4.]
Running tree lengths after sites 1-3, one row per tree:
0 3 2
0 3 2
0 3 2
![Page 60: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/60.jpg)
Maximum Parsimony
Site 3:
1 - G
2 - G
3 - A
4 - T
[Figure: the states at site 3 mapped onto each of the three possible trees; every tree requires 2 changes.]
![Page 61: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/61.jpg)
Maximum Parsimony
[Figure: the alignment with site 4 highlighted, and the three possible trees for taxa 1-4.]
Running tree lengths after sites 1-4, one row per tree:
0 3 2 2
0 3 2 1
0 3 2 2
![Page 62: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/62.jpg)
Maximum Parsimony
Site 4:
1 - G
2 - A
3 - A
4 - G
[Figure: the states at site 4 mapped onto each of the three possible trees; two trees require 2 changes and one tree requires 1 change.]
![Page 63: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/63.jpg)
Maximum Parsimony
Changes per site (sites 1-10) and total tree length, one row per tree:
0 3 2 2 0 1 1 1 1 3   14
0 3 2 1 0 1 2 1 2 3   15
0 3 2 2 0 1 2 1 2 3   16
[Figure: the three possible trees for taxa 1-4.]
![Page 64: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/64.jpg)
Maximum Parsimony
Site: 1 2 3 4 5 6 7 8 9 10
1 - A G G G T A A C T G
2 - A C G A T T A T T A
3 - A T A A T T G T C T
4 - A A T G T T G T C G
Changes: 0 3 2 2 0 1 1 1 1 3 = 14
[Figure: the most parsimonious tree, ((1,2),(3,4)).]
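The slides do not say how the per-site counts were obtained. One standard way to count the minimum number of changes on a given tree is Fitch's algorithm, used here as an assumption rather than as the lecturer's stated method. The sketch below applies it to the four-sequence alignment; the function and variable names are illustrative. Note that for site 10 (G, A, T, G) this counting gives 2 changes rather than the 3 shown on the slide, so the totals come out as 13, 14 and 15 instead of 14, 15 and 16. The ranking of the trees is unchanged.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one site on a binary tree (Fitch's algorithm).

    tree   -- nested 2-tuples of sequence labels, e.g. ((1, 2), (3, 4))
    states -- dict mapping each label to its base at this site
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):      # tip: the state set is the observed base
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                           # children share a state: no change needed
            return common
        changes += 1                         # otherwise one change is required here
        return left | right

    visit(tree)
    return changes


alignment = {
    1: "AGGGTAACTG",
    2: "ACGATTATTA",
    3: "ATAATTGTCT",
    4: "AATGTTGTCG",
}

# The three possible unrooted trees for four sequences, each rooted on its
# internal branch (the parsimony length does not depend on where the root is).
trees = {
    "((1,2),(3,4))": ((1, 2), (3, 4)),
    "((1,3),(2,4))": ((1, 3), (2, 4)),
    "((1,4),(3,2))": ((1, 4), (3, 2)),
}

n_sites = len(alignment[1])
site_costs = {
    name: [
        fitch_changes(tree, {label: seq[i] for label, seq in alignment.items()})
        for i in range(n_sites)
    ]
    for name, tree in trees.items()
}
lengths = {name: sum(costs) for name, costs in site_costs.items()}
best = min(lengths, key=lengths.get)

for name in trees:
    print(name, site_costs[name], lengths[name])
print("most parsimonious:", best)
```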
![Page 65: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/65.jpg)
Parsimony - advantages
• is a simple method - easily understood operation
• does not seem to depend on an explicit model of evolution
• gives both trees and associated hypotheses of character evolution
• should give reliable results if the data is well structured and homoplasy is either rare or widely (randomly) distributed on the tree
![Page 66: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/66.jpg)
Parsimony - disadvantages
• May give misleading results if homoplasy is common or concentrated in particular parts of the tree, e.g.:
- thermophilic convergence
- base composition biases
- long branch attraction
• Underestimates branch lengths (Why?)
• Model of evolution is implicit - behaviour of method not well understood
• Parsimony often justified on purely philosophical grounds - we must prefer simplest hypotheses - particularly by morphologists
• For most molecular systematists, this is uncompelling
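One way to approach the "Why?" above: parsimony can only count changes that are still visible, while repeated substitutions at the same site overwrite one another. As an illustration only, assuming the Jukes-Cantor (JC69) model (which the slide does not mention), the expected proportion of sites that differ after d substitutions per site is p = 3/4 (1 - exp(-4d/3)). The sketch below shows how far the visible difference falls short of the true amount of change as branches get longer:

```python
import math


def expected_observed_difference(d):
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Illustrative branch lengths (substitutions per site); these values are assumptions.
for d in (0.05, 0.2, 0.5, 1.0, 2.0):
    p = expected_observed_difference(d)
    print(f"true d = {d:4.2f}   observed p = {p:5.3f}   hidden = {d - p:5.3f}")
```

The gap between d and p grows with d, and p can never exceed 0.75, so a method that counts only observed differences will underestimate long branches most severely.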
![Page 67: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/67.jpg)
Parsimony can be inconsistent
• Felsenstein (1978) developed a simple model phylogeny including four taxa and a mixture of short and long branches
• Under this model parsimony will give the wrong tree
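[Figure: Model tree (the true tree). A and C are joined on one side of the internal branch, B and D on the other. Rates or branch lengths: the branches leading to A and B are p; the internal branch and the branches leading to C and D are q, with p >> q.]
[Figure: Parsimony tree (wrong). A and B are grouped together, as are C and D.]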
• With more data the certainty that parsimony will give the wrong tree increases - so that parsimony is statistically inconsistent
• Advocates of parsimony initially responded by claiming that Felsenstein’s result showed only that his model was unrealistic
• It is now recognised that the long-branch attraction (in the “Felsenstein Zone”) is one of the most serious problems in phylogenetic inference
Long branches are attracted but the similarity is homoplastic
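The inconsistency can be checked numerically. The sketch below is not Felsenstein's original derivation; it assumes a two-state symmetric model in which p and q are the probabilities of a change along the long and short branches, and the values 0.4 and 0.05 are purely illustrative. It computes how often each of the three informative site patterns is expected on the true tree ((A,C),(B,D)). With four taxa, parsimony given unlimited data picks the grouping supported by the most frequent informative pattern.

```python
from itertools import product


def pattern_probabilities(p, q):
    """Expected frequencies of the three informative site patterns on the
    true tree ((A,C),(B,D)) under a two-state symmetric model.

    p -- change probability on the long branches (leading to A and B)
    q -- change probability on the short branches (C, D and the internal branch)
    """
    def step(parent, child, prob_change):
        return prob_change if parent != child else 1.0 - prob_change

    def site_probability(a, b, c, d):
        total = 0.0
        for u, v in product((0, 1), repeat=2):    # u joins A and C, v joins B and D
            total += (0.5                           # either state equally likely at u
                      * step(u, a, p) * step(u, c, q)
                      * step(u, v, q)
                      * step(v, b, p) * step(v, d, q))
        return total

    # Each grouping is supported by two mirror-image patterns (e.g. 0011 and 1100).
    return {
        "AB|CD": 2 * site_probability(0, 0, 1, 1),   # supports the wrong tree
        "AC|BD": 2 * site_probability(0, 1, 0, 1),   # supports the true tree
        "AD|BC": 2 * site_probability(0, 1, 1, 0),
    }


for p, q in ((0.4, 0.05), (0.05, 0.05)):
    probs = pattern_probabilities(p, q)
    favoured = max(probs, key=probs.get)
    print(f"p={p}, q={q}: {probs} -> parsimony favours {favoured}")
```

With p = 0.4 and q = 0.05 the pattern grouping A with B is the most frequent, so adding more sites only strengthens support for the wrong tree. With equal branch lengths (p = q = 0.05) the true grouping AC|BD dominates.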
![Page 68: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/68.jpg)
Summary and recommendations
• Remember that molecular phylogenetics yields gene trees
• Accurate gene trees may not be accurate organismal trees
• Gene duplications and paralogy, and lateral transfer can produce mismatches between gene and organismal phylogenies
• Use congruence between separate gene trees to identify robust organismal phylogenies or mismatches that require further information
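As a rough sketch of what checking congruence can look like in practice, the Python below compares the groupings (clades) found in several rooted gene trees and reports which are shared by all of them and which conflict. The gene names, organisms and tree shapes are hypothetical, and real analyses would normally use a phylogenetics library and support values rather than this bare comparison.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def tips(node):
        if not isinstance(node, tuple):           # a tip
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


# Hypothetical rooted gene trees for the same four organisms.
gene_trees = {
    "gene_1": ((("human", "mouse"), "chicken"), "yeast"),
    "gene_2": (("chicken", ("mouse", "human")), "yeast"),   # same topology, written differently
    "gene_3": ((("human", "chicken"), "mouse"), "yeast"),   # conflicting grouping
}

clade_sets = {name: clades(tree) for name, tree in gene_trees.items()}
all_tips = max(set.union(*clade_sets.values()), key=len)

# Clades present in every gene tree (ignoring the trivial clade of all tips).
shared = set.intersection(*clade_sets.values()) - {all_tips}
# Clades present in some gene trees but not all: candidates for follow-up.
conflicting = set.union(*clade_sets.values()) - shared - {all_tips}

print("shared:", [sorted(c) for c in shared])
print("conflicting:", [sorted(c) for c in conflicting])
```

Groupings recovered by every gene are candidates for a robust organismal phylogeny; groupings that appear in only some genes flag possible paralogy, lateral transfer or tree-building artefacts.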
![Page 69: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/69.jpg)
The most famous case of LBA misleading biologists…
The Universal SSU rRNA Tree
Wheelis et al. 1992 PNAS 89: 2930
![Page 70: Introduction to evolution and phylogeny Nomenclature of trees Four stages of molecular phylogeny: [1] selecting sequences [2] multiple sequence alignment](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649e155503460f94b00144/html5/thumbnails/70.jpg)
The SSU Ribosomal RNA Tree for Eukaryotes
[Figure: SSU rRNA tree of eukaryotes rooted with a prokaryotic outgroup. Labelled lineages: Animals, Fungi, Ciliates + Apicomplexa, Stramenopiles, Euglenozoa, Giardia, Trichomonas, Plants / green algae, Red algae, Entamoebae, Choanozoa, Dictyostelium, Physarum, Microsporidia, Percolozoa. The figure also carries the annotations "Archezoa" and "Mitochondria?".]