bioinformatics t6-phylogenetics v2013-wim_vancriekinge
![Page 1: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/1.jpg)
![Page 2: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/2.jpg)
FBW4-11-2013
Wim Van Criekinge
![Page 3: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/3.jpg)
![Page 4: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/4.jpg)
Phylogenetics
• Introduction
– Definitions
– Species concept
– Examples
– The Tree-of-life
• Phylogenetics Methodologies
– Algorithms: Distance Methods, Maximum Likelihood, Maximum Parsimony
– Rooting
– Statistical Validation
• Conclusions
– Orthologous genes
– Horizontal Gene Transfer
– Phylogenomics
• Practical Approach: PHYLIP
• Weblems
![Page 5: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/5.jpg)
Phylogeny (phylo = tribe + genesis)
Phylogenetic trees are about visualising evolutionary relationships. They reconstruct the pattern of events that have led to the distribution and diversity of life.
The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another.
Nothing in Biology Makes Sense Except in the Light of Evolution. Theodosius Dobzhansky (1900-1975)
What is phylogenetics ?
![Page 6: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/6.jpg)
Trees
• Diagram consisting of branches and nodes
• Species tree (how are my species related?)
– contains only one representative from each species
– all nodes indicate speciation events
• Gene tree (how are my genes related?)
– normally contains a number of genes from a single species
– nodes relate either to speciation or gene duplication events
![Page 7: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/7.jpg)
Clade: A set of species which includes all of the species derived from a single common ancestor
![Page 8: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/8.jpg)
Species Concepts from Various Authors
D.A. Baum and K.L. Shaw - Exclusive groups of organisms, where an exclusive group is one whose members are all more closely related to each other than to any organisms outside the group.
J. Cracraft - An irreducible cluster of organisms, diagnosably distinct from other such clusters, and within which there is a parental pattern of ancestry and descent.
Charles Darwin - "From these remarks it will be seen that I look at the term species, as one arbitrarily given for the sake of convenience to a set of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and more fluctuating forms. The term variety, again, in comparison with mere individual differences, is also applied arbitrarily, and for mere convenience sake" (Origin of Species, 1st ed., p. 108).
T. Dobzhansky - The largest and most inclusive reproductive community of sexual and cross-fertilizing individuals which share a common gene pool. And later...Systems of populations, the gene exchange between which is limited or prevented by reproductive isolating mechanisms.
M. Ghiselin - The most extensive units in the natural economy, such that reproductive competition occurs among their parts.
D.M. Lambert - Groups of individuals that define themselves by a specific mate recognition system.
J. Mallet - Identifiable genotypic clusters recognized by a deficit of intermediates, both at single loci and at multiple loci.
E. Mayr - Groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups.
C.D. Michener - A group of organisms not itself divisible by phenetic gaps resulting from concordant differences in character states (except for morphs - such as sex, age, or caste), but separated by such phenetic gaps from other such units.
H.E.H. Paterson - That most inclusive population of individual biparental organisms which share a common fertilization system.
G.G. Simpson - A lineage of populations evolving with time, separately from others, with its own unique evolutionary role and tendencies.
P.H.A. Sneath and R.R. Sokal - The smallest (most homogeneous) cluster that can be recognized upon some given criterion as being distinct from other clusters.
A.R. Templeton - The most inclusive population of individuals having the potential for phenotypic cohesion through intrinsic cohesion mechanisms (genetic and/or demographic, i.e. ecological, exchangeability).
E.O. Wiley - A single lineage of ancestor-descendant populations which maintains its identity from other such lineages and which has its own evolutionary tendencies and historical fate.
S. Wright - A species in time and space is composed of numerous local populations, each one intercommunicating and intergrading with others.
![Page 9: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/9.jpg)
Species
I. Definitions:
Species = the basic unit of classification
> Three different ways to recognize species:
![Page 10: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/10.jpg)
1) Morphological species = the smallest group that is consistently and persistently distinct (clusters in morphospace)
Species are recognized initially on the basis of appearance; the individuals of one species look different from the individuals of another.
Plant Species
![Page 11: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/11.jpg)
2) Biological species = a set of interbreeding or potentially interbreeding individuals that are separated from other species by reproductive barriers
Individuals of different species are unable to interbreed.
Species
![Page 12: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/12.jpg)
3) Phylogenetic species = the boundary between reticulate (among interbreeding individuals) and divergent relationships (between lineages with no gene exchange)
Species
![Page 13: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/13.jpg)
Phylogenetic species: recognized by the pattern of ancestor-descendant relationships
[Figure: the species boundary lies between reticulate and divergent relationships]
![Page 14: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/14.jpg)
Definitions:
> A fourth way to recognize species:
4) Phylogenomics species = ability to transmit (and maintain) a (stable) gene pool
Addresses the Anopheles genome topology variations
Species
![Page 15: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/15.jpg)
• In the tree to the left, A and B share the most recent common ancestry. Thus, of the species in the tree, A and B are the most closely related.
• The next most recent common ancestry is C with the group composed of A and B. Notice that the relationship of C is with the group containing A and B. In particular, C is not more closely related to B than to A. This can be emphasized by the following two trees, which are equivalent to each other:
Branching Order in a Phylogenetic Tree
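The equivalence of the two trees can be checked mechanically. The sketch below is an illustration and not part of the original slides: it assumes trees are written as nested Python tuples (a hypothetical encoding) and compares them by the set of leaves found under each internal node, so swapping the two children of a node changes nothing.

```python
def clades(tree):
    """Return (leaves under this node, set of all clades in the subtree)."""
    if isinstance(tree, str):          # a leaf is just its name
        return frozenset([tree]), set()
    members = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        members |= child_leaves        # collect leaves from each child
        found |= child_clades          # keep the clades found further down
    found.add(members)                 # this internal node defines a clade too
    return members, found


def same_topology(tree1, tree2):
    """Two rooted trees are equivalent when they contain the same clades."""
    return clades(tree1)[1] == clades(tree2)[1]


# ((A,B),C) drawn with its branches in a different order is the same tree.
print(same_topology((("A", "B"), "C"), ("C", ("B", "A"))))
# Grouping A with C instead is a genuinely different branching order.
print(same_topology((("A", "B"), "C"), (("A", "C"), "B")))
```

Comparing clade sets rather than the drawn order reflects the point of the slide: C is related to the group containing A and B, not to A or B individually.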
![Page 16: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/16.jpg)
• A common simplifying assumption is that the tree is bifurcating, meaning that each branch node has exactly two descendants.
• The edges, taken together, are sometimes said to define the topology of the tree.
More definitions …
Branch node, internal node
Edge, branch
Leaf, tip, external node
![Page 17: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/17.jpg)
Outgroups, rooted versus unrooted
An unrooted reptilian phylogeny with an avian outgroup and the corresponding rooted phylogeny. The R_i represent modern reptiles, the A_i inferred ancestors, and B a bird.
![Page 18: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/18.jpg)
Some definitions …
![Page 19: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/19.jpg)
Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled:
– Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853.
– Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460.
Examples
![Page 20: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/20.jpg)
![Page 21: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/21.jpg)
– Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity:
• Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS 98(5):2746-2751.
• Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171.
• Bacillus anthracis:
Examples
![Page 23: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/23.jpg)
• Conservation biologists may use these techniques to determine which populations are in greatest need of protection, and other questions of population structure:
– Trepanier, T.L., and R.W. Murphy. 2001. The Coachella Valley fringe-toed lizard (Uma inornata): genetic diversity and phylogenetic relationships of an endangered species. Mol Phylogenet Evol 18(3):327-334.
– Alves, M.J., et al. 2001. Mitochondrial DNA variation in the highly endangered cyprinid fish Anaecypris hispanica: importance for conservation. Heredity 87(Pt 4):463-473.
• Pharmaceutical researchers may use phylogenetic methods to determine which species are most closely related to other medicinal species, thus perhaps sharing their medicinal qualities:
– Komatsu, K., et al. 2001. Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species. Planta Med 67:461-465.
Examples
![Page 24: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/24.jpg)
Tree-of-life
![Page 25: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/25.jpg)
Some Important Dates in History

| Event | Time (billion yrs) |
|---|---|
| Origin of the Universe | 15 |
| Formation of the Solar System | 4.6 |
| First Self-replicating System | 3.5 |
| Prokaryotic-Eukaryotic Divergence | 2.0 |
| Plant-Animal Divergence | 1.0 |
| Invertebrate-Vertebrate Divergence | 0.5 |
| Mammalian Radiation Beginning | 0.1 |
![Page 26: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/26.jpg)
Tree Of Life
![Page 28: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/28.jpg)
Tree Of Life
![Page 29: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/29.jpg)
Tree Of Life
![Page 30: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/30.jpg)
What Sequence to Use ?
• To infer relationships that span the diversity of known life, it is necessary to look at genes conserved through the billions of years of evolutionary divergence.
• The gene must display an appropriate level of sequence conservation for the divergences of interest.
![Page 31: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/31.jpg)
• If there is too much change, then the sequences become randomized, and there is a limit to the depth of the divergences that can be accurately inferred.
• If there is too little change (if the gene is too conserved), then there may be little or no change between the evolutionary branchings of interest, and it will not be possible to infer close (genus or species level) relationships.
What Sequence to Use ?
![Page 32: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/32.jpg)
Carl Woese recognized the full potential of rRNA sequences as a measure of phylogenetic relatedness. He initially used an RNA sequencing method that determined about 1/4 of the nucleotides in the 16S rRNA (the best technology available at the time). This amount of data greatly exceeded anything else then available. Using newer methods, it is now routine to determine the sequence of the entire 16S rRNA molecule. Today, the accumulated 16S rRNA sequences (about 10,000) constitute the largest body of data available for inferring relationships among organisms.
Ribosomal RNA Genes and Their Sequences
![Page 33: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/33.jpg)
Examples of genes in this category are those that encode the ribosomal RNAs (rRNAs). Most prokaryotes have three rRNAs, called the 5S, 16S and 23S rRNA.
What Sequence to Use ?
| Name (a) | Size (nucleotides) | Location |
|---|---|---|
| 5S | 120 | Large subunit of ribosome |
| 16S | 1500 | Small subunit of ribosome |
| 23S | 2900 | Large subunit of ribosome |

(a) The name is based on the rate at which the molecule sediments (sinks) in water. Bigger molecules sediment faster than small ones.
![Page 34: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/34.jpg)
The extraordinary conservation of rRNA genes can be seen in these fragments of the small subunit rRNA gene sequences from organisms spanning the known diversity of life:
human ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG...
yeast ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG...
Corn ...GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAG...
Escherichia coli ...GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG...
Anacystis nidulans ...GTGCCAGCAGCCGCGGTAATACGGGAGAGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCG...
Thermotoga maritima ...GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTACCCGGATTTACTGGGCGTAAAGGG...
Methanococcus vannielii ...GTGCCAGCAGCCGCGGTAATACCGACGGCCCGAGTGGTAGCCACTCTTATTGGGCCTAAAGCG...
Thermococcus celer ...GTGGCAGCCGCCGCGGTAATACCGGCGGCCCGAGTGGTGGCCGCTATTATTGGGCCTAAAGCG...
Sulfolobus solfataricus ...GTGTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCG...
Ribosomal RNA Genes and Their Sequences
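The conservation visible in the alignment above can be quantified directly. The following is a minimal sketch, not part of the original slides, using three of the fragments as printed (human, yeast, Escherichia coli); the function names are illustrative choices.

```python
# Fragments copied from the slide; the flanking "..." are omitted.
FRAGMENTS = {
    "human": "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAAAAAG",
    "yeast": "GTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAG",
    "E. coli": "GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCG",
}


def mismatches(seq_a, seq_b):
    """Count aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def conserved_columns(sequences):
    """For each alignment column, report whether every sequence agrees."""
    return [len(set(column)) == 1 for column in zip(*sequences)]


# Human and yeast are separated by a long evolutionary distance,
# yet this fragment differs at very few positions.
print(mismatches(FRAGMENTS["human"], FRAGMENTS["yeast"]))

# Columns shared by a eukaryote pair and a bacterium.
flags = conserved_columns(list(FRAGMENTS.values()))
print(sum(flags), "of", len(flags), "columns are identical in all three")
```

Fragments this conserved are useful for deep divergences but, as the preceding slides note, carry little signal for close relationships.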
![Page 35: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/35.jpg)
Other genes …
![Page 36: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/36.jpg)
• Rate of evolution = rate of mutation
• Rate of evolution for any macromolecule is approximately constant over time (Neutral Theory of evolution)
• For a given protein the rate of sequence evolution is approximately constant across lineages. Zuckerkandl and Pauling (1965)
• This would allow speciation and duplication events to be dated accurately based on molecular data
Molecular Clock (MC)
![Page 37: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/37.jpg)
Novel trees using Hox genes
![Page 38: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/38.jpg)
![Page 39: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/39.jpg)
• (a) A traditional phylogenetic tree and
• (b) the new phylogenetic tree, each showing the positions of selected phyla. B, bilateria; AC, acoelomates; PC, pseudocoelomates; C, coelomates; P, protostomes; L, lophotrochozoa; E, ecdysozoa; D, deuterostomes.
![Page 40: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/40.jpg)
• Local and approximate molecular clocks more reasonable
– one amino acid substitution per 14.5 My
– 1.3 × 10^-9 substitutions/nucleotide site/year
– Relative rate test (see further)
• ((A,B),C) then measure distance between (A,C) & (B,C)
Molecular Clock (MC)
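Both ideas on this slide reduce to simple arithmetic. The sketch below is illustrative only: it assumes the quoted rate applies to each lineage separately, so that two sequences that split t years ago differ by d = 2rt substitutions per site, and the distances passed in are made-up numbers.

```python
RATE = 1.3e-9  # substitutions per nucleotide site per year (value from the slide)


def divergence_time(distance, rate=RATE):
    """Years since two lineages split, assuming a strict clock.

    Each lineage accumulates rate * t substitutions per site,
    so the observed distance between them is 2 * rate * t.
    """
    return distance / (2.0 * rate)


def relative_rate_difference(d_ac, d_bc):
    """Relative rate test for the tree ((A,B),C).

    Under a clock, A and B are equally far from the outgroup C,
    so the difference d(A,C) - d(B,C) should be close to zero.
    """
    return d_ac - d_bc


# Hypothetical distance of 0.026 substitutions per site.
print(divergence_time(0.026))

# Hypothetical outgroup distances: a clearly non-zero difference
# suggests that A and B evolve at different rates.
print(relative_rate_difference(0.31, 0.24))
```

Whether a given difference is significant needs a statistical test, which this sketch does not attempt.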
![Page 41: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/41.jpg)
Proteins evolve at highly different rates

| Protein | Rate of Change (PAMs / 100 myrs) | Theoretical Lookback Time (myrs) |
|---|---|---|
| Pseudogenes | 400 | 45 |
| Fibrinopeptides | 90 | 200 |
| Lactalbumins | 27 | 670 |
| Lysozymes | 24 | 850 |
| Ribonucleases | 21 | 850 |
| Haemoglobins | 12 | 1500 |
| Acid proteases | 8 | 2300 |
| Cytochrome c | 4 | 5000 |
| Glyceraldehyde-P dehydrogenase | 2 | 9000 |
| Glutamate dehydrogenase | 1 | 18000 |

PAM = number of Accepted Point Mutations per 100 amino acids.
![Page 42: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/42.jpg)
![Page 43: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/43.jpg)
Multiple Alignment Method
![Page 44: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/44.jpg)
• align
• select method (evolutionary model)
– Distance
– ML
– MP
• generate tree
• validate tree
4-steps
![Page 45: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/45.jpg)
![Page 46: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/46.jpg)
Some definitions …
![Page 47: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/47.jpg)
• Convert sequence data into a set of discrete pairwise distance values (n(n-1)/2 values for n sequences), arranged into a matrix. Distance methods fit a tree to this matrix.
• The tree topology is constructed by using a cluster analysis method (such as UPGMA or NJ).
Distance matrix methods (upgma, nj, Fitch,...)
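As a minimal sketch of the first step, the code below turns a toy alignment into its n(n-1)/2 pairwise distances. The sequences are invented for illustration, and the distance used is the simple proportion of differing sites (p-distance), which is one possible choice and not necessarily the one used in the lecture.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(alignment):
    """Map each unordered pair of names to its distance (upper triangle only)."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned sequences.
toy_alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACTA",
    "s4": "TCGAACTA",
}

matrix = distance_matrix(toy_alignment)
for pair, value in matrix.items():
    print(pair, value)
```

With four sequences the matrix holds 4 × 3 / 2 = 6 values, which is all that a distance method sees of the original data.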
![Page 48: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/48.jpg)
![Page 49: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/49.jpg)
![Page 50: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/50.jpg)
![Page 51: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/51.jpg)
Distance matrix methods (upgma, nj, Fitch,...)
![Page 52: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/52.jpg)
Distance matrix methods (upgma, nj, Fitch,...)
![Page 53: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/53.jpg)
Distance matrix methods (upgma, nj, Fitch,...)
Since we start with A, p(A) = 1
![Page 54: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/54.jpg)
Distance matrix methods (upgma, nj, Fitch,...)
D = evolutionary distance ~ time
F = dissimilarity ~ (1 – P_X(t))
F ~ 1 – d
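Observed dissimilarity underestimates the true evolutionary distance because repeated substitutions at the same site are counted at most once. The sketch below shows one standard correction, assuming the Jukes-Cantor model (equal base frequencies and equal substitution rates); the slides do not name the model, so this choice is an assumption.

```python
import math


def jukes_cantor_distance(dissimilarity):
    """Estimate substitutions per site from the observed fraction of differing sites.

    Under Jukes-Cantor, d = -(3/4) * ln(1 - (4/3) * F).
    The formula is undefined once F reaches 3/4, the value expected
    for two unrelated random sequences.
    """
    if not 0.0 <= dissimilarity < 0.75:
        raise ValueError("dissimilarity must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * dissimilarity / 3.0)


# The correction is small for similar sequences and grows quickly
# as the sequences approach saturation.
for observed in (0.01, 0.10, 0.30, 0.60):
    print(observed, round(jukes_cantor_distance(observed), 4))
```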
![Page 56: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/56.jpg)
![Page 57: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/57.jpg)
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
![Page 58: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/58.jpg)
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
![Page 59: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/59.jpg)
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
![Page 60: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/60.jpg)
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
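The worked example on these slides survives only as images, so the following is an independent minimal sketch of UPGMA rather than a reconstruction of it. The distances are hypothetical, trees are returned as nested tuples, and the distance between two clusters is recomputed as the plain average over all leaf pairs, which is simple to read but slower than the usual matrix update.

```python
from itertools import combinations


def upgma(dist):
    """Build a rooted tree by UPGMA.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree as nested tuples, height of the root).
    """
    leaves = sorted({leaf for pair in dist for leaf in pair})
    # Each cluster is (subtree, member leaves, height of its top node).
    clusters = [(leaf, (leaf,), 0.0) for leaf in leaves]

    def average_distance(c1, c2):
        # Arithmetic mean over every leaf pair spanning the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        # The new node sits halfway between them (molecular clock assumption).
        height = average_distance(c1, c2) / 2.0
        merged = ((c1[0], c2[0]), c1[1] + c2[1], height)
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(merged)

    tree, _, root_height = clusters[0]
    return tree, root_height


# Hypothetical distances between four sequences.
example = {
    frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0,
    frozenset("AD"): 10.0, frozenset("BD"): 10.0, frozenset("CD"): 10.0,
}
print(upgma(example))
```

Placing each new node at half the cluster distance is what makes UPGMA assume a constant rate of evolution along all branches.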
![Page 61: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/61.jpg)
Distance matrix methods: Summary
http://www.bioportal.bic.nus.edu.sg/phylip/neighbor.html
![Page 62: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/62.jpg)
• The tree gives an estimate of the distance for each pair of sequences as the sum of branch lengths along the path from one sequence to the other through the tree.
• advantages :
· easy to perform ;
· quick calculation ;
· fit for sequences having high similarity scores ;
• drawbacks :
· the sequences are not considered as such (loss of information) ;
· all sites are generally treated equally (differences in substitution rates are not taken into account) ;
· not applicable to distantly divergent sequences.
Distance matrix methods (upgma, nj, Fitch,...)
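The first point, that a tree implies a distance for every pair as a sum of branch lengths, can be sketched as follows. The tree, its branch lengths and the parent-pointer encoding are all hypothetical choices made for this illustration.

```python
# Hypothetical rooted tree: each node maps to (parent, branch length to parent).
PARENT = {
    "A": ("ab", 1.0),
    "B": ("ab", 1.0),
    "ab": ("root", 2.0),
    "C": ("root", 3.0),
}


def distances_to_ancestors(node, parent):
    """Return {ancestor: path length from node}, including node itself at 0."""
    reached = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        reached[node] = total
    return reached


def tree_distance(leaf_a, leaf_b, parent):
    """Sum of branch lengths on the path between two leaves."""
    from_a = distances_to_ancestors(leaf_a, parent)
    total = 0.0
    node = leaf_b
    # Climb from leaf_b until reaching a node that is also above leaf_a.
    while node not in from_a:
        node, length = parent[node]
        total += length
    return total + from_a[node]


print(tree_distance("A", "B", PARENT))
print(tree_distance("A", "C", PARENT))
```

A distance method can be judged by how closely these tree-implied distances match the distances in the input matrix.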
![Page 63: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/63.jpg)
![Page 64: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/64.jpg)
![Page 65: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/65.jpg)
• In this method, the bases (nucleotides or amino acids) of all sequences at each site are considered separately (as independent), and the log-likelihood of observing these bases is computed for a given topology by using a particular probability model.
• The log-likelihoods are summed over all sites, and this sum is maximized to estimate the branch lengths of the tree.
Maximum likelihood
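The summing of per-site log-likelihoods can be shown on the smallest possible case: two sequences joined by a single branch. This is a sketch under stated assumptions, not the lecture's method: it uses the Jukes-Cantor model, made-up sequences, and a plain grid search in place of a proper optimizer.

```python
import math


def site_log_likelihood(x, y, d):
    """Log-likelihood of one aligned site pair at branch length d (Jukes-Cantor)."""
    decay = math.exp(-4.0 * d / 3.0)
    # Probability that base x is observed as base y after distance d.
    p_xy = 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay
    # 0.25 is the equilibrium frequency of the starting base.
    return math.log(0.25 * p_xy)


def log_likelihood(seq1, seq2, d):
    """Sites are treated as independent, so their log-likelihoods add up."""
    return sum(site_log_likelihood(x, y, d) for x, y in zip(seq1, seq2))


def ml_branch_length(seq1, seq2, step=0.001, max_d=2.0):
    """Grid search for the branch length with the highest log-likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq1, seq2, d))


# Hypothetical sequences differing at 1 of 10 sites.
first, second = "ACGTACGTAC", "ACGTACGTAT"
print(ml_branch_length(first, second))
```

For this two-sequence case the maximum coincides with the Jukes-Cantor corrected distance; with more sequences the same sum must be evaluated over a whole tree and over many candidate topologies.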
![Page 66: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/66.jpg)
Maximum likelihood
![Page 67: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/67.jpg)
• This procedure is repeated for all possible topologies, and the topology that shows the highest likelihood is chosen as the final tree.
• Notes :
· ML estimates the branch lengths of the final tree ;
· ML methods are usually consistent ;
· ML is extended to allow differences between the rate of transition and transversion.
• Drawbacks
· need long computation time to construct a tree.
Maximum likelihood
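The computation time follows from how quickly the number of candidate topologies grows. The sketch below uses the standard count of unrooted bifurcating trees with n leaves, (2n-5)!! = 3 × 5 × ... × (2n-5); the formula is not stated on the slides and is added here for illustration.

```python
def unrooted_topologies(n_leaves):
    """Number of distinct unrooted bifurcating trees with n_leaves labelled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least three leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for factor in range(3, 2 * n_leaves - 4, 2):
        count *= factor
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n))
```

Exhaustive evaluation is therefore only practical for small data sets, which is why heuristic tree searches are used in practice.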
![Page 68: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/68.jpg)
Maximum likelihood
![Page 69: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/69.jpg)
![Page 70: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/70.jpg)
Parsimony criterion
• It consists of determining the minimum number of changes (substitutions) required to transform a sequence to its nearest neighbor.
Maximum Parsimony
• The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino-acid changes) to infer the most parsimonious tree from a set of sequences.
Maximum Parsimony
![Page 71: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/71.jpg)
Maximum Parsimony
Occam’s Razor
Entia non sunt multiplicanda praeter necessitatem. (Entities must not be multiplied beyond necessity.)
William of Occam (1300-1349)
The best tree is the one which requires the least number of substitutions
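Counting the substitutions a tree requires can be sketched with Fitch's small-parsimony procedure, a standard method that the slides do not name. The alignment and the nested-tuple tree encoding below are hypothetical; the code scores the three possible groupings of four sequences and the lowest score marks the most parsimonious one.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes needed) for one site.

    tree is a leaf name or a pair (left, right); states maps leaf name to base.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change at this node.
        return shared, left_cost + right_cost
    # The children disagree: one substitution is needed on a branch below.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical three-site alignment of four sequences.
toy = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for candidate in candidates:
    print(candidate, parsimony_score(candidate, toy))
```

Here the first two sites support grouping A with B, the third site supports grouping A with D, and the tree requiring the fewest changes overall is preferred.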
![Page 72: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/72.jpg)
• The best tree is the one which needs the fewest changes.
• Problems :
– If the evolutionary clock is not constant, the procedure generates results which can be misleading ;
– within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree ;
– long computation time to construct a tree.
Maximum Parsimony
![Page 73: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/73.jpg)
![Page 74: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/74.jpg)
![Page 75: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/75.jpg)
![Page 76: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/76.jpg)
![Page 77: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/77.jpg)
![Page 78: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/78.jpg)
![Page 79: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/79.jpg)
![Page 80: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/80.jpg)
Maximum Parsimony: Branch Node A or B ?
![Page 81: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/81.jpg)
Maximum Parsimony: A requires 5 mutations
![Page 82: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/82.jpg)
Maximum Parsimony: B (and propagating A->B) requires only 4 mutations
![Page 83: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/83.jpg)
![Page 84: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/84.jpg)
![Page 85: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/85.jpg)
· There are at present no statistical methods that allow comparison of trees obtained from different phylogenetic methods; nevertheless, many studies have compared the relative consistency of the existing methods.
Comparative evaluation of different methods
![Page 86: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/86.jpg)
· The consistency depends on many factors, among these the topology and branch lengths of the real tree, the transition/transversion rate and the variability of the substitution rates.
· One expects that if the sequences have a strong phylogenetic relationship, different methods will yield the same phylogenetic tree
Comparative evaluation of different methods
![Page 87: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/87.jpg)
Comparison of methods
• Inconsistency
• Neighbour Joining (NJ) is very fast but depends on accurate estimates of distance. This is more difficult with very divergent data
• Parsimony suffers from Long Branch Attraction. This may be a particular problem for very divergent data
• NJ can suffer from Long Branch Attraction
• Parsimony is also computationally intensive
• Codon usage bias can be a problem for MP and NJ
• Maximum Likelihood is the most reliable but depends on the choice of model and is very slow
• Methods may be combined
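Why distance estimates become harder for very divergent data can be illustrated with a minimal sketch. It assumes the Jukes-Cantor model as the distance correction (chosen here only for illustration; the slide does not tie NJ to a particular model), and the function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p reaches saturation (0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction grows faster and faster as the observed difference approaches 0.75
for p in (0.05, 0.30, 0.60, 0.70):
    print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.3f}")
```

For small observed differences the corrected distance stays close to p, but near saturation a small error in p translates into a large error in d, which is what makes NJ sensitive to very divergent sequences.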
![Page 88: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/88.jpg)
Rooting the Tree
• In an unrooted tree the direction of evolution is unknown
• The root is the hypothesized ancestor of the sequences in the tree
• The root can either be placed on a branch or at a node
• You should start by viewing an unrooted tree
![Page 89: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/89.jpg)
Automatic rooting
• Many software packages will root trees automatically (e.g. mid-point rooting in NJPlot)
• Sometimes two trees may look very different but, in fact, differ only in the position of the root
• Automatic rooting normally involves assumptions… BEWARE!
![Page 90: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/90.jpg)
Rooting Using an Outgroup
1. The outgroup should be a sequence (or set of sequences) known to be less closely related to the rest of the sequences than they are to each other
2. It should ideally be as closely related as possible to the rest of the sequences while still satisfying condition 1
The root must be somewhere between the outgroup and the rest (either on the node or in a branch)
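Placing the root between the outgroup and the rest can be sketched as follows. This is a minimal illustration that assumes a single outgroup leaf and an unrooted tree stored as an adjacency dictionary; the taxon names, the internal node labels (n1, n2) and the function name are hypothetical.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to a single outgroup leaf.

    adjacency : dict mapping node -> list of neighbouring nodes (unrooted tree)
    outgroup  : name of the outgroup leaf
    Returns a nested tuple: (outgroup, ingroup_subtree).
    """
    neighbours = adjacency[outgroup]
    if len(neighbours) != 1:
        raise ValueError("outgroup must be a leaf")

    def subtree(node, parent):
        # Walk away from the root; everything not towards the parent is a child
        children = [n for n in adjacency[node] if n != parent]
        if not children:                       # a leaf
            return node
        return tuple(subtree(child, node) for child in children)

    return (outgroup, subtree(neighbours[0], outgroup))


# Hypothetical unrooted tree: ((human, chimp), mouse, chicken)
adjacency = {
    "human": ["n1"], "chimp": ["n1"],
    "mouse": ["n2"], "chicken": ["n2"],
    "n1": ["human", "chimp", "n2"],
    "n2": ["mouse", "chicken", "n1"],
}
print(root_with_outgroup(adjacency, "chicken"))
# ('chicken', ('mouse', ('human', 'chimp')))
```

Choosing a different outgroup leaf gives a different rooted tree from the same unrooted topology, which is why the choice of outgroup has to satisfy the two conditions above.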
![Page 91: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/91.jpg)
How confident am I that my tree is correct?
Bootstrap values
Bootstrapping is a statistical technique that uses random resampling of the data to estimate the sampling error of tree topologies
![Page 92: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/92.jpg)
Bootstrapping phylogenies
• Characters are resampled with replacement to create many bootstrap replicate data sets
• Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML etc.)
• Agreement among the resulting trees is summarized with a majority-rule consensus tree
• Frequencies of occurrence of groups, bootstrap proportions (BPs), are a measure of support for those groups
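The resampling and counting steps above can be sketched as follows. This is a minimal illustration, not a full phylogenetic pipeline: `closest_pair` is a hypothetical toy stand-in for a real tree-building method (parsimony, distance, ML), and the alignment is made-up data.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_proportions(alignment, build_tree, n_replicates=100, seed=0):
    """Fraction of replicates in which each group appears.

    build_tree maps an alignment to a collection of groups,
    each group being a frozenset of taxon names.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        counts.update(set(build_tree(bootstrap_replicate(alignment, rng))))
    return {group: n / n_replicates for group, n in counts.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree method: report the most similar pair as one group."""
    names = sorted(alignment)

    def diff(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return [frozenset(min(pairs, key=lambda pair: diff(*pair)))]


# Hypothetical toy alignment
alignment = {"a": "AAGTCCGA", "b": "AAGTCCGT", "c": "ACGACTGA", "d": "ACTACTTA"}
for group, bp in bootstrap_proportions(alignment, closest_pair).items():
    print(sorted(group), bp)
```

With a real method, `build_tree` would return every group (clade) of the inferred tree, and the groups found in more than half of the replicates would make up the majority-rule consensus tree.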
![Page 93: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/93.jpg)
Bootstrapping - an example
[Figure: majority-rule consensus tree from a parsimony bootstrap of ciliate SSU rDNA. Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7). Bootstrap proportions shown on the branches: 100, 96, 84, 100, 100, 100.]
![Page 94: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/94.jpg)
• Bootstrapping is a very valuable and widely used technique (it is demanded by some journals)
• BPs give an idea of how likely a given branch is to remain unchanged if additional data, with the same distribution, became available
• BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals. There is no agreement about what constitutes a ‘good’ bootstrap value (> 70%? > 80%? > 85%?)
• Some theoretical work indicates that BPs can be a conservative estimate of confidence intervals
• If the estimated tree is inconsistent, all the bootstraps in the world won’t help you…
Bootstrap - interpretation
![Page 95: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/95.jpg)
Jack-knifing
• Jack-knifing is very similar to bootstrapping and differs only in the character resampling strategy
• Jack-knifing is not as widely available or widely used as bootstrapping
• Tends to produce broadly similar results
![Page 96: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/96.jpg)
At present only resampling techniques allow testing the topology of a phylogenetic tree
· Bootstrapping
» Columns are drawn, with replacement, from the aligned sequences until the data set is the same size as the original one (usually some columns are sampled several times and others are left out).
· Half-jackknife
» This technique resamples half of the sequence sites and eliminates the rest. The final sample has half the initial number of sites, without duplication.
Statistical evaluation of the obtained phylogenetic trees
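The half-jackknife differs from bootstrapping only in how columns are drawn, which a short sketch makes visible. The function name and the toy alignment are hypothetical; rounding down for an odd number of sites is an assumption of this sketch.

```python
import random


def half_jackknife(alignment, rng):
    """Keep a random half of the alignment columns, each at most once."""
    n_sites = len(next(iter(alignment.values())))
    kept = sorted(rng.sample(range(n_sites), n_sites // 2))  # sampling without replacement
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}


# Hypothetical toy alignment
alignment = {"a": "AAGTCCGA", "b": "AAGTCCGT", "c": "ACGACTGA", "d": "ACTACTTA"}
print(half_jackknife(alignment, random.Random(0)))
```

Each replicate is then analysed exactly like a bootstrap replicate, and the frequency with which a group is recovered is used as its support.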
![Page 97: Bioinformatics t6-phylogenetics v2013-wim_vancriekinge](https://reader034.vdocuments.us/reader034/viewer/2022052212/5408ac4f8d7f723b058b457c/html5/thumbnails/97.jpg)
Weblems
W6.1: The growth hormones in most mammals have very similar amino acid sequences. (The growth hormones of the Alpaca, Dog, Cat, Horse, Rabbit, and Elephant each differ from that of the Pig at no more than 3 positions out of 191.) Human growth hormone is very different, differing at 62 positions. The evolution of growth hormone accelerated sharply in the line leading to humans. By retrieving and aligning growth hormone sequences from species closely related to humans and our ancestors, determine where in the evolutionary tree leading to humans the accelerated evolution of growth hormone took place.
W6.2: Humans are primates, an order that we, apes and monkeys share with lemurs and tarsiers. On the basis of the Beta-globin gene cluster of a human, a chimpanzee, an old-world monkey, a new-world monkey, a lemur, and a tarsier, derive a phylogenetic tree of these groups.
W6.3: Primates are mammals, a class we share with marsupials and monotremes. Extant marsupials live primarily in Australia, except for the opossum, found also in North and South America. Extant monotremes are limited to two animals from Australia: the platypus and echidna. Using the complete mitochondrial genomes from human, horse (Equus caballus), wallaroo (Macropus robustus), American opossum (Didelphis virginiana), and platypus (Ornithorhynchus anatinus), draw an evolutionary tree, indicating branch lengths. Are monotremes more closely related to placental mammals or to marsupials?
W6.4: Mammals are vertebrates, a subphylum that we share with fishes, sharks, birds and reptiles, amphibia, and primitive jawless fishes (example: lampreys). For the coelacanth (Latimeria chalumnae), the great white shark (Carcharodon carcharias), skipjack tuna (Katsuwonus pelamis), sea lamprey (Petromyzon marinus), frog (Rana pipiens), and Nile crocodile (Crocodylus niloticus), using sequences of cytochromes c and pancreatic ribonucleases, derive evolutionary trees of these species.