![Page 1: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/1.jpg)
Interpreting molecular phylogenetic trees
Aidan Budd
Structural and Computational Biology Unit
EMBL Heidelberg, Germany
EMBO Practical Course on Computational Molecular Evolution
IMBG-HCMR, Heraklion, Greece
Monday 3rd - Tuesday 4th May 2010
![Page 2: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/2.jpg)
Part 1
![Page 3: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/3.jpg)
Session Homepage:
http://tinyurl.com/interpretPhyloMolEvol2010
![Page 4: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/4.jpg)
Session Aims
1. Highlight key aspects of molecular evolutionary studies
• common concepts
• clearly it's important to understand these as well as possible
• typical questions
• helps planning your own analyses
• applications to other fields
• helps identifying potential collaborations and applying for funding
2. Review basic concepts and terminology
• Provide a common background for later sessions
• Demonstrate selected important tools and resources
• Place different components of an analysis in context
![Page 5: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/5.jpg)
It's not just you who thinks molecular evolution/phylogenetics is important...
![Page 6: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/6.jpg)
Number of Publications Per Year
query: ((phylogeny OR phylogenetic OR phylogenies OR phylogenetics))
Source: ISI Web of Knowledge as of 28.03.2010
query: (((evolution OR evolutionary OR evolves OR evolve) AND (molecule OR molecular OR molecules)))
[Plot series: molecular evolution; phylogenies]
total since 1975: +100,000
total since 1975: 72638
• Many molecular evolution/phylogenetics articles are published each year
• Number of articles published increases year-by-year
![Page 7: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/7.jpg)
Highly Cited Articles
| Method/Software | Publication Year | Original Citation | # of Citations |
|---|---|---|---|
| MEGA3 | 2004 | Kumar et al. 2004, Brief Bioinform.;5(2):150-63. PMID: 15260895 | 6630 |
| MRBAYES | 2001 | Huelsenbeck and Ronquist 2001, Bioinformatics;17(8):754-5. PMID: 11524383 | 5707 |
| CLUSTALW ** | 1994 | Thompson et al. 1994, Nucleic Acids Res.;22(22):4673-80. PMID: 7984417 | 29658 |
| BLAST * | 1990 | Altschul et al. 1990, J Mol Biol.;215(3):403-10. PMID: 2231712 | 27660 |
| Neighbor-Joining Algorithm | 1987 | Saitou and Nei 1987, Mol Biol Evol.;4(4):406-25. PMID: 3447015 | 20523 |
| Non-Parametric Bootstrap in Phylogenetics | 1985 | Felsenstein 1985, Evolution;39(4):783-91. PMID: N/A | 14566 |

Source: ISI Web of Knowledge, as of 29.03.2010
As of 2006 in the ISI Web of Knowledge:
* most cited paper that year, 26th most cited in the entirety of science
** second most cited paper that year, 31st most cited paper in the entirety of science
•Some molecular-evolution-related articles are VERY highly cited
![Page 8: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/8.jpg)
Applications of Phylogenies
![Page 9: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/9.jpg)
Applications of Phylogenetics
•Epidemiology
•Forensics
•Medical treatment selection
•Selecting conservation targets
•Monitoring trade in illegal organisms
•Bioinformatics tools - in particular:
•building MSAs
•predicting function
![Page 10: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/10.jpg)
Applications of Phylogenetics: Epidemiology
Clonal origin and evolution of a transmissible cancer. Murgia et al. PMID: 16901782
Characterise evolutionary history of a pathogen
Dog cancer known to be infectious - is the infectious agent:
•a virus (cf Human Papilloma Virus)?
•dog cancer cells themselves?
Root assumed on this branch
Tumor and host-dog DNA compared and used to draw a tree
All tumor sequences more closely related to each other than any are to host-dogs
Tumor itself is the infectious agent!
![Page 11: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/11.jpg)
Applications of Phylogenetics: Forensics
Analysis of a rape case by direct sequencing of the human immunodeficiency virus type 1 pol and gag genes. Albert et al., PMID: 7520096
Using HIV pol and gag genes to estimate phylogeny of viruses from
•male rape suspect
•female rape victim
•random individuals
Female victim’s virus is clearly more closely related to male suspect’s viruses than to any other sequences
Supports guilt of male suspect
Conclusion depends on determining order of lineage divergence events
![Page 12: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/12.jpg)
Applications of Phylogenetics: Medical Treatment Selection
Nocardia cyriacigeorgica, an emerging pathogen in the United States. Schlaberg R, Huard RC, Della-Latta P. J Clin Microbiol. 2008 Jan;46(1):265-73. PMID: 18003809. Figure 2
[Figure label: Nocardia cyriacigeorgica]
Characters such as drug resistance can vary across the phylogeny
Drug selection for a novel strain can be informed using phylogeny and knowledge of these character distributions
![Page 13: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/13.jpg)
Applications of Phylogenetics: Selecting Conservation Targets
Prioritise organisms for inclusion in conservation programs, taking into account
•phylogenetic diversity
•conservation costs
•probability of extinction
Resource-aware taxon selection for maximizing phylogenetic diversity. Pardi F, Goldman N. Syst Biol. 2007 Jun;56(3):431-44. PMID: 17558965. Figure 4
![Page 14: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/14.jpg)
Applications of Phylogenies: Monitoring Trade in Illegal Organisms
Genetic evidence of illegal trade in protected whales links Japan with the US and South Korea. Baker CS, Steel D, Choi Y, Lee H, Kim KS, Choi SK, Ma YU, Hambleton C, Psihoyos L, Brownell RL, Funahashi N. Biol Lett. 2010 Apr 14. Figure 1
Determining the source animals for unlabelled meats
Used here to track illegal trade in protected whale and dolphin species
![Page 15: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/15.jpg)
Applications of Phylogenetics: Bioinformatics Tools
STRING
STRING 8--a global view on proteins and their functional interactions in 630 organisms. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. PMID: 18940858
Further example
A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences. Chica C, Labarga A, Gould CM, López R, Gibson TJ. BMC Bioinformatics. 2008 May 6;9:229. PMID: 18460207
![Page 16: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/16.jpg)
Applications of Phylogenetics: Building Progressive Multiple Alignments
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80. PMID: 7984417
An algorithm for progressive multiple alignment of sequences with insertions. Löytynoja A, Goldman N. Proc Natl Acad Sci U S A. 2005 Jul 26;102(30):10557-62. PMID: 16000407
![Page 17: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/17.jpg)
Applications of Phylogenetics: Common themes
•Relatedness
•Timing historical events
NOT!
•Are my sequences related to each other?
![Page 18: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/18.jpg)
Rooted Phylogenies
Terminology and Concepts
![Page 19: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/19.jpg)
Alternative Tree-Related Terminologies
leaf / tip / terminal node / external node
branch / arc / edge
![Page 20: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/20.jpg)
Trees: Branches and Nodes
Trees consist of:
branches
nodes (ends of branches)
![Page 21: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/21.jpg)
Internal/External Nodes/Branches
Branches and Nodes are either:
external/terminal
• Node - associated with an extant sequence/OTU (operational taxonomic unit)
• Branch - links an external and an internal node
internal/interior
• Node - at the intersection of two or more branches
• Branch - links two internal nodes
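The internal/external distinction can be sketched in a few lines of Python. This is only an illustration: the tree below is a small hypothetical example (tips `A`, `B`, `C`, internal nodes `n1` and `root`), stored as a child-to-parent mapping, and is not taken from the slides.

```python
# Hypothetical rooted tree stored as child -> parent.
# Tips: A, B, C. Internal nodes: n1, root (names are made up).
PARENT = {"n1": "root", "C": "root", "A": "n1", "B": "n1"}


def classify(parent):
    """Split nodes and branches into internal and external sets."""
    nodes = set(parent) | set(parent.values())
    internal_nodes = set(parent.values())      # every node with a daughter
    external_nodes = nodes - internal_nodes    # tips / OTUs
    branches = {(p, c) for c, p in parent.items()}
    # An external branch ends in a tip; an internal branch links two internal nodes.
    external_branches = {(p, c) for p, c in branches if c in external_nodes}
    internal_branches = branches - external_branches
    return internal_nodes, external_nodes, internal_branches, external_branches


internal_nodes, external_nodes, internal_branches, external_branches = classify(PARENT)
print(sorted(external_nodes), sorted(internal_nodes))
```

A child-to-parent mapping is used here simply because it makes "has a daughter" easy to test; any tree library would expose the same distinction.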
![Page 22: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/22.jpg)
Branches
•represent successive generations of “taxa”
•“later” taxa have “earlier” taxa as their ancestors
•i.e. a lineage
•time flows from the base of the tree to the tips
[Figure: tree with a time axis]
![Page 23: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/23.jpg)
Internal Nodes
•represent hypothetical ancestral taxa/sequences/organisms, i.e. HTUs - hypothetical taxonomic units
Root (Root Node)
•A "special" internal node
•The most recent common ancestor of all OTUs
•Usually implies many other less recent common ancestors
[Figure: rooted tree with a time axis]
![Page 24: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/24.jpg)
Parent/Daughter Branches
parental/ancestral lineages/branches
diverge into
multiple daughter lineages/branches
[Figure: rooted tree with a time axis]
![Page 25: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/25.jpg)
Polytomies
Internal nodes with two daughter branches are bifurcations
Internal nodes associated with more than two daughter branches are polytomies
How many bifurcations on the tree? (a) 4 (b) 5 (c) 6
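Counting bifurcations and polytomies amounts to counting daughter branches per internal node. The sketch below assumes a hypothetical tree (not the one in the slide's figure, which is not recoverable from the transcript), again stored as a child-to-parent mapping.

```python
from collections import Counter

# Hypothetical rooted tree (child -> parent); node names are made up.
# "r" has three daughters (a polytomy); "n1" and "n2" have two each.
PARENT = {"n1": "r", "D": "r", "E": "r", "A": "n1", "n2": "n1", "B": "n2", "C": "n2"}


def count_node_types(parent):
    """Return (bifurcating nodes, polytomous nodes), each as a sorted list."""
    daughters = Counter(parent.values())  # internal node -> number of daughter branches
    bifurcations = sorted(n for n, k in daughters.items() if k == 2)
    polytomies = sorted(n for n, k in daughters.items() if k > 2)
    return bifurcations, polytomies


bifurcations, polytomies = count_node_types(PARENT)
print(len(bifurcations), "bifurcations;", len(polytomies), "polytomies")
```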
![Page 26: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/26.jpg)
Interpreting Polytomies
Soft Polytomy
[Figure: polytomy with tips A, B, C]
Lineages only bifurcate - internal lineages so short that no identifiable change/evolution occurred along them
Thus true pattern of lineage divergence cannot be resolved
Hard Polytomy
[Figure: polytomy with tips A, B, C]
Ancestral lineage diverged into 3+ lineages simultaneously
NB: Some software only accepts bifurcating trees
![Page 27: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/27.jpg)
Relatedness
[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

We can list the set of branches in the complete lineage of each OTU
a and c are more closely related to each other than either is to e
because their complete lineages share (at least one) more recent common...
•ancestor(s) [ t and u ]
•branch(es) [ yt and tu ]
that neither shares with e
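The lineage listing on this slide can be reproduced with a short Python sketch. The child-to-parent mapping below is reconstructed from the slide's OTU/branch table; the function names are illustrative only.

```python
# Rooted tree from the slide's table, stored as child -> parent (root is z).
PARENT = {
    "y": "z", "f": "z", "x": "z",
    "t": "y", "e": "y",
    "u": "t", "d": "t",
    "v": "u", "c": "u",
    "a": "v", "b": "v",
    "w": "x", "i": "x",
    "g": "w", "h": "w",
}


def lineage(otu):
    """Branches in the complete lineage of an OTU, listed from the root."""
    path = [otu]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # root first
    return [parent + child for parent, child in zip(path, path[1:])]


def shared_only(x, y, z):
    """Branches shared by the lineages of x and y but absent from z's lineage."""
    return [b for b in lineage(x) if b in lineage(y) and b not in lineage(z)]


print(lineage("a"))                # ['zy', 'yt', 'tu', 'uv', 'va']
print(shared_only("a", "c", "e"))  # ['yt', 'tu']
```

The second call recovers the branches yt and tu that a and c share but e does not, which is the reasoning given on the slide.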
![Page 28: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/28.jpg)
Relatedness Statements: 'X is more closely related to Y than to Z'
[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

•a is more closely related to b than either is to g
•b, d and e are more closely related to each other than they are to f
A set of OTUs are more closely related to each other than any are to other OTUs if they share more recent common ancestor(s)/lineage(s) not shared with other OTUs
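One way to check such a statement, sketched under the assumption that the tree is stored as the child-to-parent mapping reconstructed from the slide's table: a group is more closely related to each other than to an outsider when adding the outsider pushes the most recent common ancestor (MRCA) further back. Function names are illustrative.

```python
# Rooted tree from the slide's table, stored as child -> parent (root is z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x", "g": "w", "h": "w",
}


def path_to_root(node):
    """The node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(otus):
    """Most recent common ancestor of a collection of OTUs."""
    paths = [path_to_root(o) for o in otus]
    shared = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in shared)  # first shared node is the most recent


def closer_to_each_other_than_to(group, outsider):
    """True if the group shares an ancestor that the outsider does not."""
    return mrca(group) != mrca(list(group) + [outsider])


print(closer_to_each_other_than_to(["a", "b"], "g"))       # True
print(closer_to_each_other_than_to(["b", "d", "e"], "f"))  # True
print(closer_to_each_other_than_to(["a", "e"], "c"))       # False
```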
![Page 29: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/29.jpg)
Relatedness Statements: 'X is equally distantly related to Y and Z'
[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

•g is equally distantly related to f as (g is related) to d
•i is equally distantly related to f, e, and a
An OTU is equally distantly related to two (or more) other OTUs if it shares the same most recent common ancestor with these two (or more) OTUs
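The definition above translates directly into a comparison of most recent common ancestors. A minimal sketch, assuming the child-to-parent mapping reconstructed from the slide's table (function names are illustrative):

```python
# Rooted tree from the slide's table, stored as child -> parent (root is z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x", "g": "w", "h": "w",
}


def path_to_root(node):
    """The node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(x, y):
    """Most recent common ancestor of two OTUs."""
    ancestors_of_y = set(path_to_root(y))
    return next(n for n in path_to_root(x) if n in ancestors_of_y)


def equally_distant(otu, others):
    """True if `otu` has the same MRCA with every OTU in `others`."""
    return len({mrca(otu, other) for other in others}) == 1


print(equally_distant("g", ["f", "d"]))       # True
print(equally_distant("i", ["f", "e", "a"]))  # True
print(equally_distant("a", ["b", "c"]))       # False
```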
![Page 30: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/30.jpg)
Relatedness Statements: 'X is most closely related to Y'
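[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

•g is most closely related to h
•a, b, and c are most closely related to d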
An OTU (or a set of OTUs) are most closely related to another OTU (or set of OTUs) if they share one (or more) most recent common ancestor(s)/lineage(s) not shared by any other OTU (or group of OTUs)
![Page 31: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/31.jpg)
Relatedness Statements: 'X is the sister group of Y'
[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

•d is the sister group of the group of taxa [a, b, c]
•the group of taxa [g, h] and the taxon i are sister groups
OTUs (or groups of OTUs) most closely related to each other are sometimes referred to as sister groups
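The sister-group statements on this slide can be checked with a small sketch. It assumes the child-to-parent mapping reconstructed from the slide's table; the function names are illustrative. Note that where the parent node is a polytomy (as at the root z here), the function returns the tips of all other daughter lineages together.

```python
# Rooted tree from the slide's table, stored as child -> parent (root is z).
PARENT = {
    "y": "z", "f": "z", "x": "z", "t": "y", "e": "y", "u": "t", "d": "t",
    "v": "u", "c": "u", "a": "v", "b": "v", "w": "x", "i": "x", "g": "w", "h": "w",
}


def path_to_root(node):
    """The node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(otus):
    """Most recent common ancestor of a collection of OTUs."""
    paths = [path_to_root(o) for o in otus]
    shared = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in shared)


def tips_under(node):
    """All OTUs descended from (or equal to) a node."""
    daughters = [c for c, p in PARENT.items() if p == node]
    if not daughters:
        return {node}
    return set().union(*(tips_under(d) for d in daughters))


def sister_group(otus):
    """OTUs in the sister group of a clade given by its set of OTUs."""
    node = mrca(otus)
    if tips_under(node) != set(otus):
        raise ValueError("the given OTUs do not form a complete clade")
    if node not in PARENT:
        return set()  # the root has no sister group
    return tips_under(PARENT[node]) - set(otus)


print(sorted(sister_group({"d"})))       # ['a', 'b', 'c']
print(sorted(sister_group({"g", "h"})))  # ['i']
```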
![Page 32: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/32.jpg)
Relatedness
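[Figure: rooted tree with a time axis; internal nodes t, u, v, w, x, y, z]

| OTU | Branches |
|---|---|
| a | zy yt tu uv va |
| b | zy yt tu uv vb |
| c | zy yt tu uc |
| d | zy yt td |
| e | zy ye |
| f | zf |
| g | zx xw wg |
| h | zx xw wh |
| i | zx xi |

c is most closely related to...?
1. a
2. d
3. d and b
4. a and b
5. b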
![Page 33: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/33.jpg)
Tree Topology
Trees “rotated” around internal branches have the same topology
For rooted trees, different topologies describe different patterns of relatedness of OTUs
The branch intersections (i.e. internal nodes) of a tree specify its topology

| Node | Branches |
|---|---|
| z | yz za zb |
| y | xy yw yz |
| w | yw wc wd |
| x | xy xe |

[Figure: trees with identical topologies; internal nodes w, x, y, z]
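That rotation leaves topology unchanged can be illustrated by comparing trees in an order-independent form. The sketch below writes rooted trees as nested tuples of tip names (the first tree follows the node table on this slide, with tips a to e) and converts them to nested `frozenset`s, which ignore the order of daughters. The representation is an assumption made for illustration and relies on tip names being unique.

```python
def topology(tree):
    """Order-independent form of a rooted tree given as nested tuples of tip names."""
    if isinstance(tree, str):
        return tree  # a tip
    return frozenset(topology(subtree) for subtree in tree)


# Tree from the slide's node table: x -> (y, e), y -> (w, z), w -> (c, d), z -> (a, b)
original = ((("c", "d"), ("a", "b")), "e")
# Same tree with daughters swapped ("rotated") at every internal node
rotated = ("e", (("b", "a"), ("d", "c")))
# A genuinely different pattern of relatedness
different = ((("c", "a"), ("d", "b")), "e")

print(topology(original) == topology(rotated))    # True
print(topology(original) == topology(different))  # False
```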
![Page 34: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/34.jpg)
Tree Representations
Most rooted tree figures use a “rectangular” rather than a “diagonal” representation
Diagonal
Rectangular
Rectangular trees represent internal nodes with lines perpendicular to lines representing the branches
![Page 35: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/35.jpg)
Unrooted Phylogenies
![Page 36: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/36.jpg)
Unrooted Trees
There’s no root on the tree
Thus, there is no statement about the DIRECTION of time (i.e. of direction of divergence of lineages in the tree)
thus - we can’t distinguish daughter from parent branches
Many applications of phylogenies require a rooted tree
But many tree estimation tools yield only unrooted trees!
![Page 37: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/37.jpg)
There are multiple rooted trees possible from a given unrooted tree
The number of possible rooted tree topologies is the number of branches on the unrooted tree (assuming a bifurcating root)
[Figure: an unrooted tree and the rooted trees derived from it]
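The claim that each branch of the unrooted tree gives one rooted topology can be checked by brute force. This sketch assumes a four-tip unrooted tree with tips a, b, c, d, where `m` and `n` are hypothetical names for the two internal nodes; rooted trees are written as nested `frozenset`s so that rotations count as the same topology.

```python
# Unrooted four-tip tree: a and b attach to internal node m, c and d to n.
# The names "m" and "n" are made up for this sketch.
EDGES = [("a", "m"), ("b", "m"), ("m", "n"), ("c", "n"), ("d", "n")]


def neighbours(node):
    """Nodes joined to `node` by a branch."""
    return [v if u == node else u for u, v in EDGES if node in (u, v)]


def subtree(node, came_from):
    """Rooted subtree at `node`, looking away from `came_from`."""
    daughters = [n for n in neighbours(node) if n != came_from]
    if not daughters:
        return node  # a tip
    return frozenset(subtree(d, node) for d in daughters)


def root_on_branch(u, v):
    """Place a bifurcating root on the branch between u and v."""
    return frozenset({subtree(u, v), subtree(v, u)})


rooted_topologies = {root_on_branch(u, v) for u, v in EDGES}
print(len(EDGES), len(rooted_topologies))  # 5 5
```

Each of the five branches yields a distinct rooted topology, matching the count stated on the slide for a bifurcating root.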
![Page 38: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/38.jpg)
There are multiple rooted trees possible from a given unrooted tree
If you allow a polytomy at the root...
there is one additional rooted tree possible for each internal node in the unrooted tree
[Figure: an unrooted tree with tips a, b, c, d and rooted trees derived from it]
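As a quick count, assuming the unrooted tree is fully bifurcating with $n$ tips (the four-tip tree in the figure is one such case), the two kinds of rooting add up as follows:

```latex
\begin{aligned}
\text{branches} &= 2n - 3 && \text{(rootings with a bifurcating root)}\\
\text{internal nodes} &= n - 2 && \text{(additional rootings with a polytomy at the root)}\\
\text{total rootings} &= (2n - 3) + (n - 2) = 3n - 5
\end{aligned}
```

For $n = 4$ this gives $5 + 2 = 7$ possible rooted trees.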
![Page 39: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/39.jpg)
Quiz
Which sequence is d most closely related to?
•a
•b
•c
None of them - It depends where the root is!
However, if d is most closely related to a single OTU and not to a group, then that OTU must be c (see previous slide)
i.e. d is certainly NOT most closely related to a or to b
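The quiz answer can be verified by trying every bifurcating rooting and reading off what d is most closely related to in each. The sketch assumes the four-tip unrooted tree in which a and b attach to one internal node and c and d to the other; `m` and `n` are hypothetical names for those internal nodes.

```python
# Unrooted four-tip tree; "m" and "n" are made-up names for the internal nodes.
EDGES = [("a", "m"), ("b", "m"), ("m", "n"), ("c", "n"), ("d", "n")]


def neighbours(node):
    return [v if u == node else u for u, v in EDGES if node in (u, v)]


def subtree(node, came_from):
    """Rooted subtree at `node`, looking away from `came_from`."""
    daughters = [n for n in neighbours(node) if n != came_from]
    return frozenset(subtree(d, node) for d in daughters) if daughters else node


def root_on_branch(u, v):
    return frozenset({subtree(u, v), subtree(v, u)})


def tips(tree):
    """All tip names in a rooted (sub)tree."""
    if isinstance(tree, str):
        return frozenset({tree})
    return frozenset().union(*(tips(sub) for sub in tree))


def sister_of(tree, tip):
    """Tips of the sister group of `tip` in a rooted tree."""
    for sub in tree:
        if sub == tip:
            return frozenset().union(*(tips(s) for s in tree if s != tip))
        if not isinstance(sub, str) and tip in tips(sub):
            return sister_of(sub, tip)
    raise ValueError("tip not found in tree")


sisters_of_d = {sister_of(root_on_branch(u, v), "d") for u, v in EDGES}
for group in sorted(sisters_of_d, key=len):
    print(sorted(group))
```

Across all five rootings, d's sister group is c, the group [a, b], or the group [a, b, c]; it is never a alone or b alone, in line with the answer on the slide.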
![Page 40: Interpreting molecular phylogenetic trees Aidan Budd Structural and Computational Biology Unit EMBL Heidelberg, Germany EMBO Practical Course on Computational](https://reader038.vdocuments.us/reader038/viewer/2022103101/5697bf981a28abf838c914c0/html5/thumbnails/40.jpg)
Visualising Trees
Demonstration and Exercise
Viewing and manipulating unscaled trees with NJplot
•Rotating around internal branches
•Re-rooting