phylogenomics. phylogenetics phylogenomics reconstruction of phyletic relationships based on the...

Phylogenomics

Upload: derrick-warner

Post on 11-Jan-2016


Page 1

Phylogenomics

Page 2

Phylogenetics vs. Phylogenomics

reconstruction of phyletic relationships based on the analysis of:

- phylogenetics: several (up to several dozen) genes
- phylogenomics: complete genetic information (ideally), or several dozen to hundreds of coding sequences (phylotranscriptomics)

Page 3

Why?

the vast amount of genetic information should significantly improve the inference of phylogenetic relationships and eliminate signal noise

... and sometimes it really works

Page 4
Page 5
Page 6

Adl et al., 2012

Page 7

... but sometimes it doesn’t

Page 8

possible source of error: - incorrect sequence annotation

Page 9
Page 10

[Figure labels: OCT, ATC]

possible source of error: - paralogues

Page 11

possible source of error: - sins of the past

Page 12

L/HGT (lateral/horizontal gene transfer)

possible source of error: - sins of the past

Page 13

EGT (endosymbiotic gene transfer)

Page 14

[Figure: “LEUCA” (last eukaryotic common ancestor); ...animals/fungi, plants, rhodophytes...; EGT]

Page 15

18(16)S rRNA

Pros:
- combination of variable and conserved regions
- zero L/HGT
- exhaustive taxon sampling
- known secondary structure
- hundreds of copies per cell (single-cell PCR feasible)
- cost per nt and speed
- ‘18S is always right’

Cons:
- ~1800 bp
- intraindividual paralogues
- lower branching support

MULTI-PROTEIN DATASETS

Pros:
- large amount of information
- modular
- robust branching support (although often false)

Cons:
- limited sampling
- variable quality of phylogenetic signal
- L/H(E)GT
- still costly and slow
- HW-demanding analysis
- stability of topologies (or lack thereof)

Page 16

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

(MSA: multiple sequence alignment; SGP: single-gene phylogeny; MGP: multi-gene phylogeny)

Page 17

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

- lots of redundancy in databases (duplicates, close paralogues...)
- usually it is better to get rid of them

sequence clustering

+ speed, relatively HW-friendly, accuracy
- accuracy, black box
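The greedy, incremental idea behind redundancy clustering can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions. `identity` uses `difflib` matching blocks as a rough stand-in for the alignment-based identity that real clustering tools compute. The 0.9 threshold is arbitrary. The all-against-representatives comparisons make this far too slow for real databases, which is why tools such as CD-HIT and USEARCH rely on much faster heuristics.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Rough identity proxy: matching characters / length of the shorter sequence.
    NOT the alignment-based identity used by dedicated clustering tools."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return matches / min(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longest sequences first; each sequence joins
    the first representative it matches at >= threshold, otherwise it becomes
    the representative of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough
            clusters[name] = [name]
    return clusters
```

Keeping only the dictionary keys (the representatives) gives the "purified" database.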

Page 18

CD-HIT

Page 19

USEARCH

Page 20

DB editing

FASTA: universal and simple! But not unified

NCBI:
>gi|269120277|ref|YP_003308454.1| carbamate kinase [Sebaldella termitidis ATCC 33386]
MKNRIVVALGGNALGNSAKEQRDAVRETAIPIVDLIEAGHEVILAHGNGPQVGMINLAMDSATKNLPSFAEMPITECVAMSQGYIGYHLQRFIRDELKRRNIDKEVATIVTEVLVDGDDPAFKSPNKPIGAFYTKEEAEKLEKQGYTMMEDAGRGYRRVVASPKPVDIVQKKTIKTLIDNSQIVITVGGGGIPVKYVEGKGTLGEFAVIDKDFASAKLAELIDADYLIILTAVEKIAINYGKENEQWLDKLSIDDAKKYIKEGHFAPGSMLPKVEAALGFAASKQGRRALVTSLEKAKDGIAGLTGTVIVDEK

JGI:
>jgi|Dappu1|290510|JCO_fgenesh1_kg.C_scaffold_4000019
MKLVYTVASAFLVVLIAQSAYASEKLSAQDYAYNSTCLNHLRSHIKRELQAAVTYLAMGAWANHYSVQRPGLANFFFDSASEEREHGLKLLGYLRMRGHNDLDILPSSLEPLNGKYEWENSLSALRQALKMEKDVTESIKKIIDYCADAEDHQLADYLTGDFMEEQLKGQRNVAGLANTLQGVLRKQPRLGEWIFDNNLSKSMAV

manual editing works for several sequences, but what about several thousand?

GBs of RAM

robust OS and text editor

!Regular expressions!
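Before headers can be edited in bulk, the FASTA file has to be read record by record. Below is a minimal Python sketch of a streaming reader. It is not a replacement for a mature parser such as Biopython's. It assumes plain FASTA, where a header starts with `>` and is followed by one or more sequence lines. Because it yields one record at a time, thousands of sequences do not have to sit in RAM at once.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, one record at a time.

    The header is returned without its leading '>'; sequence lines are
    joined, so both single-line and wrapped FASTA are accepted.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                       # tolerate blank lines
        if line.startswith(">"):
            if header is not None:         # flush the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:                 # flush the last record
        yield header, "".join(chunks)
```

An open file handle is itself an iterable of lines, so `for header, seq in read_fasta(open("proteins.fasta")): ...` works (the file name is hypothetical). The NCBI and JGI headers above come out as different strings, which is exactly the "not unified" problem.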

Page 21

>gi|269120277|ref|YP_003308454.1| carbamate kinase [Sebaldella termitidis ATCC 33386]
MKNRIVVALGGNALGNSAKEQRDAVRETAIPIVDLIEAGHEVILAHGNGPQVGMINLAMDSATKNLPSFAEMPITECVAMSQGYIGYHLQRFIRDELKRRNIDKEVATIVTEVLVDGDDPAFKSPNKPIGAFYTKEEAEKLEKQGYTMMEDAGRGYRRVVASPKPVDIVQKKTIKTLIDNSQIVITVGGGGIPVKYVEGKGTLGEFAVIDKDFASAKLAELIDADYLIILTAVEKIAINYGKENEQWLDKLSIDDAKKYIKEGHFAPGSMLPKVEAALGFAASKQGRRALVTSLEKAKDGIAGLTGTVIVDEK

Find:>\w+\|\d+\|\w+\|(\w+).*\[(\w+\s\w+).*

Replace:>\2_\1

>Sebaldella termitidis_YP_003308454
MKNRIVVALGGNALGNSAKEQRDAVRETAIPIVDLIEAGHEVILAHGNGPQVGMINLAMDSATKNLPSFAEMPITECVAMSQGYIGYHLQRFIRDELKRRNIDKEVATIVTEVLVDGDDPAFKSPNKPIGAFYTKEEAEKLEKQGYTMMEDAGRGYRRVVASPKPVDIVQKKTIKTLIDNSQIVITVGGGGIPVKYVEGKGTLGEFAVIDKDFASAKLAELIDADYLIILTAVEKIAINYGKENEQWLDKLSIDDAKKYIKEGHFAPGSMLPKVEAALGFAASKQGRRALVTSLEKAKDGIAGLTGTVIVDEK

extremely powerful, easy to learn, fun to use:

!Regular expressions!
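The same Find/Replace pair works unchanged in a script, which helps when a text editor struggles with a huge file. A minimal Python sketch follows; the pattern is the one from the slide, and Python's `re.sub` understands the `\1`/`\2` back-references. Only header lines can match, because the pattern starts with `>`. Since `.` does not match a newline, a trailing line break survives the substitution.

```python
import re

# The slide's Find pattern, used unchanged.
FIND = re.compile(r">\w+\|\d+\|\w+\|(\w+).*\[(\w+\s\w+).*")
REPLACE = r">\2_\1"  # same back-references as the slide's Replace


def rename_header(line: str) -> str:
    """Rewrite an NCBI-style header; non-matching lines (sequences,
    headers in other formats) are returned unchanged."""
    return FIND.sub(REPLACE, line)
```

For example, `rename_header(">gi|269120277|ref|YP_003308454.1| carbamate kinase [Sebaldella termitidis ATCC 33386]")` gives `">Sebaldella termitidis_YP_003308454"`. A JGI header passes through untouched, because its second field is not numeric. Every database format needs its own pattern.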

Page 22
Page 23
Page 24

BLAST vs. annotation

BLAST (and its relatives) is the only reliable way to identify homologues; do not rely on annotation!

the more, the better

beware of close paralogues! Meticulous SGP (single-gene phylogeny) analysis is necessary
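Once BLAST has been run, collecting the hits for each query is easy to script. Below is a minimal Python sketch. It assumes BLAST+ tabular output with the 12 default `-outfmt 6` columns. The column names in `FIELDS` are labels chosen for this sketch. The e-value cut-off is an arbitrary example, not a recommendation. The hits are only candidate homologues; they still need the single-gene phylogeny check, because close paralogues pass any e-value filter.

```python
import csv
from typing import Dict, Iterable, List

# Labels for the 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def homologue_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, List[str]]:
    """Group subject IDs by query, keeping hits with e-value <= max_evalue.
    Subjects are listed once each, in the order they appear in the file."""
    hits: Dict[str, List[str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):   # skip blank/comment lines
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) <= max_evalue:
            subjects = hits.setdefault(rec["qseqid"], [])
            if rec["sseqid"] not in subjects:   # a subject may have several HSPs
                subjects.append(rec["sseqid"])
    return hits
```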

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

Page 25

[Figure labels: OCT, ATC]

possible source of error: - paralogues

Page 26

commercial

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

Page 27

vs. free

- both (shiny GUI / command-line scripts) will get you there relatively fast and easily, but... beware of possible errors; there is no universal solution

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

Page 28

Multiple alignment

- an important and necessary step in the identification and definition of DNA or protein domains, oligonucleotide design, phylogenetic analyses...

- most modern algorithms are iterative (they can self-improve during the iterations) and work reasonably well (really, don’t use Clustal unless you really have to); some of the most used are MAFFT, MUSCLE, Kalign and ProbCons (none of them is miraculous, each makes mistakes, but it’s not that bad)

- all of the above are accessible online (follow the hyperlinks) or can be run locally... nevertheless, you’ll have to use some alignment viewer/editor to visualize the results

- several free options (depending on which OS you use):
  - MS Windows: BioEdit, ‘the living legend’; extensive features, user-friendly, can import from GenBank, align (also translation alignment, although with …), edit, annotate, translate, do phylogeny...
  - Mac: MacClade, with great editing features and then some; user-friendly, but it neither aligns nor does phylogenies, and currently works only up to OS X 10.6, not on (Mountain) Lion
  - multi-platform:
    - MEGA: good for alignment, phylogenetic and molecular evolution analyses
    - Jalview: excellent for proteomics, passable alignment editor
    - SeaView: great aligner/editor (although it takes time to get used to it), excellent features for phylogenetics (inclusion sets, translation alignment), but there’s no UNDO button!

- ...and then again, if you have access to/can afford Geneious (student licenses are cheap), you can skip everything listed above

Page 29

Editing

- remember: the tree is only as good as the alignment; crap in, crap out!
- the goal is to keep only unambiguously aligned regions and relevant OTUs (remove duplicates and long-branchers)

site selection: AUTOMATED vs. MANUAL

- automated: good as a starting point, reproducible, ‘objective’, transparent, but... crude
- manual: subjective, often non-reproducible, needs ‘expertise’, but... (usually) better, and can be fine-tuned to each dataset

Page 30

Example: SeaView

open the dataset (in this case apicomplexa_ssu1.fas) and align it... you already know how, right?

Some regions are conserved (i.e., they show little divergence), so there’s little doubt about the correctness of the alignment. They should be kept for the analysis, as they carry vital information.

Page 31

Example: SeaView

On the other hand, some regions are pretty variable and could be aligned in several ways. Because we cannot be sure that the information they contain is correct, we should exclude them prior to analysis so as not to introduce errors (remember: crap in, crap out).

Page 32

In some situations, especially when you are new to the problem, it is not so clear which parts of the alignment should be kept and which should be excluded from the analysis. Gblocks (or similar SW) can help you. Luckily, it is also implemented in SeaView; as it tends to remove too much, let’s keep the parameters the least strict.

regions marked with X are kept, those with dashes are excluded from the selection

you can edit the selection afterwards and save it using Files-Save selection
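The kept/excluded logic can be made concrete with a simple column filter in Python. It only illustrates the idea: it is not Gblocks's algorithm, which also applies criteria such as minimum block length. Both thresholds are arbitrary assumptions. A column is kept only when it is not too gappy and its most common residue dominates.

```python
from typing import List


def keep_columns(alignment: List[str], max_gap_frac: float = 0.5,
                 min_identity: float = 0.5) -> List[bool]:
    """Per-column mask, True = keep (like the X marks in SeaView).

    A column is kept when at most `max_gap_frac` of the sequences have a gap
    there and the most common residue makes up at least `min_identity` of
    the non-gap characters.
    """
    n_seqs = len(alignment)
    mask = []
    for column in zip(*alignment):
        residues = [c for c in column if c not in "-."]
        gap_frac = 1 - len(residues) / n_seqs
        if not residues or gap_frac > max_gap_frac:
            mask.append(False)             # too gappy to trust
            continue
        most_common = max(residues.count(c) for c in set(residues))
        mask.append(most_common / len(residues) >= min_identity)
    return mask


def apply_mask(alignment: List[str], mask: List[bool]) -> List[str]:
    """Keep only the selected columns of every sequence."""
    return ["".join(c for c, keep in zip(seq, mask) if keep) for seq in alignment]
```

For `["AC-GT", "AC-GA", "ATTGC"]` the third column is dropped as too gappy and the last one as too variable.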

Page 33

you can also perform a phylogenetic analysis directly by clicking on Trees

you can choose from three different methods; PhyML is the maximum-likelihood one

the default settings are a reasonable compromise between speed and precision, so you can leave them as they are

for publication, you will also have to assess branching support

and you may want to use a more thorough tree-search algorithm (check ‘Best of NNI and SPR’)

then hit Run... and wait; the time depends on the method and the size of the dataset (obviously, the bigger, the longer).

Page 34

[Figure: annotated tree, with the following labels]

- INGROUP
- OUTGROUP (root)
- branch scale bar (substitutions per site: the longer the branch, the more divergent the sequence)
- node (represents the hypothetical ancestor of all taxa/branches stemming from the node; also defines a clade)
- clade (group of sequences sharing a common ancestor, i.e. stemming from a single node)
- sister taxa (two taxa forming a clade)
- sister clades

SeaView also has a very decent built-in tree viewer/editor
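The statement that every node defines a clade can be made concrete with a tiny Python sketch. Trees are written here as nested tuples, a hypothetical minimal stand-in for the Newick format that tree viewers read. The function collects the set of leaves below every internal node.

```python
from typing import FrozenSet, List, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set below every internal node (each node defines a clade)."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):          # a leaf is a single taxon
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found
```

For `((("A", "B"), ("C", "D")), "E")`, with E as the outgroup, the clades are {A, B}, {C, D}, {A, B, C, D} and the root clade. A and B are sister taxa, and (A, B) and (C, D) are sister clades, because both stem from the node that defines {A, B, C, D}.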

Page 35

you can also create several subsets of the alignment (inclusion sets) by clicking Sites-Create set and giving the set a name

parts of sequences above the X marks (highlighted) are included in the selection. You can select sites with a combination of right- and left-clicks: a left-click unselects point sites, a right-click removes the selection between two unselected regions, a single left-click selects a single site, and by holding the left button and moving the mouse you can re-select whole regions. I know, it sounds awkward... TRY TO PRACTICE IT! You can then duplicate and rename sets to create different inclusion sets, and save just the selection instead of the whole alignment. This feature can be extremely useful in phylogenetics and sets SeaView apart from other alignment editors (we will get to it next time).
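Conceptually, an inclusion set is a named list of column ranges applied to one and the same alignment. Below is a minimal Python sketch of that idea. The set names and ranges are hypothetical, and this is not SeaView's internal representation or file format.

```python
from typing import Dict, List, Tuple


def extract_set(alignment: Dict[str, str],
                ranges: List[Tuple[int, int]]) -> Dict[str, str]:
    """Return the alignment restricted to the given 1-based, inclusive column ranges."""
    columns = sorted({i for start, end in ranges for i in range(start - 1, end)})
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical inclusion sets defined over the same alignment.
INCLUSION_SETS = {
    "all_unambiguous": [(1, 2), (7, 8)],
    "conserved_core": [(1, 2)],
}
```

Running `extract_set` once per entry of `INCLUSION_SETS` gives one reduced alignment per set, while the full alignment stays untouched.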

Page 36

Coding sequences should be aligned in ‘translation’ mode: temporarily translated into amino acids, aligned as amino acids, and back-translated into nucleotides, keeping the alignment positions

in SeaView, click Props-View as proteins

align the translated sequences

uncheck View as proteins

now the sequences are aligned according to the ORF (reading frame)
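The back-translation step is simple enough to sketch in Python. Each amino acid in the aligned protein takes the next codon of the original coding sequence, and each gap becomes a gap of three nucleotides. The sketch assumes the protein is the exact in-frame translation of the CDS (no frameshifts or introns) and does not verify the translation itself.

```python
def backtranslate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto its aligned protein:
    every residue takes the next codon, every gap becomes '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            if k >= len(codons):
                raise ValueError("protein is longer than the coding sequence")
            out.append(codons[k])
            k += 1
    return "".join(out)
```

Because gaps always come in multiples of three, codons are never split, which is the point of aligning coding sequences this way.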

Page 37

Phylogenetic inference

You don’t have to use state-of-the-art phylogenetic methods for the initial analyses, whose purpose is to (quickly) identify redundancy (duplicates and very similar sequences), aberrant and very divergent sequences, or the need to extend the dataset (quite often you realize you should have added some other taxa). For that, a simple neighbour-joining tree based on the J-C, K2P or HKY model, or a stripped-down maximum-likelihood run (without gamma categories and branching support), will suffice and do the job quickly even on older computers.
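The J-C and K2P corrections mentioned above are short formulas. With p the proportion of differing sites, Jukes-Cantor gives d = -3/4 ln(1 - 4p/3). With P and Q the proportions of transitions and transversions, Kimura 2-parameter gives d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). Below is a minimal Python sketch for one pair of aligned DNA sequences; skipping columns with gaps or ambiguous bases is a simplification of this sketch. A neighbour-joining program computes such a distance for every pair before building the tree.

```python
import math
from typing import Tuple

PURINES = set("AG")


def distances(a: str, b: str) -> Tuple[float, float, float]:
    """p-distance, Jukes-Cantor and Kimura 2-parameter distances for two aligned
    DNA sequences; columns with a gap/ambiguity in either sequence are skipped.
    Raises ValueError (log of a non-positive number) for saturated pairs."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    diffs = [(x, y) for x, y in pairs if x != y]
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    transitions = sum((x in PURINES) == (y in PURINES) for x, y in diffs)
    p = len(diffs) / n
    P, Q = transitions / n, (len(diffs) - transitions) / n
    jc = -0.75 * math.log(1 - 4 * p / 3)
    k2p = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return p, jc, k2p
```

Both corrections give slightly larger values than the raw p-distance, because they account for multiple substitutions at the same site.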

On the other hand, for publication (or if you want to be sure), once you’ve polished your dataset, you should use the best possible methods. That usually means maximum likelihood with gamma-corrected GTR (nucleotides) or LG or WAG (amino acids) substitution matrices (or models of evolution, if you wish; these matrices tell the computer how probable a change from one state to another is). But it all depends on the dataset: if the sequences are similar and/or there are just a few of them, it may be preferable to use simpler matrices/models. There are also models dedicated to organellar genomes and/or specific taxonomic groups (like mtArt, which is tailored for the analysis of mitochondrial genes of arthropods). Some programs will tell you which model suits your dataset best (for example, jModelTest for nucleotides and ProtTest for amino acids, the latter also available as a server).
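Model-selection programs such as jModelTest and ProtTest rank candidate models with information criteria like AIC and BIC. These reward a higher log-likelihood but penalize extra parameters. Below is a minimal Python sketch of that final ranking step only. It assumes you already have each model's maximized log-likelihood and its number of free parameters as reported by your program. The likelihood optimization itself, which is the expensive part, is not shown.

```python
import math
from typing import Dict, Optional, Tuple


def aic(lnL: float, k: int) -> float:
    """Akaike information criterion, 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * lnL


def bic(lnL: float, k: int, n_sites: int) -> float:
    """Bayesian information criterion, k ln(n) - 2 lnL, with n = alignment length."""
    return k * math.log(n_sites) - 2 * lnL


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: Optional[int] = None) -> str:
    """fits maps model name -> (log-likelihood, number of free parameters).
    Uses BIC when n_sites is given, otherwise AIC."""
    def score(model: str) -> float:
        lnL, k = fits[model]
        return bic(lnL, k, n_sites) if n_sites else aic(lnL, k)
    return min(fits, key=score)
```

With made-up values `{"simple": (-1000.0, 1), "complex": (-995.0, 5)}`, AIC prefers "complex", but BIC with 1000 sites prefers "simple". For long alignments, BIC penalizes extra parameters more heavily.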

The credibility of the topology should be ‘tested’ using (non-parametric) bootstrap analysis. During this analysis the software creates pseudoreplicates by randomly resampling alignment columns with replacement (all taxa are included) and infers a topology from each pseudoreplicate instead of from the original dataset. For publication, 100 replicates are the bare minimum; reviewers will probably require 300 or more, though. If the analysis is meant just for you (or your boss), 100 is totally enough (in my opinion); alternatively, you can use an even faster method called the ‘approximate likelihood-ratio test’ (aLRT, implemented in some software).
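The resampling at the heart of the non-parametric bootstrap fits in a few lines of Python. In this minimal sketch, alignment columns are drawn with replacement until the pseudoreplicate is as long as the original, and every taxon keeps its row. Inferring a tree from each replicate is left to the phylogenetics program. The example alignment and the seed are arbitrary.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """One non-parametric bootstrap pseudoreplicate: columns are sampled
    with replacement; the number of taxa and sites stays the same."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Usage: 100 pseudoreplicates, each of which would then go to the tree program.
rng = random.Random(42)  # fixed seed so the replicates are reproducible
example = ["ACGTACGT", "ACGTTCGA", "TCGTACGA"]
replicates = [bootstrap_replicate(example, rng) for _ in range(100)]
```

Some columns appear several times in a replicate and others not at all. This is what makes a weakly supported clade come and go between replicates.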

Nowadays, most reviewers/editors will also require another type of phylogenetic analysis called Bayesian inference. Here you use the same (or similar) models, but the method of topology search is totally different; also, branching support is expressed as a posterior probability (ranging from 0 to 1) instead of bootstrap values. Be careful with the interpretation of these two values. With bootstrap, anything higher than 50 (meaning the clade appeared in at least 50% of the replicates) is considered supported (although weakly); the closer you get to 100, the more confident you can be about the branching. With posterior probability, on the other hand, anything below 0.95 (some go down to 0.90) should be considered unsupported! Only nodes with a PP value of 1.0 (or 0.99) are considered strongly supported.
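Bootstrap support is simply the percentage of replicate trees that contain a given clade. Below is a minimal Python sketch. It assumes each replicate tree has already been reduced to its set of clades, written as leaf-name sets with a consistent rooting. The posterior-probability helper only encodes the thresholds given above (below 0.95 unsupported, 0.99 or more strong).

```python
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(replicates: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees in which each clade occurs."""
    counts: Dict[Clade, int] = {}
    for tree in replicates:
        for clade in tree:
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: 100 * n / len(replicates) for clade, n in counts.items()}


def interpret_pp(pp: float) -> str:
    """Posterior probability thresholds from the text above."""
    if pp >= 0.99:
        return "strongly supported"
    if pp >= 0.95:
        return "supported"
    return "unsupported"
```

A clade found in 2 of 3 replicates gets about 67% bootstrap support. That is supported in the bootstrap sense, yet a posterior probability of 0.67 would count as unsupported.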

Page 38

Phylogenetic inference: software

Surprisingly, a lot of software is available (given the obscurity of the topic; an almost exhaustive list can be found here), but most of it is either too specialized, slow, obsolete or not worth using for other reasons. Unfortunately, most (like 99%) are command-line based, without any user-friendly graphical interface. But some of the good/passable ones are implemented in SW with a GUI (like SeaView or Geneious) or at least have a server version. So, here is a short list of recommended phylogenetic software:

- Ambiguous region detection/removal: several SW, but nothing exciting; try Aliscore or Gblocks (server)
- Distance methods: PAUP (commercial), PHYLIP, BioNJ
- Maximum parsimony: PAUP (commercial), PHYLIP
- Maximum likelihood: RAxML (server), PhyML (server), FastTree (REALLY fast, great for preliminary analyses), GARLI
- Bayesian inference: MrBayes, PhyloBayes
- Tree viewers/editors: NJplot (an improved version is also implemented in SeaView), FigTree, TreeView

this list is far from exhaustive, but the SW noted above should suit a general audience (like you) in terms of purpose and performance.

Page 39

meticulous analysis of SGPs is necessary!!!

you could also use an automated approach (Phylosorter), but the risk of error is quite significant and the parameters should be as strict as possible

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

Page 40

‘clean’ datasets can be merged (concatenated) into a supermatrix

SCaFoS, Phyutility, SeaView, MacClade, BioEdit?...
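What these tools do during concatenation can be sketched in Python. The sketch assumes taxon names are identical across the per-gene alignments, which is one reason to rename headers consistently earlier on. A taxon missing from a gene is padded with `?`, which is treated as missing data. The function also returns each gene's column range, which is handy if you later want a partitioned analysis.

```python
from typing import Dict, List, Tuple


def concatenate(genes: Dict[str, Dict[str, str]], missing: str = "?"
                ) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Merge per-gene alignments {gene: {taxon: aligned sequence}} into a supermatrix.

    Taxa absent from a gene are padded with `missing`; the second return value
    gives each gene's 1-based, inclusive column range in the supermatrix.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: Dict[str, Tuple[int, int]] = {}
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions[gene] = (start, start + length - 1)
        start += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions
```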

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

Page 41

Multi-Gene Phylogenies

DATABASE → PURIFIED DATABASE → HOMOLOGUES → DATASETS → MSA → SGP → CONCATENATION → MGP

- both SW and HW demanding
- due to the amount of data, the most complex models are necessary; prone to errors and time-consuming

+ SHOULD produce robust results

Page 42

phylogenetic artifacts

why?
- poor taxon sampling
- too weak/strong phylogenetic signal
- violation of the model assumptions (different base composition, mutation rates...)
- inappropriate model used

Page 43

Long-Branch Attraction (LBA)

- the most (in)famous and common artifact
- high evolutionary rates cause artificial grouping of long-branching taxa

Page 44

Artifacts elimination

- adding more genes

2012 - 258, 2009 - 127, 2008 - 135: same author, different datasets

Page 45

Artifacts elimination

- adding more genes
- adding more taxa
  - poor taxon sampling is considered to be the most common reason
  - ideally, all taxa should be included
  - reasonably, all relevant and available taxa should be included
  - realistically, we have to work with the few available

Page 46

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
  - analysis of the dataset with different combinations of taxa and comparison of the resulting topologies
  - an efficient way to overcome LBA
- improving methodology

Page 47

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
  - current HW and SW enable the application of state-of-the-art models
  - LG4M, LG4X (RAxML)
  - CAT(+GTR): each alignment position has its own equilibrium frequencies and model parameters
  - covarion, non-homogeneous: each taxon has its own rate of evolution
  - HW and time demanding!

Page 48

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
- removal of fast-evolving genes
  - a simple and fast way to reduce signal noise
  - for each gene, we compute the overall ML distance and remove the most divergent genes
  - TREE-PUZZLE, RAxML
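The gene-removal step boils down to ranking genes by divergence and dropping the top of the list. This minimal Python sketch uses the mean uncorrected pairwise p-distance as a crude stand-in for the overall ML distance that programs such as TREE-PUZZLE or RAxML would estimate. The 20% cut-off is an arbitrary example.

```python
from itertools import combinations
from typing import Dict, List


def mean_p_distance(alignment: List[str]) -> float:
    """Mean pairwise proportion of differing sites; gapped positions are skipped per pair."""
    dists = []
    for a, b in combinations(alignment, 2):
        pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if pairs:
            dists.append(sum(x != y for x, y in pairs) / len(pairs))
    return sum(dists) / len(dists)


def drop_fastest_genes(genes: Dict[str, List[str]], fraction: float = 0.2) -> List[str]:
    """Names of the genes kept after removing the most divergent `fraction` of them."""
    ranked = sorted(genes, key=lambda g: mean_p_distance(genes[g]))
    n_keep = len(ranked) - int(len(ranked) * fraction)
    keep = set(ranked[:n_keep])
    return [g for g in genes if g in keep]   # preserve the original gene order
```

Re-running the tree search on the reduced set and comparing topologies shows whether the fast genes were driving a suspicious grouping.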

Page 49

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
- removal of fast-evolving genes
- removal of fast-evolving sites
  - usually more efficient
  - each alignment site is assigned to a specific rate category (usually 8 or 16)
  - the highest category(ies) are removed
  - dependent on topology/model
  - TREE-PUZZLE, AIRremover
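Once a program has assigned every site to a rate category, removing the fastest categories is a simple column filter. The Python sketch below assumes you have already parsed the per-site categories into a list, numbered from 1 (slowest) to `n_categories` (fastest). Parsing any particular program's output is not shown. Because the categories depend on the tree and model used to estimate them, so does the result.

```python
from typing import List


def remove_fastest_sites(alignment: List[str], categories: List[int],
                         n_categories: int = 8, n_remove: int = 1) -> List[str]:
    """Drop columns assigned to the `n_remove` fastest rate categories.

    `categories[i]` is the rate category of column i, numbered 1 (slowest)
    to `n_categories` (fastest).
    """
    if len(categories) != len(alignment[0]):
        raise ValueError("need exactly one rate category per alignment column")
    cutoff = n_categories - n_remove + 1       # categories >= cutoff are removed
    keep = [i for i, cat in enumerate(categories) if cat < cutoff]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

Removing one category at a time and re-inferring the tree after each step shows whether a grouping survives once the fastest sites are gone.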

Page 50

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
- removal of fast-evolving genes
- removal of fast-evolving sites
- recoding of amino acids
  - for datasets with a large proportion of saturated sites
  - amino acids are recoded according to their biochemical properties into four categories (Dayhoff matrix)
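Recoding itself is a simple character substitution. The slide mentions categories based on the Dayhoff matrix but does not list the groups. This Python sketch therefore assumes the widely used six-group Dayhoff scheme (AGPST, DENQ, HKR, ILMV, FWY, C). A scheme with fewer categories would only change the group list. The digit labels are another assumption; check which alphabet your phylogenetic program expects for recoded data.

```python
from typing import Dict

# Six-group Dayhoff recoding; each group gets one digit label (an assumption).
DAYHOFF6: Dict[str, str] = {
    aa: str(i)
    for i, group in enumerate(["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"])
    for aa in group
}


def recode(seq: str) -> str:
    """Replace each amino acid by its group label; gaps and unknown
    characters (e.g. '-', 'X', '?') are kept unchanged."""
    return "".join(DAYHOFF6.get(aa, aa) for aa in seq.upper())
```

After recoding, substitutions within a group (e.g. I to V) no longer count as changes. This is what dampens the signal from saturated sites.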

Page 51

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
- removal of fast-evolving genes
- removal of fast-evolving sites
- recoding of amino acids
- selection of genes with congruent signal
  - clever, but is it kosher? ... doesn’t work that well anyway
  - Concaterpillar

Page 52

Artifacts elimination

- adding genes to MGP
- adding more taxa
- removal of problematic (fast-evolving) taxa
- improving methodology
- removal of fast-evolving genes
- removal of fast-evolving sites
- recoding of amino acids
- selection of genes with congruent signal

Phylogenomics is (not) surprisingly hard to publish; usually you have to combine at least a few of the above to satisfy the reviewers!

Page 53

So... is it worth it, when quite often you get the same topology as with SSU rRNA?

?