Introduction to Phylogenetic Analysis
Irit Orr
Feb 2005

Subjects of this lecture
Introducing some of the terminology used in phylogenetics.
Introducing the more common evolutionary models.
Introducing some of the most commonly used methods for reconstructing phylogenies.
Explaining how to build and read phylogenetic trees.

Taxonomy: the science of classification of organisms.
Phylogeny: the evolution of a genetically related group of organisms.
Or: a study of the relationships between a collection of "things" (genes, proteins, organs...) that are derived from a common ancestor.

Phylogenetics: the field of biology that deals with the relationships between taxa (organisms, molecular data). It includes the discovery of these relationships, and the study of the causes behind this pattern.

Phylogenetics - WHY?
Find evolutionary ties between organisms (analyze the changes occurring in different organisms during evolution).
Find (understand) the relationships between an ancestral sequence and its descendants (evolution of a family of sequences).
Estimate the time of divergence between a group of organisms that share a common ancestor.

Phylogenetics - WHY?
The purpose of phylogenetic reconstruction is to attempt to estimate the phylogeny for some data.
The basic assumption is that, for any collection of data, there will be some ancestral relationship between the current (contemporary) sequences.
The data itself contains information that can be used to reconstruct or to infer these ancestral relationships. This involves reconstructing a branching structure, termed a phylogeny or tree, that illustrates the relationships between the sequences.

[Figure: Basic Tree of Life]
[Figure: Eukaryote tree]

Molecular phylogenetics
The study of phylogenies and processes of evolution by the analysis of molecular data (DNA or amino acid sequences).
AAGAATC
AAGAGTT
AAGA(A/G)T(C/T)
Similar sequences, common ancestor...
... common ancestor, similar function

From a common ancestral sequence, two DNA sequences diverge. Each of these two sequences starts to accumulate nucleotide substitutions.
The rate, number and character of these mutations are used in molecular evolution analysis.

What is a phylogenetic tree?
Phylogenetic tree = Dendrogram
An illustration of the hierarchical relationships among a group of organisms arising through evolution.
The relationships are usually represented by a schematic 'tree' comprising a set of nodes linked together by branches. Each NODE represents a speciation event in evolution. Beyond this point, any sequence changes that occur are specific to each branch (species).

In a phylogenetic tree...
Terminal nodes (external nodes, tips or leaves) typically represent the actual taxa = known sequences from extant organisms. Also known as OTUs (Operational Taxonomic Units).
Internal nodes represent ancestral divergences into two (or more) genetically isolated groups. Also known as HTUs (Hypothetical Taxonomic Units).
Each internal node is attached to one or more branches representing evolution from its ancestor to its descendants.

In a phylogenetic tree...
The lengths of the branches in the tree can represent the evolutionary distances that separate the nodes (meaning the # of changes that occurred in the seqs prior to the next level of separation).

Tree Topology
The information about the branching order of relationships (branching patterns) between taxa, without consideration of the branch lengths.

[Figure: phylogenetic trees, with Internal Nodes and External Nodes labelled]

The Phylogenetic tree...
People still usually consider a phylogenetic reconstruction program to act like a black box. It takes input, churns around for a while and then spits out the actual phylogenetic answer. This is incorrect.
First, the actual phylogenetic answer cannot be obtained by any known method.

The amount of evolutionary time that has passed since the separation of the 2 sequences is not known.
All phylogenetic analysis methods can only provide estimates and educated guesses of what a phylogenetic tree might look like for the current set of data.
These methods can only estimate the # of changes that occurred since the time of separation (branching event).
These estimates are only as good as the data itself and only as good as the algorithm. Some algorithms in common use are actually quite poor methods.

Modelling Evolution
In molecular phylogenetics we try to understand sequence evolution. To infer robust phylogenies for the data we need powerful (statistical) Evolutionary Models.
All methods describing sequence evolution use a model that consists of 2 components:
A phylogenetic tree
A Model (description of the way the nucleotide and aa seqs evolved).

Modelling Evolution
Models of molecular evolution are highly simplified descriptions of the history and process of sequence change (A -> C, G -> T, A -> T, S -> T).
The history is represented by the phylogenetic tree topology and the expected amount of evolution on each branch (i.e. the branch lengths).
Most effort for modeling sequence evolution has been concentrated on the processes of nucleotide substitution and amino-acid replacement.

Modelling Evolution
Insertion and deletion events have proven difficult to model. Conventional methods for phylogeny reconstruction require a known sequence alignment and neglect alignment uncertainty.
Alignment columns with gaps are either removed from the analysis or are treated in an ad hoc fashion. As a result, evolutionary information from insertions and deletions is typically ignored during phylogeny reconstruction.

Modelling Evolution
The difficulty with insertions and deletions may be the most vexing problem for model-based approaches to studying sequence evolution.
Models differ in their assumptions regarding the rates at which all the possible substitutions (replacements) occur.

Modelling Evolution
There are 2 known approaches to mathematical models of sequence evolution, both of which include variables that represent information (features) of the process of evolution:
Empirical models: use fixed parameter values (estimated from previous knowledge of datasets, and precomputed).
Parametric models: do not use fixed parameter values. They allow the values to be derived from the current dataset.

Modelling Evolution
There are different types of models: DNA models and amino acid models.
DNA substitution models use 3 types of parameters:
Base frequency parameters
Base exchangeability parameters
Rate heterogeneity parameters (assign different variation rates to different parts of the sequence, e.g. different rates to each position in the codon, or to sites in a protein domain)

Mutations in DNA as a source for evolutionary analysis
Only mutations that were fixed in the population are called substitutions.
We assume that each observed change in similar sequences represents a "single mutation event".
The greater the number of changes, the more possible types of mutations.

[Figure: an original sequence (A C T G A A C G T A) and the kinds of change it can undergo: single substitution, multiple substitution, coincidental substitution, parallel substitution]

DNA Substitution Mutations
Transition: a change between purines (A, G) or between pyrimidines (T, C).
Transversion: a change from a purine (A, G) to a pyrimidine (T, C), or vice versa.
Substitution mutations usually arise from mispairing of bases during replication.
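
The two definitions above can be turned into a small classifier. This is only an illustrative sketch: the function names are invented for this example, and the two test sequences are the AAGAATC / AAGAGTT pair shown earlier in the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old: str, new: str) -> str:
    """Return 'none', 'transition' or 'transversion' for a single-base change."""
    old, new = old.upper(), new.upper()
    if old == new:
        return "none"
    both_purines = old in PURINES and new in PURINES
    both_pyrimidines = old in PYRIMIDINES and new in PYRIMIDINES
    return "transition" if both_purines or both_pyrimidines else "transversion"


def count_ts_tv(seq1: str, seq2: str) -> tuple:
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


print(count_ts_tv("AAGAATC", "AAGAGTT"))
```

Both differences between the two example sequences (A/G and C/T) stay within the same chemical class, so both are counted as transitions.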

Correction for likelihood of mutations in DNA sequences
There are several evolutionary models used to correct for the likelihood of multiple mutations and reversions in DNA sequences.
These evolutionary models use a normalized distance measurement, that is, the average degree of change per length of aligned sequences.

Jukes & Cantor one-parameter model
This model assumes that substitutions between the 4 bases occur with equal frequency, meaning no bias in the direction of the change.
[Figure: the four bases A, G, T and C, with every substitution arrow labelled α]
α is the rate of substitution in each of the 3 directions for one base (α is the one parameter).

Kimura two-parameter model
This model assumes that transitions (A-G or T-C) occur more often than transversions (purine-pyrimidine).
[Figure: the four bases A, G, T and C, with transition arrows labelled α and transversion arrows labelled β]
α is the rate of transitional substitutions.
β is the rate of transversional substitutions.

These evolutionary models improve the distance calculations between the sequences.
They have less effect on phylogenetic predictions for closely related sequences.
They have a greater effect for distantly related sequences.
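
As a hedged illustration of how these corrections work, the sketch below applies the standard textbook distance formulas for the two models: d = -(3/4) ln(1 - 4p/3) for Jukes & Cantor, and d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) for Kimura. Here p is the observed proportion of differing sites, and P and Q are the observed proportions of transitions and transversions. The function names are assumptions made for this example and are not taken from any particular package.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Corrected distance (substitutions per site) from the observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p_distance(P: float, Q: float) -> float:
    """Corrected distance from transition proportion P and transversion proportion Q."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


for p in (0.01, 0.10, 0.30, 0.50):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.4f}")
```

The printed values show the behaviour described above: for small observed differences the corrected distance is almost identical to the raw proportion, while for divergent sequences the correction becomes large.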

Modelling Evolution
Amino acid replacement models
Amino-acid frequency and exchangeability models, e.g. the Dayhoff model: very old and simple, based on counting the observed replacements in very similar proteins and using this information for the parameters.
The Dayhoff model assumes that all sites in a protein evolve independently of one another.
This simple model assumes that at each site, the process of amino-acid replacement is defined by a matrix of replacement rates.

Modelling Evolution
For each possible change from one of the twenty amino acid types to another, there is a corresponding matrix entry. With the Dayhoff model, all sites evolve according to the same rate matrix.
JTT model: an updated version of the Dayhoff model.
Yang model: a more advanced and clever model, allowing different sites in the sequence to evolve at different rates. This model assumes that, except for rate heterogeneity, all sites evolve according to the same process.

Modelling Evolution
A limitation of all the aforementioned models of amino acid replacement is that they are based on changes among 20 states, where each state represents an amino acid type.
In reality, evolution occurs at the level of DNA sequences. Therefore, it is preferable to frame models of sequence evolution in terms of codons rather than in terms of amino acids.
To date, most codon-based models have employed the assumption that all changes to a codon involve only one of the three codon positions.

Modelling Evolution
In the evolution literature, some changes to a codon are termed synonymous because they do not alter the amino acid specified by the codon.
Other changes are termed nonsynonymous because they do result in a change of the amino acid being specified by the codon.
AAG -> Lys, AAA -> Lys
AAA -> Lys, AGA -> Arg
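
The distinction can be checked mechanically against the standard genetic code. The sketch below is illustrative only: it builds the 64-codon table from the usual compact TCAG-ordered string, and the helper name classify_codon_change is an assumption made for this example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(old: str, new: str) -> str:
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    old, new = old.upper(), new.upper()
    if old == new:
        return "identical"
    same_amino_acid = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same_amino_acid else "nonsynonymous"


print(classify_codon_change("AAG", "AAA"))  # both codons specify Lys (K)
print(classify_codon_change("AAA", "AGA"))  # Lys (K) becomes Arg (R)
```

The table uses one-letter amino acid codes, so Lys appears as K and Arg as R.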

The Markov Process Model
A mathematical model of infrequent changes of (discrete) states over time, in which future events occur by chance and depend only on the current state, and not on the history of how that state was reached.
In molecular phylogenetics, the states of the process are the possible nucleotides or amino acids present at a given time and position in a sequence. State changes in this case represent mutations in sequences.
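
A minimal sketch of such a process, assuming the Jukes & Cantor model with rate α per direction: the probability that a site holds the same base after time t is 1/4 + (3/4)e^(-4αt), and the probability of each specific different base is 1/4 - (1/4)e^(-4αt). The function name and the numeric values are chosen only for illustration.

```python
import math

BASES = "ACGT"


def jc_transition_matrix(alpha: float, t: float) -> dict:
    """P[x][y]: probability that base x has become base y after time t (Jukes & Cantor)."""
    decay = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * decay
    different = 0.25 - 0.25 * decay
    return {x: {y: (same if x == y else different) for y in BASES} for x in BASES}


one_step = jc_transition_matrix(alpha=0.1, t=1.0)
two_steps = jc_transition_matrix(alpha=0.1, t=2.0)

# Markov property: two consecutive steps of length 1 behave like one step of length 2,
# because the second step depends only on the state reached after the first.
via_intermediate = sum(one_step["A"][k] * one_step[k]["G"] for k in BASES)
print(two_steps["A"]["G"], via_intermediate)
```

The two printed numbers agree (up to rounding), which is exactly the "depends only on the current state" property described above.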

Rate heterogeneity and the Gamma Distribution model
From published work, it is known that:
Mutation rates vary considerably amongst sites of DNA and amino acid sequences, because of biochemical factors, constraints of the genetic code, selection for gene function, etc.
This variation is often modeled using a gamma distribution of rates across sequence sites. The shape of the gamma distribution is controlled by a parameter a, and the distribution's mean and variance are 1 and 1/a, respectively.

Rate heterogeneity and the Gamma Distribution model
Large values of a (particularly a > 1) give a bell-shaped distribution, suggesting little or no rate heterogeneity.
Small values of a give a reverse-J-shaped distribution, suggesting higher levels of rate heterogeneity along with many sites with low rates of evolution.
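
The effect of the shape parameter can be seen by drawing site rates directly. The sketch below is an illustration under stated assumptions: it uses Python's random.gammavariate with shape a and scale 1/a, which gives mean 1 and variance 1/a as described above; the function name, the sample size and the 0.1 "slow site" cut-off are arbitrary choices for this example.

```python
import random
import statistics


def sample_site_rates(shape: float, n_sites: int, seed: int = 0) -> list:
    """Draw one relative rate per site from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]


for a in (0.5, 5.0):
    rates = sample_site_rates(a, 20000)
    slow = sum(r < 0.1 for r in rates) / len(rates)
    print(
        f"a={a}: mean={statistics.mean(rates):.2f} "
        f"variance={statistics.pvariance(rates):.2f} "
        f"fraction of slow sites={slow:.2f}"
    )
```

With a small a, a large share of sites evolves very slowly while a few evolve fast; with a large a, nearly all sites have rates close to 1.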

The molecular clock hypothesis assumes that...
Homologous sequences of DNA evolve at a constant and invariable rate across all taxa (the same rate along all tree branches).
The rate of mutation is the same for all positions along the sequence.
If true, this hypothesis would make it possible to determine divergence times and phylogenetic relationships more accurately.
However, publications show variable rates of mutation (evolution), which calls the validity of this theory into question.
Therefore, the molecular clock hypothesis is most suitable for closely related species.

Rooted Tree = Cladogram
A phylogenetic tree in which all the "objects" share a known common ancestor (the root). There exists a particular root node.
A root is a taxon (seq) that branched off earlier than all the other taxa on the tree, but is related to them.
[Figure: rooted tree with taxa A, B, C and the Root]
The paths from the root to the nodes correspond to evolutionary time.

Unrooted Tree = Phenogram
A phylogenetic tree where all the "objects" on it are related descendants, but there is not enough information to specify the common ancestor (root).
The paths between nodes of the tree do not specify an evolutionary time.

Rooted versus Unrooted
The number of possible tree topologies for a rooted tree is much higher than for an unrooted tree with the same number of OTUs.
Therefore, the error of the unrooted tree topology is smaller than that of the rooted tree.
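
To make the difference concrete, the sketch below uses the standard counting formulas for strictly bifurcating trees with n OTUs: (2n-5)!! unrooted topologies (n >= 3) and (2n-3)!! rooted topologies (n >= 2), where !! is the double factorial. The function names are chosen for this example only.

```python
def double_factorial(n: int) -> int:
    """n!! = n * (n-2) * (n-4) * ... down to 1 (defined as 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_unrooted_trees(n_otus: int) -> int:
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return double_factorial(2 * n_otus - 5)


def count_rooted_trees(n_otus: int) -> int:
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return double_factorial(2 * n_otus - 3)


for n in (3, 4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Every unrooted tree with n OTUs has 2n-3 branches on which a root could be placed, which is why the rooted count is always (2n-3) times the unrooted count.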

Orthologs: genes related by speciation events. Meaning the same genes in different species.
Paralogs: genes related by duplication events. Meaning duplicated genes in the same species.

Selecting the dataset (sequences) for phylogenetic analysis
We use both types of sequences for reconstructing phylogenetic trees: Protein and DNA.
For DNA sequences, the rate of mutation is assumed to be the same in both coding and non-coding regions.
However, there is a difference in the substitution rate between coding and non-coding regions.
Non-coding DNA regions are known to have more substitutions than coding DNA regions.

Selecting the dataset (sequences) for phylogenetic analysis
For proteins, the rate of mutation is very low in the conserved regions, or "functional regions" as we refer to them.
Most evolution algorithms are better at analyzing regions that mutate slowly, i.e. with a small number of changes in the multiple alignment.
Regions that have a "high number of changes" need special algorithms to deal with them successfully.

Selecting the dataset (sequences) for phylogenetic analysis
- Sequences that are being used as the dataset should belong together (orthologs).
- If no ancestral sequence is available you may use an "outgroup" as a reference to measure distances. In such a case, you need to choose as the outgroup a close relative of the group being compared.
- For example: if the group is of mammalian sequences then the outgroup should be a sequence from birds and not plants.

Known Problems of Multiple Alignments
Good analysis is based on good alignments.
Check the alignment to see that important sites are not misaligned by the software used for the sequence alignment.
Misalignment can affect the significance of the site, and the tree.
For example: ATG as start codon, or specific amino acids in functional domains.

Known Problems of Multiple Alignments
Gaps (indels) in a multiple alignment are usually not scored (or are plainly ignored) by most programs.
Gaps are not scored since there is no suitable model of the evolutionary mechanism that produces them.

Known Problems of Multiple Alignments
[Figure: a protein alignment (T Y R R against T Y R - or T Y - R) shown next to the alignment of the corresponding codons (ACA TAC AGG CGA against ACA TAC AGG --- or ACA TAC --- CGA)]
An alignment of proteins that contains gaps should be compared with the alignment of their DNA coding regions.
The reason is to be sure about the placement of gaps, because of the degeneracy of the code.

Known Problems of Multiple Alignments
- Low complexity regions affect the multiple alignment because they create random bias for various regions of the alignment.
- Low complexity regions should be removed from the alignment before building the tree.
- Or, if they are kept in the dataset, they need special models of biased regions.
If you delete these regions you need to consider the effect of the deletions on the branch lengths of the whole tree.

How to choose the best method?
Choose a set of related seqs (DNA or Proteins)
Obtain a Multiple Alignment
Is there a strong similarity?
  Strong similarity -> Maximum Parsimony
  Distant (weak) similarity -> Distance methods
  Very weak similarity -> Maximum Likelihood
Check the validity of the results
Taken from Dr. Itai Yanai

A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG
Given a multiple alignment, how do we construct the tree?

The best method to build a phylogenetic tree
Will extract the maximum amount of information available from the sequence data.
It will combine this information with prior knowledge of patterns of sequence evolution (evolution models), and will add model parameters (such as the transition/transversion bias k) whose values are not known a priori.

Building Phylogenetic Trees
Main methods:
Distance matrix methods: Neighbour Joining, UPGMA
Character based methods: Parsimony methods, Maximum Likelihood method
Validation methods: Bootstrapping, Jack Knife

Character Based Methods
All Character Based Methods assume that each character substitution is independent of its neighbors.
Maximum Parsimony (minimum evolution): in this method, the tree that is given (built) is the one requiring the fewest changes to explain the differences observed in the data.

Maximum Parsimony Methods
In other words, the Maximum Parsimony Method will find the tree that changes any sequence (taxon) on the tree into all the other sequences on the tree using the fewest changes.
If the number of changes per sequence position is relatively small, then maximum parsimony approximates ML and its estimates of tree topology will be similar to those of ML estimation.
As more-divergent sequences are analysed, the degree of homoplasy (i.e. parallel, convergent, reversed or superimposed changes) increases.
The true evolutionary tree becomes less likely to be the one with the least number of changes, and parsimony methods fail.

Character Based Methods
Q: How do you find the minimum # of changes needed to explain the data in a given tree?
A: The answer will be to construct a set of possible ways to get from one set to the other, and choose the "best" (for example: Maximum Parsimony).
CCG CCA CGA    P P R
CGG CCA CGA    R P R
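
One well-known way to answer this question for a fixed tree is Fitch's algorithm, which is not named in the lecture and is swapped in here purely as an illustration. The sketch assumes a rooted, strictly bifurcating tree written as nested tuples of leaf names; the function names are invented for this example.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can share a state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: one substitution must have happened at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# The two coding sequences shown above differ at a single position (CCG vs CGG).
print(parsimony_length(("x", "y"), {"x": "CCGCCACGA", "y": "CGGCCACGA"}))
```

The leaf labels "x" and "y" are placeholders; any hashable names would do.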

Maximum Parsimony Methods
Furthermore, when the true tree has short internal branches and long terminal branches, a phenomenon can occur whereby the long branches appear to attract one another and can be erroneously inferred to be too closely related.
Combinations of conditions when this occurs are often called the Felsenstein zone, and parsimony is particularly affected by this problem because of its inability to deal with homoplasy.
In the Felsenstein zone, parsimony becomes increasingly certain of the wrong tree; a property referred to as inconsistency.

Maximum Parsimony Methods
In addition, parsimony lacks an explicit model of evolution.
Although originally seen by some as a strength of parsimony, this has now become a limiting factor preventing fruitful feed-forward between phylogenetic analysis and investigation of the biological implications of different models of evolution.

Maximum Parsimony
Start by classifying the sites:
             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
------------------------------------
Invariant    *  * ******** *****
Variant       ** *        *     **
------------------------------------
Informative   **                **
Non-inform.      *        *
Taken from Dr. Itai Yanai

What's in a SITE
Variable site: contains at least two types of nucleotides or amino acids. Some variable sites can be singleton or parsimony-informative.
Singleton site: a site is called a singleton site if it contains at least two types of nucleotides (or amino acids) with at most one of them occurring multiple times.
Constant site: if a site contains the same nucleotide or amino acid in all sequences, it is referred to as a constant site.
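
These definitions can be applied column by column. The following sketch is illustrative: it uses the Mouse/Rat/Dog/Human alignment from the lecture, treats a site as parsimony-informative when at least two character types each occur at least twice (the usual convention, assumed here), and uses function names invented for this example.

```python
from collections import Counter

ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}


def classify_site(column: str) -> str:
    """Return 'constant', 'singleton' or 'informative' for one alignment column."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    repeated_types = sum(1 for n in counts.values() if n >= 2)
    return "informative" if repeated_types >= 2 else "singleton"


def classify_alignment(alignment: dict) -> dict:
    """Map each 1-based site position to its class."""
    columns = zip(*alignment.values())
    return {pos: classify_site("".join(col)) for pos, col in enumerate(columns, start=1)}


classes = classify_alignment(ALIGNMENT)
for kind in ("informative", "singleton"):
    print(kind, [pos for pos, c in classes.items() if c == kind])
```

For this alignment the informative sites are positions 2, 3, 20 and 21, and the variable but non-informative sites are positions 5 and 14.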

             123456789012345678901
Mouse        CTTCGTTGGATCAGTTTGATA
Rat          CCTCGTTGGATCATTTTGATA
Dog          CTGCTTTGGATCAGTTTGAAC
Human        CCGCCTTGGATCAGTTTGAAC
              ** *
[Figure: the three possible unrooted trees for Mouse, Rat, Dog and Human, with the inferred changes drawn on each tree for Site 2, Site 3 and Site 5]
Taken from Dr. Itai Yanai

Maximum Parsimony Methods
The optimal tree is found by adding up, for each possible tree, the number of changes at each informative site.
The tree that requires the least number of changes is chosen.
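
For four sequences there are only three possible unrooted trees, so the procedure can be carried out exhaustively. The sketch below is a hedged illustration using the Mouse/Rat/Dog/Human alignment from the lecture and a Fitch-style count of changes (an assumption, since the lecture does not name a counting algorithm). For simplicity it sums over all sites; non-informative sites add the same amount to every tree, so the ranking is unchanged.

```python
ALIGNMENT = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}

# Each unrooted four-taxon tree is written as a pair of sister pairs.
CANDIDATE_TREES = [
    (("Mouse", "Rat"), ("Dog", "Human")),
    (("Mouse", "Dog"), ("Rat", "Human")),
    (("Mouse", "Human"), ("Rat", "Dog")),
]


def site_changes(tree, states):
    """Fitch-style pass: (possible states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, left_cost = site_changes(tree[0], states)
    right, right_cost = site_changes(tree[1], states)
    if left & right:
        return left & right, left_cost + right_cost
    return left | right, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total number of changes the tree needs to explain the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        site_changes(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


scores = {tree: tree_length(tree, ALIGNMENT) for tree in CANDIDATE_TREES}
best_tree = min(scores, key=scores.get)
for tree, score in scores.items():
    print(tree, score)
print("most parsimonious:", best_tree)
```

The tree grouping Mouse with Rat and Dog with Human needs the fewest changes, because three of the four informative sites (3, 20 and 21) support that grouping.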

Maximum Parsimony Methods
- The Maximum Parsimony method is good for similar sequences, i.e. a group of sequences with a small amount of variation.
- Maximum Parsimony methods do not give the branch lengths, only the branching order.
- For larger sets it is recommended to use the "branch and bound" search instead of an exhaustive Maximum Parsimony search.

Maximum Parsimony Methods are Available...
For DNA, in the programs: PAUP, MOLPHY, Phylo_win
In the PHYLIP package: DNAPars, DNAPenny, etc.
For Protein, in the programs: PAUP, MOLPHY, Phylo_win
In the PHYLIP package: PROTPars

Character Based Methods: Maximum Likelihood
A HYPOTHESIS is a model of evolution to explain the data; LIKELIHOOD will choose the hypothesis that fits the data best.
Likelihood methods regard the observed data as a fixed observation and seek the values of the statistical parameters that provide the most probable description of those data, given the model of evolution.

Character Based Methods: Maximum Likelihood
The likelihood does not describe either the probability that the events under study happened (they did happen), or that the model is true.
Likelihood describes how likely it is that a given process (model), as opposed to some other process, is responsible for the observed data.
These properties make likelihood particularly suited to historical inference problems, in which the observed data arise only once.

Character Based Methods: Maximum Likelihood
The Maximum Likelihood method (like the Maximum Parsimony method) performs its analysis on each position of the multiple alignment.
This is why this method is very heavy on CPU.
It starts with a set of sequences, and finds estimates for the variability in each position.
It checks the rates of transitions and transversions (with both models, Kimura and Jukes & Cantor).
At the end, after all positions in the sequence alignment have been checked, the likelihood of the whole tree is reported.

The Maximum Likelihood method can be found in the following packages: PHYLIP, PAUP, MEGA or TREE-PUZZLE.

Distances (pairwise) Methods
Distance: the number of substitutions per site per time period.
Evolutionary distances are calculated based on one of the DNA evolutionary models.
Distance methods use the same models of evolution as ML to estimate the evolutionary distance between each pair of sequences from the set under analysis. They then fit a phylogenetic tree to those distances.

Distances Methods
The estimated distances will usually be ML estimates for each pair (considered independently of the other sequences), but the set of all pairwise distances will not be compatible with any tree.
So, a best-fitting phylogeny is derived using non-ML methods.

Distances Methods
Disadvantages of distance methods include the inevitable loss of evolutionary information when a sequence alignment is converted to pairwise distances, and the inability to deal with models containing parameters for which the values are not known a priori (e.g. the transition/transversion bias k above).

Distances Methods: Neighbor-Joining
Neighbors: a pair of sequences, in a sequence set, that have the smallest number of changes (substitutions) between them.
On a phylogenetic tree, neighbors are joined by branches to the same node (common ancestor).
Some of the distance methods, such as UPGMA, use the molecular clock hypothesis.
Most of the other distance programs, Neighbor-Joining among them, do not.

Common steps to build a TREE used by Distances methods
1. Multiple alignment, based on all-against-all pairwise comparisons.
2. Building a distance matrix of all the compared sequences (all pairs of OTUs).
3. Disregarding the actual sequences.
4. Constructing a guide tree by clustering the distances. Iteratively build the relations (branches and internal nodes) between all OTUs.

Distance method steps
Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
     A  B  C  D  E
B    2
C    4  4
D    6  6  6
E    6  6  6  4
F    8  8  8  8  8
From http://www.icp.ucl.ac.be/~opperd/private/upgma.html
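
The clustering itself can be sketched in a few lines. This is an illustrative implementation written for this transcript, not the code behind the slide: it repeatedly joins the two closest clusters, places the new node at half their distance, and averages distances weighted by cluster size. The input is the matrix shown above; the function name upgma and the data layout are assumptions.

```python
def upgma(distances):
    """Cluster OTUs by UPGMA.

    distances -- dict mapping (name_a, name_b) to the distance for every pair of OTUs
    Returns (tree as nested tuples, list of (cluster_a, cluster_b, node_height)).
    """
    d = {frozenset(pair): float(v) for pair, v in distances.items()}
    size = {name: 1 for pair in distances for name in pair}
    merges = []
    while len(size) > 1:
        closest = min(d, key=d.get)              # pair of clusters with the smallest distance
        a, b = sorted(closest, key=str)
        merges.append((a, b, d[closest] / 2))    # node height is half the distance
        new = (a, b)
        for other in size:
            if other not in closest:
                # Size-weighted average keeps every original OTU equally weighted.
                d[frozenset((new, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / (size[a] + size[b])
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        size[new] = size.pop(a) + size.pop(b)
    return next(iter(size)), merges


SLIDE_MATRIX = {
    ("A", "B"): 2,
    ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
    ("A", "E"): 6, ("B", "E"): 6, ("C", "E"): 6, ("D", "E"): 4,
    ("A", "F"): 8, ("B", "F"): 8, ("C", "F"): 8, ("D", "F"): 8, ("E", "F"): 8,
}

tree, merges = upgma(SLIDE_MATRIX)
for a, b, height in merges:
    print(f"join {a} and {b} at height {height}")
```

With this matrix, A and B are joined first, C then joins them, D and E form their own pair, the two groups are joined next, and F is attached last. Ties in the matrix are broken arbitrarily, which can change the order of the printed joins but not the resulting tree.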

A - GCTTGTCCGTTACGAT
B - ACTTGTCTGTTACGAT
C - ACTTGTCCGAAACGAT
D - ACTTGACCGTTTCCTT
E - AGATGACCGTTTCGAT
F - ACTACACCCTTATGAG
First, construct a distance matrix:
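
As a minimal sketch of this step, the code below counts the raw number of differing positions between every pair of the six sequences. Counting mismatches is only the simplest possible distance and is an assumption of this example; a real analysis would normally correct the counts with one of the evolutionary models.

```python
from itertools import combinations

SEQUENCES = {
    "A": "GCTTGTCCGTTACGAT",
    "B": "ACTTGTCTGTTACGAT",
    "C": "ACTTGTCCGAAACGAT",
    "D": "ACTTGACCGTTTCCTT",
    "E": "AGATGACCGTTTCGAT",
    "F": "ACTACACCCTTATGAG",
}


def count_differences(seq1: str, seq2: str) -> int:
    """Number of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


distance_matrix = {
    (p, q): count_differences(SEQUENCES[p], SEQUENCES[q])
    for p, q in combinations(SEQUENCES, 2)
}

for (p, q), d in distance_matrix.items():
    print(p, q, d)
```

Only the 15 pairs above the diagonal are stored, since the distance from p to q equals the distance from q to p.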

Distance Matrix Methods
Can be found in the following programs: ClustalW, Phylo_win, PAUP
In the GCG software package: Paupsearch, distances
In the PHYLIP package: DNADist, PROTDist, Fitch, Kitsch, Neighbor

Statistical tests for Tree Topology
In many applications (methods), the primary interest is in the topology of the inferred evolutionary tree.
As with estimates of model parameters, a single point estimate is of little value without some measure of the confidence we can place in it.
A popular way of assessing the robustness of a tree is by the method of non-parametric bootstrapping.

Bootstrapping is...
A statistical method by which distributions that are difficult to calculate exactly can be estimated by the repeated creation and analysis of artificial datasets, which represent the population.
In the non-parametric bootstrap, these datasets are generated by resampling from the original data, whereas in the parametric bootstrap, the data are simulated according to the hypothesis being tested.
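
For sequence data, "resampling from the original data" usually means drawing alignment columns with replacement until a new alignment of the same length is obtained. The sketch below illustrates only that resampling step, under that assumption; the tree-building that would follow each replicate is left out, and the function name and the toy alignment are invented for this example.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picked = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


toy_alignment = {
    "Mouse": "CTTCGTTGGATCAGTTTGATA",
    "Rat":   "CCTCGTTGGATCATTTTGATA",
    "Dog":   "CTGCTTTGGATCAGTTTGAAC",
    "Human": "CCGCCTTGGATCAGTTTGAAC",
}

rng = random.Random(2005)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate keeps the sequences aligned (whole columns are sampled), so a tree can be built from every replicate and the frequency of each grouping across the replicates serves as its bootstrap value.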

Given the following tree, estimate the confidence of the two internal branches
[Figure: tree of Chimpanzee, Gorilla, Human, Orang-utan and Gibbon]
Taken from Dr. Itai Yanai
[Figure: the three alternative trees of Chimpanzee, Gorilla, Human, Orang-utan and Gibbon recovered from the resamplings, found in 41/100, 28/100 and 31/100 of the replicates]
Taken from Dr. Itai Yanai

Estimating Confidence from the Resamplings
1. Of the 100 trees:
In 100 of the 100 trees, gibbon and orang-utan are split from the rest.
In 41 of the 100 trees, chimp and gorilla are split from the rest.
2. Upon the original tree we superimpose bootstrap values:
[Figure: the original tree with the bootstrap values 100 and 41 on its two internal branches]

What programs can draw Trees?
- Rooted trees should be plotted using the DRAWGRAM program (PHYLIP), or similar.
- Unrooted trees should be plotted using the DRAWTREE program (PHYLIP), or similar.
- On a PC use the TreeView/TreeExplorer/NJPlot programs.

Known problems of using popular software for reconstructing Phylogenetic Trees
Order of the input data (sequences): the order of the input sequences affects the tree construction. You can "correct" this effect in some of the programs (PHYLIP), using the Jumble option (J in PHYLIP, set to 10).
The number of possible trees is huge for large datasets. Often it is not possible to construct all trees, so a program can guarantee only "a good" tree, not the "best tree".

Known problems of using popular software for reconstructing Phylogenetic Trees
The definition of "best tree" is ambiguous. It might mean the most likely tree, or a tree with the fewest changes, or a tree that best fits a known model, etc.
The trees that result from various methods differ from each other. Nevertheless, in order to compare trees, one needs to assume some evolutionary model so that the trees may be tested.

Known problems of using popular software for reconstructing Phylogenetic Trees
- Pay attention to the data used for the tree construction: use "informative" data, without large gaps.
- Population effects often have to be considered, especially if we have a lot of variety (a large # of alleles for one protein).

Increasing the robustness of the tree
The best possible phylogenetic estimates will arise from using robust inference methods allied with accurate evolutionary models.
However, after statistical assessment of the results it could still be necessary to attempt to improve the quality of the inferences drawn.
The two most obvious ways of increasing the accuracy of a phylogenetic inference are:
to include more sequences in the data
to increase the length of the sequences used.

Increasing the robustness of the tree
Until recently, the likely effects of these approaches had not been well characterized.
One of these studies shows that adding more sequences to an analysis does not increase the amount of information relating to different parts of the tree uniformly over that tree, whereas the use of longer sequences results in a linear increase in information over the whole of the tree.

Increasing the robustness of the tree
Such methods allow questions of experimental design in phylogenetic analysis to be answered; for example:
regarding the numbers and lengths of sequences in the dataset
identification of genes with optimal rates of evolution for phylogenetic inference

How many trees to build?
- For each dataset it is recommended to build more than one tree. Build a tree using a distance method and, if possible, also use a character-based method, like maximum likelihood.
- The core of the tree should be similar in both methods, otherwise you may suspect that your tree is incorrect.

Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001 May;18(5):691-9.
Whelan S, Lio P, Goldman N. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 2001 May;17(5):262-72.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987 Jul;4(4):406-25.
Foster PG, Hickey DA. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J Mol Evol. 1999 Mar;48(3):284-90.
Muller T, Vingron M. Modeling amino acid replacement. J Comput Biol. 2000;7(6):761-76.
Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994 Sep;11(5):725-36.
Yang Z. Maximum-Likelihood Models for Combined Analyses of Multiple Sequence Data. J Mol Evol. 1996 May;42(5):587-96.