centre for newfoundland studiescollections.mun.ca/pdfs/theses/wareham_haroldtodd.pdf · on tile...
Post on 13-Feb-2020
CENTRE FOR NEWFOUNDLAND STUDIES
TOTAL OF 10 PAGES ONLY MAY BE XEROXED
(Without Author's Permission)
ON THE COMPUTATIONAL COMPLEXITY OF INFERRING
EVOLUTIONARY TREES
BY
© Harold Todd Wareham
A thesis submitted to the School of Graduate
Studies in partial fulfillment of the
requirements for the degree of
Master of Science
Department of Computer Science
Memorial University of Newfoundland
December 1992
St. John's Newfoundland
National Library of Canada
Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services Branch
Direction des acquisitions et des services bibliographiques
395 Wellington Street, Ottawa, Ontario K1A 0N4
395, rue Wellington, Ottawa (Ontario) K1A 0N4
The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.
The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.
ISBN 0-315-82&42-8
Abstract
The process of reconstructing evolutionary trees can be viewed formally as an optimization problem. Recently, decision problems associated with the most commonly used approaches to reconstructing such trees have been shown to be NP-complete [Day87, DJS86, DS86, DS87, GF82, Kri88, KM86]. In this thesis, a framework is established that incorporates all such problems studied to date. Within this framework, the NP-completeness results for decision problems are extended by applying theorems from [CT91, Gas86, GKR92, JVV86, KST89, Kre88, Sel91] to derive bounds on the computational complexity of several functions associated with each of these problems, namely
• evaluation functions, which return the cost of the optimal tree(s),
• solution functions, which return an optimal tree,
• spanning functions, which return the number of optimal trees,
• enumeration functions, which systematically enumerate all optimal trees, and
• random-selection functions, which return a randomly-selected member of the set of optimal trees.
Where applicable, bounds are also presented for the versions of these functions that are restricted to trees of a given cost or of cost less than or greater than a given limit. Based in part on these results and theorems from [BH90, GJ79, KMB81, Kre88], bounds are derived on how closely polynomial-time algorithms can approximate optimal trees. In particular, it is shown using the recent results of [ALMSS92] that no phylogenetic inference optimal-cost solution problem examined in this thesis has a polynomial-time approximation scheme unless P = NP.
Acknowledgements
The research reported in this thesis and the studies leading to it have taken [...] to others; this is particularly true of graduate students. My apologies to those whose help I have forgotten to or did not mention. Their various kindnesses must be acknowledged by the fact that I am, at last, writing these acknowledgements.
I would like to thank the School of Graduate Studies, the Alumni Association of Memorial University of Newfoundland, William Day, and Wlodek Zuberek for the financial assistance under which I carried out the first two years of my studies. For the past two years, I have been very fortunate to work part-time for Richard Greatbatch and Brad de Young of the Physical Oceanography group at the MUN Department of Physics. I would like to thank both them and the staff of the Physical Oceanography group for providing the money, the environment, and the patience that have allowed me to complete and write up my research.
I would like to thank the Dean of Science, the Dean of Student Affairs and Services, the Canadian Institute for Advanced Research, William Day, and Paul Gillard for the generous travel funding I have received over the last four years. The conferences they have enabled me to attend will benefit me always.
I would like to thank the staff of the Queen Elizabeth II Library for much friendly assistance over the years. I owe a similar debt to the staff and faculty of the Department of Computer Science, particularly Elaine Boone, Jennifer Button, Sharon Hynes, Patricia Murphy, Mike Rayment, and Nolan White. Paul Fardy, David Fifield, and Allan Goulding helped with typesetting this thesis in LaTeX.
I know that I am only partially aware of all that has been done for me by Jane Foltz and Paul Gillard, who have served as Heads of my Department over the course of my studies; this small thanks must for now suffice.
I would like to thank all of the people who sent me reprints and manuscripts; in particular, I would like to thank Mark Krentel, Alan Selman, Madhu Sudan, and Seinosuke Toda, whose manuscripts are the basis for many results in this thesis. I would also like to thank the people who have clarified my thinking and my results via e-mail and personal conversations, notably Joe Felsenstein, Bill Gasarch, Arun Jagota, Jim Kadin, Johannes Köbler, Mark Krentel, Manrique Mata-Montero, Gary Olsen, Alan Selman, and Jacobo Torán. In particular, I would like to thank Johannes Köbler, for suggesting simplifications and completions for several results in Section 4.1.2, Mark Krentel, for the discussions that led (among other things) to the NP-hardness reduction for the F11GT[2:::] problem, and Bill Gasarch, who is indirectly responsible for almost all results reported in this thesis.
I would like to thank my many friends, for putting up with my erratic grad student behavior these last few years and simultaneously making such continued behavior possible.
I would like to thank my parents for their individual and invaluable support of me over my years of study.
Finally, I would like to thank my supervisor, William Day, for his many years of patience, sound advice, and assistance.
Contents
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
1 Introduction
  1.1 Organization of This Thesis
2 Notation
  2.1 Graphs, Hypergraphs, and Trees
  2.2 Computational Complexity Theory
3 Computational Problems in Phylogenetic Systematics
  3.1 Phylogenetic Systematics
  3.2 NP-Complete Problems in Phylogenetic Systematics
    3.2.1 Phylogenetic Parsimony
    3.2.2 Character Compatibility
    3.2.3 Distance Matrix Fitting
    3.2.4 Summary
4 The Computational Complexity of Phylogenetic Inference Functions
  4.1 Function Complexity Classes
    4.1.1 Classes Within $FP^{NP}$
    4.1.2 Classes Within FPSPACE(poly): Counting Classes
  4.2 Evaluation Functions
  4.3 Solution Functions
  4.4 Spanning Functions
  4.5 Enumeration Functions
  4.6 Random Generation Functions
  4.7 Summary
5 The Approximability of Phylogenetic Inference Functions
  5.1 Types of Approximability
  5.2 Absolute Approximability
  5.3 Fully Polynomial and Polynomial Time Approximation Schemes
  5.4 Relative Approximability
  5.5 Approximability by Neural Networks
  5.6 Summary
6 Conclusion
References
A Phylogenetic Systematics and the Inference of Reticulation
B The Computational Complexity of Phylogenetic Parsimony Problems Incorporating Explicit Graphs
List of Tables
1 Phylogenetic parsimony criteria
2 Basic NP-complete decision problems
3 Phylogenetic parsimony decision problem schemata (non-reticulate trees)
4 Phylogenetic parsimony decision problem schemata (non-reticulate trees) (cont'd from Table 3)
5 Applicability of input character restrictions to phylogenetic parsimony criteria
6 Phylogenetic parsimony decision problems (non-reticulate trees)
7 Phylogenetic parsimony decision problems (non-reticulate trees) (cont'd from Table 6)
8 Phylogenetic parsimony decision problem schemata (reticulate trees)
9 Reductions for phylogenetic parsimony decision problems
10 Reductions for phylogenetic parsimony decision problems (cont'd from Table 9)
11 Steiner Tree decision problems
12 Character compatibility decision problems
13 Character compatibility decision problems (cont'd from Table 12)
14 Reductions for character compatibility decision problems
15 Distance matrix fitting decision problems
16 Auxiliary decision problems for NP-hardness proofs of distance matrix fitting decision problems
17 Reductions for distance matrix fitting decision problems
18 Reductions for distance matrix fitting decision problems (cont'd from Table 17)
19 Correspondence between phylogenetic inference problems in this thesis and problems in the literature
20 Computational complexities of phylogenetic inference functions
21 Formulations of SAT in first-order logic
22 Formulations of SAT in first-order logic (cont'd from Table 21)
23 A polynomial-time relative approximation algorithm for STEINER TREE IN GRAPHS
24 Approximability of phylogenetic inference optimal-cost solution functions
25 Non-approximability of various levels of the Function Bounded NP Query Hierarchy
List of Figures
1 Graphs and hypergraphs
2 Types of hyperarcs
3 Types of metrics
4 A discrete character tree
5 Character-state trees
6 Structure of subgraph Orr used in reduction from X3C to FBlfT~[/"t]
7 Reductions among phylogenetic inference decision problems
8 Restriction reductions among phylogenetic parsimony decision problems
9 Function classes within $FP^{NP}$
10 Function classes within FPSPACE(poly): counting classes
11 Difficulties with the reduction from weighted MIN-VERTEX COVER to weighted phylogenetic parsimony evaluation problems
12 Conditions for multiple partition levels on subgraph (,',.
13 Reductions among implicit and explicit graph unweighted Binary Wagner parsimony decision problems
By and large, as the mass of knowledge grows, men devote little attention to the dead. Yet it is the dead who are frequently our pathfinders, and we walk all unconsciously along the roads they have chosen for us.
Loren Eiseley, The Man Who Saw Through Time
Dedicated in Memory of
Loren Eiseley
(1907 - 1977)
1 Introduction
Phylogenetic systematics is the subdiscipline of evolutionary biology that deals with reconstructing the tree that represents the evolutionary relationships of a set of species. Such trees are used to create taxonomic classifications for species and to evaluate alternative hypotheses of adaptation, evolutionary mechanism, and ancient geographical relationships [EC80, FB90, Nei87, NP81]. As the data are seldom available to reconstruct the actual historical tree, one takes as an estimate of this tree a subset of the set of all possible trees that are best relative to some biologically-relevant criterion.
Many approaches to reconstructing evolutionary trees have been developed over the last thirty years [Fel88, PHS92, SO90]. These approaches are of two types [SO90, p. 412]: method-based approaches, which integrate the criterion for tree selection directly into the method for searching the set of all possible trees, and criterion-based approaches in which the criterion and search method are distinct. Method-based approaches obtain optimal trees quickly, but do not rank the suboptimal trees and the alternative hypotheses encoded by these trees. Criterion-based methods do give such rankings but are much slower because known algorithms have to evaluate all possible trees; hence, practical implementations of criterion-based approaches settle for deriving approximations to the optimal trees rather than the optimal trees themselves.
Consider the formal computational problems associated with these types of approaches. The problems associated with method-based approaches typically have polynomial-time algorithms, and are thus of little interest here. In this thesis, I will be concerned with the problems associated with criterion-based approaches. Since 1982, decision problems associated with the most commonly used approaches in phylogenetic systematics have been shown to be NP-complete [Day87, DJS86, DS86, DS87, GF82, Kri88, KM86]. While this implies that related problems such as producing optimal trees are harder than NP, it is not known exactly how much harder these problems are, or how closely fast algorithms may approximate optimal trees. This latter problem is especially important because trees of slightly different or even the same cost can imply very different evolutionary hypotheses [Mad91, p. 315]. There are many examples in the biological literature of hypotheses that have been modified or retracted in light of different estimates of the optimal tree, e.g. the "Out of Africa" hypothesis for the origin of the human mitochondrial DNA gene pool [MRS92, SSV92].
In this thesis, I will derive bounds on the computational complexities of several functions based on phylogenetic inference problems:
• evaluation functions, which return the cost of the optimal tree(s);
• solution functions, which return an optimal tree;
• spanning functions, which return the number of optimal trees;
• enumeration functions, which systematically enumerate all optimal trees; and
• random-selection functions, which return a randomly-selected member of the set of optimal trees.
Bounds are also derived for those functions that return trees of a given cost or of cost less than or greater than a given limit. In addition, I will derive bounds on the approximability of the solution functions associated with the most commonly-used approaches to phylogenetic inference. Results are given not only for evolutionary trees based exclusively on dichotomous speciation events but also for trees incorporating such events as hybridization and recombination.
The results in this thesis have been obtained by applying existing techniques to a set of closely-related problems. These results are of significance to computational complexity theory to the extent that, by isolating aspects of problems that cause unexpected increases or decreases in complexity, they suggest further avenues for research. The biological relevance of these results is more problematic. Some biologists have argued that these results are not applicable because (1) the defined problems are too general, and problems of practical interest may be solvable in polynomial time, and (2) the framework of asymptotic worst-case analyses in which these results were derived is unrealistic (J. S. Farris and M. F. Mickevich, personal communication). In formulating the problems examined in this thesis, there have been undeniable tradeoffs of fidelity to biological reality for the sake of tractability of analysis. However, such tradeoffs underlie many applications of mathematics to real problems, and are not only unavoidable but necessary in the initial stages of an investigation. The purpose of this thesis is not to present results of direct relevance to biologists, but to lay a theoretical framework in which such results may one day be derived.
1.1 Organization of This Thesis
This thesis is laid out in four sections.
In Section 2, I give various definitions used in this thesis, including graph-theoretic definitions of non-reticulate and reticulate evolutionary trees and an introduction to computational complexity theory.
In Section 3, I review basic concepts in phylogenetic analysis as well as all previously-defined phylogenetic inference decision problems and the reductions by which they have been shown to be NP-complete. A framework is given that incorporates all such problems studied to date. This section also includes definitions and reductions for new problems involving reticulate trees, as well as several new reductions for previously-defined problems. The tree of reductions among all problems examined in this thesis is shown in Figures 7 and 8, and the correspondence between phylogenetic inference problems examined in this thesis and those in the literature is given in Table 19.
In Section 4, I use the OptP hierarchy [GKR92, Kre88] and paddability [CT91, Gas86] to classify the phylogenetic inference evaluation problems into two groups within $FP^{NP}$. The complexities of these problems, along with several other properties of these problems and theorems from [JVV86, KST89, Sel91], are used to derive bounds on the complexities of the associated solution, spanning, enumeration, and random-generation functions. All bounds and hardness results derived in this section are summarized in Table 20.
In Section 5, I use results from [BH90, GJ79, KMB81, Kre88] to derive lower bounds on the approximability of phylogenetic inference problems by polynomial-time algorithms. In particular, it is shown using the recent results of Arora et al. [ALMSS92] that no phylogenetic inference optimal-cost solution problem examined in this thesis has a polynomial-time approximation scheme unless P = NP. All bounds on approximability derived in this section are summarized in Table 24.
Each of Sections 3, 4, and 5 begins with a subsection on notation particular to that section and concludes with a summary of the results derived in that section. Brief discussions of the biological relevance of these results are given at the end of each such summary.
2 Notation
This section consists of graph-theoretic definitions of evolutionary trees and an introduction to computational complexity theory. First, there are some general definitions.
Define alphabet $\Sigma = \{0, 1\}$, all strings $x$ as being members of $\Sigma^*$, and all languages $L$ as being members of $2^{\Sigma^*}$. Let $|x|$ be the length of string $x$, $|L|$ be the cardinality of $L$, and $L^l$ be the set of all strings in $L$ with length $l$. For a language $L$, co-$L = \Sigma^* - L$. Define $\langle x, y \rangle$ as an invertible function that encodes pairs of strings into a single string, and $\chi_L$ as the characteristic function of language $L$, i.e. $\chi_L(x) = 1$ if $x \in L$ and $0$ otherwise.
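The pairing function and characteristic function defined above can be made concrete with a minimal Python sketch. The encoding chosen here (doubling each bit of the first string and using `01` as a separator) is only one of many invertible pairings and is an assumption of this example, not the encoding used in the thesis; the names `pair`, `unpair`, and `characteristic` are hypothetical.

```python
def pair(x, y):
    """Encode two binary strings as one binary string, invertibly.

    Assumed encoding: each bit of x is doubled ('0' -> '00', '1' -> '11'),
    the marker '01' ends the doubled part, and y follows unchanged.
    """
    return "".join(bit * 2 for bit in x) + "01" + y


def unpair(z):
    """Invert pair(): recover (x, y) from the encoded string z."""
    i = 0
    x_bits = []
    # Read doubled bits two at a time until the '01' marker is reached.
    while z[i:i + 2] != "01":
        x_bits.append(z[i])
        i += 2
    return "".join(x_bits), z[i + 2:]


def characteristic(language):
    """Return chi_L for a language L given as a membership predicate."""
    return lambda x: 1 if language(x) else 0
```

For instance, `unpair(pair("10", "011"))` recovers the original pair, and `characteristic` applied to a predicate for "strings with an even number of 1s" gives a function that returns 1 on `"11"` and 0 on `"1"`.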
Let $N = \{0, 1, 2, \ldots\}$ be the nonnegative integers, $Q^+$ be the nonnegative rational numbers, and $R^+$ be the nonnegative real numbers. Given functions $f : X \rightarrow Y$ and $g : Y \rightarrow Z$, let $dom(f)$ and $rng(f)$ be the domain and range of $f$, respectively, and $g \circ f : X \rightarrow Z$ be the composition of $f$ and $g$, i.e. $(g \circ f)(x) = g(f(x))$. If function $f$ is not defined on input $x$, then $f(x) = \perp$. If $\forall x \in X \; \{|rng(f(x))| = 1\}$, $f$ is single-valued; else, $f$ is multivalued. A function $f : N \rightarrow N$ is smooth if the function $g : 1^n \rightarrow 1^{f(n)}$ is polynomial-time computable and $f(x) \leq f(y)$ for all $x \leq y$ [Kre88, p. 493]. For an arbitrary total order $R$ on binary strings, define the ordering $P_R$ as the pair of functions $(E, L)$ such that $E(i, l)$ returns the $i$th member of $(\Sigma^*)^l$ under $R$ and $L(x, y, l)$ indicates if $x \leq y$ under $R$ for $x, y \in (\Sigma^*)^l$; this thesis will focus on those orderings for which $E$ and $L$ are computable in polynomial time (see Section 4.5).
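As an illustration of an ordering whose two functions are computable in polynomial time, the following sketch instantiates the pair (E, L) for lexicographic order on binary strings of a fixed length. Choosing lexicographic order for R, numbering members from 1, and the Python names are all assumptions of this example.

```python
def E(i, l):
    """Return the i-th (1-based) binary string of length l in lexicographic order."""
    if not 1 <= i <= 2 ** l:
        raise ValueError("i must lie between 1 and 2**l")
    if l == 0:
        return ""
    # The i-th string is the l-bit binary representation of i - 1.
    return format(i - 1, "b").zfill(l)


def L(x, y, l):
    """Return True if x <= y lexicographically, for x and y of length l."""
    if len(x) != l or len(y) != l:
        raise ValueError("x and y must both have length l")
    # For equal-length strings, Python's string comparison is lexicographic.
    return x <= y
```

Both functions run in time polynomial in l when i is given in binary, which is the property the definition above asks for.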
There are several types of bounds for a numerical function. These bounds can be represented by classes of functions [BDG88, p. 35]:
• $O(f)$ is the set of functions $g$ such that for some $r > 0$ and for all but finitely many $n$, $g(n) < r \cdot f(n)$.
• $o(f)$ is the set of functions $g$ such that for every $r > 0$ and for all but finitely many $n$, $g(n) < r \cdot f(n)$.
• $\Omega(f)$ is the set of functions $g$ such that for some $r > 0$ and for infinitely many $n$, $g(n) > r \cdot f(n)$.
Classes $O(f)$, $o(f)$, and $\Omega(f)$ correspond to loose upper, strict upper, and loose lower bounds, respectively. These classes can also be defined over whole classes of functions rather than a single function, e.g. $O(poly)$, $o(polylog)$, where $poly = \bigcup_k n^k = n^{O(1)}$ and $polylog = \bigcup_k \log^k n = \log^{O(1)} n$. All logarithms in this thesis will be to base 2.
2.1 Graphs, Hypergraphs, and Trees
A graph $G = (V, E)$ is a set $V$ of vertices and a set $E$ of edges such that each edge links a pair of vertices. Edges in which one vertex is designated the source and the other the target are called arcs; a graph composed of arcs is a directed graph. A path between vertices $u$ and $v$ in a graph $G$ is a sequence of alternating vertices and edges $v_1 e_1 v_2 \ldots v_n e_n v_{n+1}$ such that $e_i$ is an edge in $G$ between $v_i$ and $v_{i+1}$, $e_i$ and $e_{i+1}$ are distinct edges in $G$, $v_1 = u$, and $v_{n+1} = v$; directed paths are defined similarly. A graph is connected if there is a path between each pair of vertices in the graph; if there is an edge between each pair of vertices in the graph, the graph is complete. If all edges in a graph lie on a single path between a pair of vertices, the graph is linear. A hypergraph $H = (V, E)$ is a set $V$ of vertices and a set $E$ of hyperedges such that each hyperedge links a group instead of a pair of vertices. A hyperedge whose vertex-set has been partitioned into disjoint target and source vertex-sets is called a hyperarc; a hypergraph composed of hyperarcs is a directed hypergraph. A hyperpath between vertices $u$ and $v$ in a hypergraph $H$ is a sequence of alternating vertices and hyperedges $v_1 e_1 v_2 \ldots v_n e_n v_{n+1}$ such that $e_i$ is a hyperedge in $H$ linking $v_i$ and $v_{i+1}$, $e_i$ and $e_{i+1}$ are distinct hyperedges in $H$, $v_1 = u$, and $v_{n+1} = v$; directed hyperpaths are defined similarly. If a graph has a function associating numbers (i.e. weights) with its vertices (edges), the graph is called a vertex- (edge-) weighted graph; otherwise, it is an unweighted graph. Weighted and unweighted hypergraphs are defined similarly. Hypergraphs are useful in simplifying and generalizing results from graph theory, especially those results dealing with combinatorial problems [Ber73, p. viii]. For other standard graph and hypergraph definitions, see [Ber73, Ber85]; the definition of hyperarc is from [ADS83].
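The definitions of connected and complete graphs can be checked mechanically. The sketch below assumes an undirected graph represented as a collection of vertices plus a list of two-element edges; this representation and the function names are choices made for this example only.

```python
from collections import deque
from itertools import combinations


def is_connected(vertices, edges):
    """Return True if there is a path between each pair of vertices."""
    vertices = set(vertices)
    if not vertices:
        return True
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    # Breadth-first search from an arbitrary start vertex.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacent[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == vertices


def is_complete(vertices, edges):
    """Return True if there is an edge between each pair of vertices."""
    present = {frozenset(edge) for edge in edges}
    return all(frozenset(p) in present
               for p in combinations(set(vertices), 2))
```

A three-vertex path, for example, is connected but not complete, while a triangle is both.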
Figure 1: Graphs and hypergraphs: (a) graph; (b) directed graph; (c) directed acyclic graph; (d) directed tree; (e) hypergraph; (f) directed hypergraph; (g) directed Berge acyclic hypergraph; (h) directed hypertree.
A path from any vertex to itself is called a cycle. A graph that does not contain any cycles is acyclic. An acyclic connected graph is called a tree. Directed cycles and directed acyclic graphs are defined similarly. Define a directed tree as a directed acyclic graph that satisfies three additional restrictions:
1. there is a distinguished vertex called the root,
2. there is at least one directed path from the root to every vertex in the tree, and
3. the root cannot be the target of any arc and every other vertex is the target of exactly one arc.
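The three restrictions above translate directly into a membership test. The following is a sketch under assumed conventions: arcs are (source, target) pairs and the root is passed explicitly. It does not test acyclicity separately, because a directed cycle cannot be reached from the root once restrictions 2 and 3 both hold (every vertex on such a cycle would receive its single incoming arc from the cycle itself).

```python
def is_directed_tree(vertices, arcs, root):
    """Check restrictions 1-3 for a candidate directed tree.

    vertices: iterable of vertex labels
    arcs:     iterable of (source, target) pairs
    root:     the distinguished vertex (restriction 1)
    """
    vertices = set(vertices)
    if root not in vertices:
        return False
    in_degree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for source, target in arcs:
        if source not in vertices or target not in vertices:
            return False
        in_degree[target] += 1
        children[source].append(target)
    # Restriction 3: the root is no arc's target; every other vertex
    # is the target of exactly one arc.
    if in_degree[root] != 0:
        return False
    if any(in_degree[v] != 1 for v in vertices if v != root):
        return False
    # Restriction 2: every vertex is reachable from the root.
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in children[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices
```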
A hyperpath from any vertex to itself is called a Berge cycle. A hypergraph that does not contain any Berge cycles is Berge acyclic. Directed Berge cycles and directed Berge acyclic hypergraphs are defined similarly. Unlike graphs, there are many types of acyclicity for hypergraphs [Duk85] which are based on Berge cycles that satisfy additional restrictions; however, Berge acyclicity implies each of these other types of acyclicity [Fag83, Theorem 6.1]. Define a directed hypertree as a directed Berge-acyclic hypergraph that satisfies four additional restrictions:
1. there is a distinguished vertex called the root,
2. there is at least one directed hyperpath from the root to every vertex in the hypertree,
3. the root cannot be in the target-set of any hyperarc and every other vertex is in the target-set of exactly one hyperarc, and
4. there are only three types of hyperarcs in the hypertree (see Figure 2):
(i) one source vertex and one target vertex (2-hyperarc),
(ii) two source vertices and one target vertex (3-hyperarc), or
(iii) two source vertices and two target vertices (4-hyperarc).
Figure 2: Types of hyperarcs: (a) 2-hyperarcs; (b) 3-hyperarcs; (c) 4-hyperarcs.
Note that the correspondence of arcs to 2-hyperarcs makes directed trees special cases of directed hypertrees.
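Restrictions 3 and 4 on hyperarcs can be sketched as follows. The representation of a hyperarc as a (source-set, target-set) pair and the function names are assumptions of this example, and the check deliberately covers only restrictions 3 and 4; it does not test Berge acyclicity or reachability from the root.

```python
def hyperarc_type(sources, targets):
    """Classify a hyperarc by the sizes of its source and target sets.

    Returns '2-hyperarc', '3-hyperarc', '4-hyperarc', or None if the
    hyperarc is not one of the three permitted types.
    """
    sources, targets = frozenset(sources), frozenset(targets)
    if sources & targets:
        raise ValueError("source and target sets must be disjoint")
    permitted = {(1, 1): "2-hyperarc",
                 (2, 1): "3-hyperarc",
                 (2, 2): "4-hyperarc"}
    return permitted.get((len(sources), len(targets)))


def satisfies_restrictions_3_and_4(vertices, hyperarcs, root):
    """Check only restrictions 3 and 4 of the directed hypertree definition."""
    vertices = set(vertices)
    times_targeted = {v: 0 for v in vertices}
    for sources, targets in hyperarcs:
        if hyperarc_type(sources, targets) is None:
            return False
        for v in targets:
            if v not in vertices:
                return False
            times_targeted[v] += 1
    return (times_targeted[root] == 0 and
            all(times_targeted[v] == 1 for v in vertices if v != root))
```

When every hyperarc is a 2-hyperarc, the second function reduces to restriction 3 for directed trees, which mirrors the remark that directed trees are special cases of directed hypertrees.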
Trees are used in evolutionary biology to represent evolutionary relationships between species. In the biological literature, directed trees are called rooted trees and undirected trees are called unrooted trees. In evolutionary trees, edges are interpreted as species undergoing evolutionary change (lineages), vertices are interpreted as speciation events in which new species are generated, and the root vertex is interpreted as the most recent common ancestor of the species being studied. All types of trees give an estimate of the pattern of speciation events; however, only directed trees hypothesize the direction in which evolutionary change has proceeded. Many of the trees in this thesis will be edge-weighted trees, in which each edge's weight is interpreted as the amount of evolutionary change undergone by the species corresponding to that edge.
The restrictions above on directed trees and hypertrees guarantee the following biologically necessary properties: (1) no species can give rise to one of its ancestors, and (2) each species arises from exactly one speciation event. All types of trees can represent dichotomous speciation events; directed hypertrees can also represent two more complex evolutionary events using their 3- and 4-hyperarcs: hybridization (the creation of a third entity from two parent entities) and recombination (an altering of one or both of two entities). Such events involving the creation of two or more paths between pairs of vertices in a tree are called reticulations, and trees incorporating these events are said to be reticulate. Reticulation as defined here is applicable not only to problems involving hybridization and introgression as defined in evolutionary biology [Fun85, StaC75], but also to problems involving horizontal gene transfer [Sne75], multi-allele recombination events [Hei90], and transmission of copying errors in medieval manuscripts [Lee88]. See Appendix A for further discussion of reticulation.
Note that in the biological literature, graph-theoretic trees are often given different names depending on what they represent and the methods by which they were derived, e.g. phylogram, dendrogram, cladogram, and that a single tree may on occasion imply a whole class of trees [HP84].
2.2 Computational Complexity Theory
For a more in-depth treatment of computational complexity theory, see [BDG88, BDG90, GJ79, HS78, Joh90, WW86]. Biologists will find [Day92] a good introduction to certain topics in this section.
There are many types of formal computational problems, e.g. decision, evaluation, counting. These problems can be unified using the framework developed in [WW86, pp. 100-101] cf. [JVV86]. Define a relation $R : \Sigma^* \times \Sigma^*$ on pairs of objects, e.g. (boolean formulas) $\times$ (truth assignments to boolean variables); (graphs) $\times$ (cliques). Formal computational problems can be viewed as functions defined on the projection of $R$ onto a given element $x$.
• Decision Problem (PROB):
For some boolean-valued predicate $G$ defined on $\Sigma^*$,
$\mathrm{PROB}(x) = \exists y\, [(x, y) \in R \wedge G(y)]$.
• Solution Problem (SOL-PROB):
For some boolean-valued predicate $G$ defined on $\Sigma^*$,
$\mathrm{SOL\text{-}PROB}(x) = \{ y \mid (x, y) \in R \wedge G(y) \}$.
If relation $R$ has an associated valuation function $b : R \rightarrow \mathcal{N}$, $R$ corresponds
intuitively to an optimization problem. Define the following problems on such $R$.
• Given-cost Solution Problem (SOL-VAL.EQ-PROB):
$\mathrm{SOL\text{-}VAL.EQ\text{-}PROB}((x, k)) = \{ y \mid (x, y) \in R \wedge b(x, y) = k \}$
• Given-limit Solution Problem (SOL-VAL.LE-PROB, SOL-VAL.GE-PROB):
$\mathrm{SOL\text{-}VAL.LE\text{-}PROB}((x, k)) = \{ y \mid (x, y) \in R \wedge b(x, y) \le k \}$
$\mathrm{SOL\text{-}VAL.GE\text{-}PROB}((x, k)) = \{ y \mid (x, y) \in R \wedge b(x, y) \ge k \}$
• Optimal-cost Evaluation Problem (MIN-PROB, MAX-PROB):
$\mathrm{MIN\text{-}PROB}(x) = \min_{(x, y) \in R} b(x, y)$
$\mathrm{MAX\text{-}PROB}(x) = \max_{(x, y) \in R} b(x, y)$
• Optimal-cost Solution Problem (SOL-X-PROB, X ∈ {MIN, MAX}):
$\mathrm{SOL\text{-}X\text{-}PROB}(x) = \{ y \mid (x, y) \in R \wedge b(x, y) = \mathrm{X\text{-}PROB}(x) \}$
Three other types of problems may be defined on the ranges of Y, Y ∈ {SOL-PROB,
SOL-X-PROB} (X ∈ {MIN, MAX, VAL.EQ, VAL.LE, VAL.GE}).
• Spanning Problem (SPAN-Y):
$\mathrm{SPAN\text{-}Y}(x) = |\{Y(x)\}|$.
• Random-Selection Problem (RAND-Y):
$\mathrm{RAND\text{-}Y}(x) = y$, where $y$ is a randomly-selected member of $\{Y(x)\}$.
• Enumeration Problem (ENUM-Y):
$\mathrm{ENUM\text{-}Y}(x, i) = y$, where $y$ is the $i$-th member of $\{Y(x)\}$ under some
polynomial-time ordering $P$.
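The problem types above can be illustrated on a small concrete relation. The sketch below is only an illustration under stated assumptions: it takes $R$ to be (graphs) × (vertex covers), uses the cover size as the valuation function $b$, and enumerates solutions by brute force, so it is exponential-time and not meant as a practical algorithm. The function names and the example graph `PATH` are hypothetical and do not come from the thesis.

```python
from itertools import combinations

def vertex_covers(graph):
    """Yield every y with (graph, y) in R, i.e. every vertex cover of the graph."""
    vertices, edges = graph
    for size in range(len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            chosen = frozenset(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                yield chosen

def b(graph, y):
    """Valuation function b(x, y): assumed here to be the size of the cover."""
    return len(y)

def prob(graph, k):
    """Decision problem with predicate G(y) taken to be 'y has at most k vertices'."""
    return any(b(graph, y) <= k for y in vertex_covers(graph))

def sol_val_le(graph, k):
    """Given-limit solution problem: all solutions of cost at most k."""
    return [y for y in vertex_covers(graph) if b(graph, y) <= k]

def min_prob(graph):
    """Optimal-cost evaluation problem: the minimum cost over all solutions."""
    return min(b(graph, y) for y in vertex_covers(graph))

def sol_min(graph):
    """Optimal-cost solution problem: all solutions of minimum cost."""
    best = min_prob(graph)
    return [y for y in vertex_covers(graph) if b(graph, y) == best]

def span(solutions):
    """Spanning problem: the number of solutions in the given solution set."""
    return len(solutions)

def enum(solutions, i):
    """Enumeration problem: the i-th solution (0-based) in lexicographic order."""
    return sorted(solutions, key=sorted)[i]

# Hypothetical instance: the path a - b - c.
PATH = (frozenset("abc"), [("a", "b"), ("b", "c")])
```

For `PATH`, the only minimum cover is {b}, and `sol_val_le(PATH, 2)` contains four covers, so the spanning problem on that solution set has value 4.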
Each of these problems corresponds to a function. A decision problem also corresponds
to the language composed of the subset of its instances whose solution
is "yes". A problem X is said to be solved by an algorithm if for any input $x$,
that algorithm computes a single value from $\{f(x)\}$ for the function $f$ embodied
in that problem. Let $X_s$ denote the set of single-valued functions corresponding
to algorithms that solve problem X. Inputs to a problem will be called instances
and outputs will be called solutions.
Define deterministic Turing machines (DTM), nondeterministic TM (NTM),
and deterministic and nondeterministic oracle TM (DOTM, NOTM) that recognize
languages (acceptors) and compute functions (transducers) in the standard
manner [BDG88, GJ79]. A DTM transducer $N$ computes $y$ on input $x$
($N(x) \rightarrow y$) if $y$ is the final contents of $N$'s output tape for the computation
of $N$ on $x$. A NTM transducer $N$ computes $y$ on input $x$ ($N(x) \mapsto y$) if there
is an accepting computation of $N$ on $x$ such that $y$ is the final contents of $N$'s
output tape [Sel91, p. 9]. DTM transducers compute single-valued functions,
and NTM transducers compute partial multivalued functions. Any input to a
TM transducer that does not have an accepting computation computes symbol
$\bot$. An OTM that forces all queries to be made simultaneously is non-adaptive,
while an OTM that allows queries to be made on the basis of answers to previous
queries is adaptive.
The computational resources used by an algorithm to solve an instance of a
problem can be visualized as the computational resources used by the TM which
corresponds to that algorithm. Let $R_A(x)$ be the amount of resource $R$ used by
algorithm $A$ on input $x$. For a function $f : \mathcal{N} \rightarrow \mathcal{N}$ and a computational resource
$R$, an algorithm $A$ is $f$-$R$ ($f$-$R$ computable) e.g. polynomial-time, polynomial-time
computable, if $R_A(|x|) \in O(f)$. A problem is $f$-$R$ if there is an $f$-$R$ algorithm
that solves that problem.
Problems can be grouped into complexity classes based on bounds on computational
resources required to solve those problems e.g. DTIME(poly), which is
the set of all problems solvable by polynomial-time DTM. Some standard complexity
classes for decision problems are:
P          All decision problems solvable by polynomial-time
           DTM.
NP         All decision problems solvable by polynomial-time
           NTM.
PSPACE     All decision problems solvable by polynomial-space
           DTM.
EXPTIME    All decision problems solvable by exponential-time
           DTM.
It is known that $P \subseteq NP \subseteq PSPACE \subseteq EXPTIME$, and that $P \subset EXPTIME$
[BDG88, Proposition 3.1]. Several classes of complexity between P and NP are:
UP         All decision problems solvable by polynomial-time
           NTM such that for each input $I$, there is at most one
           accepting computation.
FewP       All decision problems solvable by polynomial-time
           NTM such that for each input $I$ and a fixed polynomial
           $p$, there are at most $p(|I|)$ accepting computations.
R          All decision problems solvable by polynomial-time
           NTM such that for each input $I$, either there are no
           accepting computations or else at least half of all
           computations are accepting.
Many problems are of complexity between NP and PSPACE, and are within the
levels of the Polynomial Hierarchy [MS72]:
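$\Theta_0^p = \Delta_0^p = \Sigma_0^p = \Pi_0^p = P$
$\Delta_{k+1}^p = P^{\Sigma_k^p}$
$\Sigma_{k+1}^p = NP^{\Sigma_k^p}$
$\Pi_{k+1}^p = \text{co-}\Sigma_{k+1}^p$
$\Theta_{k+1}^p = P^{\Sigma_k^p[O(\log n)]}$
$PH = \bigcup_{k=1}^{\infty} \Sigma_k^p$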
where $P^Y$ ($NP^Y$) is the class of problems solvable by polynomial-time DOTM
(NOTM) that can use any oracle in class $Y$, and $P^{Y[f(n)]}$ ($P_{\|}^{Y[f(n)]}$) is the class
of problems solvable by polynomial-time adaptive (non-adaptive) DOTM that
can ask up to $f(n)$ queries to an oracle in class $Y$. Levels $\Delta_k^p$, $\Sigma_k^p$, and $\Pi_k^p$
were defined in [MS72], and level $\Theta_k^p$ was defined in [WagK88]. It is known that
$\Theta_k^p \subseteq \Delta_k^p \subseteq \Sigma_k^p \cup \Pi_k^p \subseteq P^{\Sigma_k^p[1]} \subseteq \Theta_{k+1}^p$, that $PH \subseteq PSPACE$, and that if for some
$k$, $\Pi_k^p = \Sigma_k^p$ then $PH = \Sigma_k^p$ [Sto77, WagK90, Wra77]. Two working hypotheses in
complexity theory are that $P \ne NP$ and that PH does not collapse to any finite
level.
Many of the language complexity classes above can be restated as classes of
single-valued functions. Let FX denote the class of functions computed by TM
used to define language class X e.g. FNP, $F\Delta_k^p$, FPSPACE. Define FPSPACE(poly)
as those functions in FPSPACE whose outputs are polynomially bounded in
t.llt' lt•ngth of tlu· input, and FPII = Uk=l F~~. It is known that ~~ c Ftl~,
ft' /)l;r!J(n)J ~ F p~n/(n)+l), F' pl;r(J(n)J ¢.. F pl:f+J(f(n)-t] uuless P = NP, and
F/'l;r+.l11 ¢.. FP~tlf(n)J unless F'P/1 = pp'Er (Gas92, Krc!J2b]. The relationship~;
within and between classes in FPH have been established only at the lowest levels
(see Section 4.1.1); those relationships known to date suggest that classes in FPH
behave very differently from their analogues in PH [Gas92]. Classes of multivalued
functions are also possible. There are many restrictions of polynomial-time NTM
transducers that generate such classes [Sel91]; one such restriction is $F_g$, the subset
of functions $f$ in class $F$ such that $graph(f) = \{(x, y) \mid f(x) \mapsto y\} \in P$ i.e. outputs
can be checked in polynomial time [Sel91, Val76]. Define $FMPH = \bigcup_{k=1}^{\infty} F\Sigma_k^p$.
$FNP$, $FNP_g$, and $F\Sigma_k^p$ are called NPMV, $NPMV_g$, and $NPMV^{\Sigma_{k-1}^p}$ in [FHOS92, Sel91], and NTM in
$FNP_g$ which compute total functions are called NP metric TM in [Kre88]. Functions
in $FNP_g$ compute the solutions associated with decision problems in NP.
For further discussion about these and other function classes, see Section 4.1.
A reduction $\Pi \propto \Pi'$ is an algorithm that solves problem $\Pi$ by using an algorithm
for problem $\Pi'$. Reductions order problems by computational hardness.
The two main types of reductions between decision problems are:
• many-one ($\le_m^p$): $A \le_m^p B$ if there is a polynomial-time function $f$ such
that $x \in A$ if and only if $f(x) \in B$.
• Turing ($\le_T^p$): $A \le_T^p B$ if there is a polynomial-time function using $B$ as
an oracle that determines if $x \in A$.
A generalization of many-one reducibility called metric reducibility holds between
single-valued functions.
Definition 1 (adapted from [Kre88], p. 493) Let $f, g : \Sigma^* \rightarrow \Sigma^*$. A metric
reduction from $f$ to $g$ is a pair of polynomial-time functions $(T_1, T_2)$, where $T_1 :
\Sigma^* \rightarrow \Sigma^*$ and $T_2 : \Sigma^* \times \Sigma^* \rightarrow \Sigma^*$, such that $f(x) = T_2(x, g(T_1(x)))$ for all $x \in \Sigma^*$.
The following variant holds between problems.
Definition 2 Let $\Pi$ and $\Pi'$ be problems and SOL-X($I$) be the set of solutions
associated with instance $I$ of problem X. A metric reduction from $\Pi$ to $\Pi'$ is a
pair of polynomial-time functions $(T_1, T_2)$, where $T_1 : I \rightarrow I'$ and $T_2 : I \times S' \rightarrow S$,
such that for any single-valued function $f$ that solves $\Pi'$, $T_2(I, f(T_1(I))) \in$ SOL-$\Pi$($I$)
for any instance $I$ of $\Pi$.
This reducibility is a restricted version of the Turing reducibility between partial
multivalued functions defined in [FHOS92]. Note that the definitions of these
metric reducibilities are equivalent for problems that are single-valued. Another
relation called refinement can also hold between multivalued functions. Given
multivalued functions $f$ and $g$, $g$ is a refinement of $f$ if $dom(f) = dom(g)$ and
for all $x \in dom(g)$ and all $y$, if $g(x) \mapsto y$ then $f(x) \mapsto y$. These relations
can also hold between whole classes of functions. For instance, if $F$ and $G$ are
two classes of partial multivalued functions, then $F \subseteq_c G$ if every $f \in F$ has a
refinement in $G$ [Sel91, p. 4]. Both inclusion and refinement relations can hold
between multivalued function classes, and single-valued classes can be included in
multivalued classes (indeed, this is equivalent to refinement); however, only the
single-valued refinement relation can hold between multivalued function classes
and single-valued classes.
Given two problems $x$ and $y$ and a reducibility $r$, $x$ and $y$ are computationally
equivalent if $x$ $r$-reduces to $y$ and $y$ $r$-reduces to $x$. Given a class of problems
$X$ and a reducibility $r$, a problem $y$ is said to be $X$-hard if each problem in $X$
$r$-reduces to $y$. If $y$ is $X$-hard and is also in $X$, $y$ is $X$-complete; if $y$ is $X$-hard
and is not in $X$, $y$ is properly $X$-hard. Two limited types of reductions are:
• arithmetic-equivalence reductions: Problems $\Pi$ and $\Pi'$ differ only in their
cost-functions $b_\Pi$ and $b_{\Pi'}$, and there exists a pair of polynomial-time functions
$(T_1, T_2)$ such that for all instances $x$ of $\Pi$ and $\Pi'$, $b_\Pi(x) = T_1(b_{\Pi'}(x))$
• restriction reductions: Problems $\Pi$ and $\Pi'$ differ only in that $dom(\Pi) \subset
dom(\Pi')$ i.e. $\Pi$ is a subproblem of $\Pi'$.
By definition, restriction and arithmetic-equivalence reductions are many-one and
metric reductions.
Though some of the problems discussed in this thesis are most naturally defined
on $\mathcal{R}^+$, all problems will be restricted to $\mathcal{Q}^+$. Real numbers in general cannot
be used because irrational numbers e.g. $\sqrt{2}$, cannot be represented within
a computer whose running time is bounded by a function of the length of its
input. All irrational numbers that arise in calculations must also be eliminated
or approximated e.g. $\sqrt{x} \rightarrow \lceil \sqrt{x} \rceil$. A case study in how a real-number problem
is modified to be computable is given for the Euclidean Minimal Steiner Tree
problem in [GGJ77]. The lower bounds given by such modified problems on the
complexity of the actual problems are the best that can be done within computational
complexity theory as it currently exists. However, there may be other
options [BSS89, Ko91].
Though problems will be defined on $\mathcal{Q}^+$ for the convenience of readers, all
problems will actually operate on $\mathcal{N}$. This is easily done by multiplying out the
rational denominators i.e. $\{\frac{1}{3}, \frac{3}{4}\} \rightarrow \{4, 9\} \div 12$. Thus, the bit-representation length
of numbers will be proportional to their value. This ensures that the length of
certain small rational numbers will not exceed that of larger numbers (e.g. though
$\frac{13}{14} < 13$, $|13| + |14| > |13|$; however, $|13| < |13 \cdot 14|$). This property is necessary
in several proofs in Sections 3.2.1 and 3.2.3.
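The conversion from rationals to integers can be sketched as follows. This is a minimal illustration, assuming the inputs are given as `fractions.Fraction` values and that the common multiplier is the least common multiple of the denominators; the function name `scale_to_integers` is hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9

def scale_to_integers(values):
    """Multiply out the denominators of a list of positive rationals.

    Returns the list of integer numerators over a common denominator,
    together with that common denominator.
    """
    common = lcm(*(v.denominator for v in values))
    return [(v * common).numerator for v in values], common

# The example from the text: {1/3, 3/4} becomes {4, 9} over 12.
scaled, common = scale_to_integers([Fraction(1, 3), Fraction(3, 4)])
```

The bit-length comparison in the text can be checked the same way with `int.bit_length`: 13 and 14 each take 4 bits, while 13 · 14 = 182 takes 8.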
3 Computational Problems in Phylogenetic Systematics
This section begins with an overview of various concepts in phylogenetic systematics.
This is followed by a review of certain decision problems associated
with phylogenetic analysis using the phylogenetic parsimony, character compatibility,
and various of the distance matrix fitting criteria, and a review of
the reductions by which these problems have been shown to be NP-complete
[Day83, Day87, DJS86, DS86, DS87, Kri88, KM86]. This section also includes
definitions and reductions for several new phylogenetic parsimony problems that
allow limited amounts of reticulation, as well as a new reduction for the Additive
Evolutionary Tree problem [Day83].
3.1 Phylogenetic Systematics
Systematics is the subdiscipline of biology that deals with ordering species into
sets of groups (systems) according to various kinds of relationships between
species (e.g. ecological roles, geographical proximity, overall similarity) [Ax87,
Hen66]. Phylogenetic systematics is in turn the subdiscipline of biological systematics
concerned with ordering species based on their evolutionary relationships;
specifically, species are grouped together by descent from a common ancestor, and
these groups are nested hierarchically to make an evolutionary tree. The process
of reconstructing evolutionary trees is called phylogenetic analysis (phylogenetic
inference), and the evolutionary trees so reconstructed are called phylogenies.
The units that are ordered in phylogenetic analysis are called taxa. Two types
of data are typically available to reconstruct evolutionary relationships among
taxa:
• Discrete Character Matrix: The data are an m-by-d matrix giving the
values possessed by each of a set of m taxa for each of a set of d characteristics.
These characteristics are called characters and their values are
character states. For example, a character flower colour might have character
states blue, yellow, and red. Character-states are grouped into
characters by the relation of homology [Ax87, EC80, Wil81].1 The vector
of character-states over all taxa for a particular character is a character
pattern, and the vector of character-states over all characters for a particular
taxon is a character distribution. If a character has only two states, it
is binary; else, it is unconstrained (multistate). If a character has a graph
imposed on its states whose edges specify the allowable changes from one
state to another, the character is ordered; else, the character is unordered.
If the edges of an ordered character's graph are directed, the character is
polarized; else, it is unpolarized. If the edges of an ordered character's graph
have weights, the character is weighted; else, it is unweighted. Ordered characters
are typically based on linear, complete, or tree graphs. In polarized
characters, if state X is the source of a directed path to state Y, X is ancestral
to Y and Y is derived relative to X. The state that is ancestral to all
character-states in a polarized character is the ancestral state for that character.
By convention, the ancestral and derived states in polarized binary
characters are written as 0 and 1.
1 Homology is the relation among different structures in different species that evolved from a common ancestral species (e.g. the character mammalian fore-limb that has states arm (human beings, apes), foreleg (dogs, horses, tigers), wings (bats), and flippers (dolphins, whales)). There are other kinds of relations among observed character-states, such as analogy, the relation of similar structures in different species that have arisen independently in several ancestral species (e.g. the character wings that groups together the wings of insects, birds, and bats). All character-state relations give evolutionary information of some sort; however, only homology delimits groups of species sharing a common ancestor, and thus only characters formed by homology are useful in reconstructing evolutionary trees.
In the case of molecular sequences, homology can hold among different sequences from different species, as well as between different positions in different sequences; indeed, the problem of establishing sequence-position homology is that of deriving a sequence alignment [SO90, pp. 416-417]. Note that in molecular biology, the term "homology" is also a synonym for sequence similarity [MH90, pp. 7-9].
• Distance Matrix: The data are an m-by-m matrix giving a measure
of dissimilarity or similarity between each pair of taxa in a set $S$ of $m$
taxa. The terms "similarity" and "dissimilarity" denote quantities that
are precisely defined and inversely related; when rigor is not required or
specified, both will be denoted by the term "distance" [SO90]. Let
$M_n$ be the set of non-negative rational real-valued matrices on $n$ taxa, and
$B_n \subset M_n$ be the set of all matrices whose off-diagonal values are in {1, 4}.
Call members of $B_n$ and $M_n$ binary and unconstrained matrices respectively,
by analogy with discrete characters. Let $X_{S'}$ be matrix $X$ on $S$ restricted
to $S' \subset S$. Every matrix represents a distance function $d : S^2 \rightarrow \mathcal{R}$, which
may satisfy some subset of the following properties.
2. $\forall x, y \in S$, $d(x, y) = 0$ implies $x = y$.
3. $\forall x, y \in S$, $d(x, y) = d(y, x)$.
4. $\forall x, y, z \in S$, $d(x, y) \le d(x, z) + d(z, y)$.
5. $\forall x, y, z \in S$, $d(x, y) \le \max[d(x, z), d(z, y)]$.
6. $\forall x, y, z, w \in S$,
$d(x, y) + d(z, w) \le \max[d(x, z) + d(y, w), d(x, w) + d(y, z)]$.
Conditions (4), (5), and (6) are known as the triangle, ultrametric, and
additive inequalities, respectively. A function that satisfies conditions (1),
(2), and (3) is a semimetric; if condition (4) is also satisfied, the function is a
metric. Metrics that satisfy conditions (5) and (6) are known as ultrametrics
and tree metrics, respectively (see Figure 3). The number of distinct off-diagonal
values in an ultrametric is the height of that ultrametric. Tree
metrics and ultrametrics can be represented as trees; let $U_n$ ($A_n$) be the
set of all ultrametric (additive) trees on $n$ taxa, $U_{n,q} \subset U_n$ be the set of all
ultrametric trees on $n$ taxa of height at most $q$, $1 \le q \le n(n - 1)/2$, and
$\pi_U : U_n \rightarrow M_n$ ($\pi_A : A_n \rightarrow M_n$) be the function that maps an ultrametric
(additive) tree onto its ultrametric (tree metric). In this thesis, $A_n$ will be
restricted to the discretized additive trees, whose edges have length $k/2$, $k >
0$; note that ultrametrics drawn from this restricted set have integer off-diagonal entries.
Ultrametric trees are by definition rooted while additive trees can be either
rooted or unrooted. Ultrametric trees correspond to rooted additive trees
in which each leaf is the same distance from the root.
Figure 3: Types of metrics (taken from [Day88]): (a) a metric and a possible representation in the 2-D Euclidean plane; (b) an ultrametric and its associated ultrametric tree; (c) a tree metric and its associated additive tree.
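The inequalities above translate directly into checks on a distance matrix. The sketch below is a minimal illustration: the function names are hypothetical, matrices are assumed to be square lists of lists indexed from 0, and the 5-taxon matrix `ULTRA` is an assumption chosen to agree with the partition representation example given later in this section for the ultrametric tree of Figure 3(b). The checks run over all tuples of taxa and are meant for small examples only.

```python
from itertools import product

def is_symmetric(d):
    """Condition (3): d(x, y) = d(y, x)."""
    n = len(d)
    return all(d[x][y] == d[y][x] for x, y in product(range(n), repeat=2))

def satisfies_triangle(d):
    """Condition (4): d(x, y) <= d(x, z) + d(z, y)."""
    n = len(d)
    return all(d[x][y] <= d[x][z] + d[z][y]
               for x, y, z in product(range(n), repeat=3))

def is_ultrametric(d):
    """Condition (5): d(x, y) <= max[d(x, z), d(z, y)]."""
    n = len(d)
    return all(d[x][y] <= max(d[x][z], d[z][y])
               for x, y, z in product(range(n), repeat=3))

def is_additive(d):
    """Condition (6): d(x,y) + d(z,w) <= max[d(x,z) + d(y,w), d(x,w) + d(y,z)]."""
    n = len(d)
    return all(d[x][y] + d[z][w] <= max(d[x][z] + d[y][w], d[x][w] + d[y][z])
               for x, y, z, w in product(range(n), repeat=4))

def height(d):
    """Number of distinct off-diagonal values."""
    n = len(d)
    return len({d[x][y] for x in range(n) for y in range(n) if x != y})

# Assumed example: an ultrametric on taxa 1..5 (rows and columns 0..4).
ULTRA = [
    [0, 25, 50, 30, 50],
    [25, 0, 50, 30, 50],
    [50, 50, 0, 50, 15],
    [30, 30, 50, 0, 50],
    [50, 50, 15, 50, 0],
]

# Assumed example: path metric of a - b - c with unit edge lengths.
PATH_METRIC = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
```

`ULTRA` passes every check and has height 4, while `PATH_METRIC` is a tree metric that fails the ultrametric inequality, since d(a, c) = 2 exceeds max[d(a, b), d(b, c)] = 1.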
Discrete character matrices are generated by examining the taxa of interest. Distance
matrices are generated directly via certain techniques (i.e. immunological
assay, DNA-DNA hybridization) or derived from discrete character matrices
by applying a distance function defined on pairs of character distributions. Raw
distance matrices must often be transformed into matrices that reflect "true"
evolutionary distances [SO90, pp. 422-436]. These types of data are not ideal for
the task of reconstructing evolutionary history, but they are sufficient: as taxa
originate by inheritance with modification, each ancestral lineage in the evolutionary
tree has left its signature in its descendents, either as character states
that have propagated to that lineage's descendents, or as a certain evolutionary
distance by which each such descendent is separated from every other taxon in
the tree. Hence, many of the ancestral lineages, as well as the details of the process
by which ancestral lineages gave rise to the observed taxa, can be reconstructed
using the types of data above [EC80].
There are several other useful representations for evolutionary trees besides
tree graphs. In trees constructed using discrete character data, each vertex in
the tree has its own set of character-state values. These trees can be summarized
by the character-state sets of their vertices (see Figure 4). Alternatively, for
each character, one can map the set of vertices possessing each character-state
onto that character's character-state graph to create a cladistic character, and
summarize a tree by its set of cladistic characters. Cladistic characters are often
easier to visualize as trees of subsets (see Figure 5). Discrete character matrices
and individual characters may also be summarized by cladistic characters.
Non-reticulate edge-weighted trees can be summarized by their patristic matrices,
$P = [p_{ij}]$, where $p_{ij}$ is the sum of the weights of all edges on the path between
taxa $i$ and $j$ in the given tree. A tree whose patristic matrix is an ultrametric of
height $q$ can be represented [JS71, pp. 42-50] [KM86, p. 312] as a $(q + 1)$-length
sequence of pairs $(P_i, l_i)$ such that
2. $l_i$ is an integer such that $0 = l_1 < l_2 < \ldots < l_{q+1}$,
3. $P_i$ is a proper refinement of $P_{i+1}$ ($1 \le i \le q$).
For example, the partition representation of the ultrametric tree of height 4 in
Part (b) of Figure 3 is
$(P_1, l_1) = (\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}, 0)$,
$(P_2, l_2) = (\{\{1\}, \{2\}, \{4\}, \{3, 5\}\}, 15)$,
$(P_3, l_3) = (\{\{1, 2\}, \{4\}, \{3, 5\}\}, 25)$,
$(P_4, l_4) = (\{\{1, 2, 4\}, \{3, 5\}\}, 30)$, and
$(P_5, l_5) = (\{\{1, 2, 3, 4, 5\}\}, 50)$.
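The partition representation can be derived mechanically from an ultrametric matrix by grouping taxa whose mutual distance does not exceed each distinct level. The sketch below is one way to do this, not the construction used in [JS71] or [KM86]. It assumes the input matrix really is an ultrametric with strictly positive off-diagonal entries, so that "distance at most l" is an equivalence relation and comparing each taxon with one representative per block is enough. The function name and the matrix `ULTRA` (chosen to match the example above) are assumptions.

```python
def partition_representation(matrix):
    """Return the list of pairs (P_i, l_i) for an ultrametric matrix.

    Taxa are numbered from 1 in the output, matching the text.
    """
    n = len(matrix)
    levels = sorted({matrix[i][j] for i in range(n) for j in range(n) if i != j})
    pairs = []
    for level in [0] + levels:
        blocks = []  # each block holds taxa at mutual distance <= level
        for taxon in range(n):
            for block in blocks:
                # One representative suffices because the matrix is ultrametric.
                if matrix[taxon][block[0]] <= level:
                    block.append(taxon)
                    break
            else:
                blocks.append([taxon])
        partition = {frozenset(t + 1 for t in block) for block in blocks}
        pairs.append((partition, level))
    return pairs

# Assumed matrix for the ultrametric of Figure 3(b), taxa 1..5.
ULTRA = [
    [0, 25, 50, 30, 50],
    [25, 0, 50, 30, 50],
    [50, 50, 0, 50, 15],
    [30, 30, 50, 0, 50],
    [50, 50, 15, 50, 0],
]

REPRESENTATION = partition_representation(ULTRA)
```

For `ULTRA` this yields five pairs with levels 0, 15, 25, 30, and 50, matching the example, and each partition is refined by the one before it.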
By convention, the weight of an edge in a tree reconstructed using discrete-character
data is the sum of the weights of all character-state changes (character-state
transitions) between the vertices defining that edge (see Figure 4).
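The edge-weight convention gives a direct way to compute the length of a tree whose vertices all carry character-state labels. The sketch below assumes unordered characters, so that any change of state in a character costs that character's weight once; ordered characters would need the state graph as well. The tree, its labels, and the function names are hypothetical and are not the tree of Figure 4.

```python
def edge_weight(states_u, states_v, weights=None):
    """Sum of the weights of the characters whose state differs across an edge."""
    if weights is None:
        weights = [1] * len(states_u)  # unweighted characters
    return sum(w for a, c, w in zip(states_u, states_v, weights) if a != c)

def tree_length(states, edges, weights=None):
    """Length of a tree: the sum of the weights of all of its edges."""
    return sum(edge_weight(states[u], states[v], weights) for u, v in edges)

# Hypothetical unrooted tree on taxa A-D with internal vertices u and v,
# described by four binary characters.
STATES = {
    "A": "1100", "B": "1010", "C": "0001", "D": "0011",
    "u": "1000", "v": "0001",
}
EDGES = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
```

With unit weights the example tree has length 5; giving the fourth character weight 2 raises the length to 6, because that character changes once, on the edge between `u` and `v`.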
Each approach to phylogenetic analysis considered in this thesis embodies a
criterion that assigns a cost to each possible tree relative to a particular data
set. The trees selected by each approach as the best estimates of the actual
evolutionary tree for a data set are the trees whose cost is optimal for that
data set under that approach's criterion. Hence, each approach to phylogenetic
analysis is an optimization problem.
Figure 4: A discrete character tree. This unrooted tree is based on 5 unweighted binary characters, and has a length of 9. The number on an edge denotes the character whose state has changed on that edge. Note that there are multiple character-state transitions in characters 2 and 5.
Several of the most popular approaches to phylogenetic analysis that use
discrete character data are:
• Phylogenetic Parsimony [Hen66, KF69]: Selects the evolutionary tree
of shortest length that reproduces the character distributions for the given
taxa, where the length of a tree is the sum of the weights of all edges in the
tree. The hypothesis encoded in this tree is preferred because it explains as
much of the observed character distributions as possible by character-state
transitions in a common ancestor, and invokes the fewest ad hoc hypotheses
of subsequent character-state change [Far83].
Figure 5: Character-state trees (adapted from [Day88]). Part (a) shows three character-state trees $C_1$, $C_2$, and $C_3$. Part (b) shows the trees of subsets corresponding to each of these characters as determined by discrete character matrix $X$ on the set of taxa $S = \{A, B, C, D, E, F\}$.
Criterion | Abbrev. | Character-State Transition Restrictions | Character Order Type
Wagner (Linear) | WL | No restrictions. | Linear
Wagner (General) | WG | No restrictions. | Ordered
Fitch | Fi | No restrictions. | Complete
Camin-Sokal | CS | No transitions from derived to ancestral states. | Ordered
Dollo | Do | One transition from ancestral to derived state per character. | Linear
Chromosome Inversion (Polymorphism) | CI | One transition from ancestral to heterozygous state per character; no transitions from ancestral to derived or from derived to ancestral or heterozygous states. | Linear
Generalized | Ge | Specified for each character. | Ordered

Table 1: Phylogenetic parsimony criteria.
There are several phylogenetic parsimony criteria, each of which encodes a
different model of evolution by placing different restrictions on the types
and numbers of character-state transitions allowable in a tree (see Table
1). The Wagner Linear [KF69], Wagner General, and Fitch [Fit71] criteria
assume the simplest model of evolution, in which character-state change is
reversible. The Camin-Sokal criterion [CS65] assumes that character change
is irreversible, while the Dollo criterion [Far77] assumes that character-state
change is reversible but character-state origin is unique. The Chromosome
Inversion criterion [Far78] is a restricted Dollo criterion whose characters
have three states: ancestral (A), derived (D), and heterozygous (H
= {A, D}). The Generalized parsimony criterion [SC83] represents characters
as matrices of distances between character states (stepmatrices), which
allows this criterion to simulate all possible parsimony criteria by placing
appropriate restrictions on the state-transition weights [SO90, Figure 11,
p. 464].
Note that in the biological literature, phylogenetic parsimony methods are
also called cladistic parsimony or cladistic methods, and that the term
"phylogenetic systematics" is on occasion restricted to the inference of evolutionary
trees by phylogenetic parsimony methods.
• Character Compatibility [ME85]: Reconstructs the evolutionary tree
from the largest subset of the given characters that are pairwise compatible,
where two cladistic characters $K$ and $L$ are compatible if there exists a tree
of subsets $M$ such that the trees of subsets $K$ and $L$ of these characters are
subsets of $M$. For example, in Figure 5, characters $C_1$ and $C_2$ are compatible
and $C_2$ and $C_3$ are compatible, but $C_1$ and $C_3$ are not compatible.
• Maximum Likelihood [Fel81]: Selects the evolutionary tree that has the
greatest probability of producing the frequencies of each type of character-pattern
in the data, i.e. the maximum likelihood $P(Characters \mid Tree)$,
relative to some probabilistic model of character-state change.
• Invariants (Evolutionary Parsimony) [CF87, Lak87]: Selects the evolutionary
tree that best satisfies its associated invariants, which are algebraic
constraints on the observed frequencies of each type of character-pattern
that hold for that tree over all possible discrete character matrices. The set
of invariants for each evolutionary tree is derived relative to some probabilistic
model of character-state change.
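Of the approaches listed above, character compatibility is the easiest to illustrate in a few lines. The sketch below swaps in a standard shortcut rather than the tree-of-subsets definition used in the text: for unpolarized binary characters, two characters are compatible exactly when at most three of the four state combinations (0,0), (0,1), (1,0), (1,1) occur among the taxa. The search for the largest pairwise-compatible subset is done by brute force over subsets and is exponential in the number of characters. The matrix and function names are hypothetical; the example is built so that, as with the characters of Figure 5, the first and second characters are compatible, the second and third are compatible, and the first and third are not.

```python
from itertools import combinations

def compatible(column_a, column_b):
    """Pairwise test for binary characters: fewer than four state combinations."""
    return len(set(zip(column_a, column_b))) < 4

def largest_compatible_subset(matrix):
    """Indices of a largest set of pairwise-compatible characters (brute force)."""
    d = len(matrix[0])
    columns = [tuple(row[c] for row in matrix) for c in range(d)]
    for size in range(d, 0, -1):
        for subset in combinations(range(d), size):
            if all(compatible(columns[a], columns[b])
                   for a, b in combinations(subset, 2)):
                return list(subset)
    return []

# Hypothetical discrete character matrix: rows are taxa, columns are characters.
MATRIX = [
    (0, 0, 0),  # taxon A
    (1, 0, 0),  # taxon B
    (1, 0, 1),  # taxon C
    (0, 0, 1),  # taxon D
    (0, 1, 0),  # taxon E
]
COLUMNS = [tuple(row[c] for row in MATRIX) for c in range(3)]
```

For `MATRIX`, no set of three characters is pairwise compatible, and the first pair found is characters 0 and 1.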
There are many approaches to phylogenetic analysis using distance matrix
data, all of which assume that the given distances represent or closely approximate
actual evolutionary distances between taxa. Most of these approaches compute
the ultrametric or tree metric corresponding to the tree that has the minimal
distance from the given semimetric according to some statistic. Many of these
statistics are based on the Minkowski metrics $L_q$, $q \ge 1$, defined on pairs of
matrices $D$ and $P$ on taxa $S$.
$L_q(D, P) = \{ \sum_{x, y \in S} |D_{xy} - P_{xy}|^q \}^{1/q}$    (1)
$L_\infty(D, P) = \max_{x, y \in S} |D_{xy} - P_{xy}|$    (2)
Several such statistics for semimetric $D$ and ultrametric or tree metric $P$ are
$F_\alpha(D, P) = \sum_{x, y \in S} |D_{xy} - P_{xy}|^\alpha \quad [\alpha \in \{1, 2\}]$    (3)
and
$F(D, P) = 100 \times \frac{\sum_{x, y \in S} |D_{xy} - P_{xy}|}{\sum_{x, y \in S} D_{xy}}$    (4)
where $F_1$ is the f-statistic [Far72], $F_2$ is the least-squares fit criterion [CE67],
and $F$ was defined in [PW76]. Note that each statistic in both of the groups
$\{L_1, F_1, F\}$ and $\{L_2, F_2\}$ is arithmetically equivalent to other members of its
group. One can also view the given distances not as targets to be approximated
but as lower bounds on what should be approximated. This is embodied in
the concept of dominance, i.e. for metrics $D$ and $D'$ on a set of objects $S$, $D$
dominates $D'$ ($D \ge D'$) if $\forall x, y \in S$, $D_{xy} \ge D'_{xy}$ [JS71, p. 52]. Though dominance
was originally proposed for fitting ultrametric trees, it has also been used in some
methods for fitting additive trees [SO90, p. 451].
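Statistics (1) to (4) and the dominance relation are simple to compute once a convention for the sums is fixed. The sketch below assumes that each sum runs over unordered pairs of distinct taxa (x < y); summing over ordered pairs instead would double each sum, which changes $L_q$ and $F_\alpha$ by a constant factor and leaves $F$ unchanged. The function names and the two 3-taxon matrices are hypothetical.

```python
def _pairs(n):
    """Unordered pairs of distinct taxa (an assumed convention for the sums)."""
    return [(x, y) for x in range(n) for y in range(x + 1, n)]

def l_q(d, p, q):
    """Minkowski metric L_q, equation (1)."""
    return sum(abs(d[x][y] - p[x][y]) ** q for x, y in _pairs(len(d))) ** (1 / q)

def l_inf(d, p):
    """Minkowski metric L_infinity, equation (2)."""
    return max(abs(d[x][y] - p[x][y]) for x, y in _pairs(len(d)))

def f_alpha(d, p, alpha):
    """F_alpha, equation (3); alpha is 1 (f-statistic) or 2 (least squares)."""
    return sum(abs(d[x][y] - p[x][y]) ** alpha for x, y in _pairs(len(d)))

def f_percent(d, p):
    """F, equation (4): total absolute deviation as a percentage of total distance."""
    total = sum(d[x][y] for x, y in _pairs(len(d)))
    return 100 * f_alpha(d, p, 1) / total

def dominates(d, d_prime):
    """D dominates D' if every entry of D is at least the matching entry of D'."""
    n = len(d)
    return all(d[x][y] >= d_prime[x][y] for x in range(n) for y in range(n))

# Hypothetical data: a semimetric D and an ultrametric P fitted to it.
D = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
P = [[0, 2, 5], [2, 0, 5], [5, 5, 0]]
```

Here the absolute deviations are 1, 0, and 1, so $F_1 = F_2 = 2$, $L_\infty = 1$, and $F = 100 \times 2 / 12$. The value of $L_1$ coincides with $F_1$, and $L_2$ is the square root of $F_2$, which is the sense in which the statistics within each group rank candidate trees identically.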
Each approach to phylogenetic analysis embodies some model of the evolutionary
process of character change; some are more explicit than others in the
statement of the model that they use. The trees produced by each approach are
useful to the extent that one believes in the model embodied by that approach.
See [Fel88, PHS92, SO90] for a complete review of approaches to phylogenetic
analysis and computer programs implementing these approaches.
3.2 NP-Complete Problems in Phylogenetic Systematics
Since 1982, decision problems for the major phylogenetic parsimony criteria
[Day83, DJS86, DS87, GF82], the character compatibility criterion [DS86], and
various distance matrix fitting criteria for ultrametric and additive trees [Day83,
Day87, Kri86, Kri88, KM86] have been shown NP-complete using reductions from
the NP-complete problems given in Table 2. As later sections of this thesis will
make extensive use of both these definitions and these reductions, they will be
VERTEX COVER (VC) [GT1]
Instance: A graph $G = (V, E)$ and a positive integer $K \leq |V|$.
Question: Is there a vertex cover of size $K$ or less for $G$, that is, a subset $V' \subseteq V$ such that $|V'| \leq K$ and, for each edge $\{u, v\} \in E$, at least one of $u$ or $v$ belongs to $V'$?
EXACT COVER BY 3-SETS (X3C) [SP2]
Instance: A set $X$ with $|X| = 3q$ and a collection $C$ of 3-element subsets of $X$.
Question: Does $C$ contain an exact cover for $X$, that is, a subcollection $C' \subseteq C$ such that every element of $X$ occurs in exactly one member of $C'$?
CLIQUE [GT19]
Instance: A graph $G = (V, E)$ and a positive integer $J \leq |V|$.
Question: Does $G$ contain a clique of size $J$ or more, that is, a subset $V' \subseteq V$ such that $|V'| \geq J$ and every two vertices in $V'$ are joined by an edge in $E$?
Table 2: Basic NP-complete decision problems (taken from [GJ79]). The reference numbers assigned to these problems in the list of NP-complete decision problems in [GJ79] are given in square brackets.
reviewed in this section. Each reduction will be given a formal definition in the
style of [Kar72], followed by a sketch of its proof of correctness.
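To make the three problems of Table 2 concrete, the following Python sketch checks a proposed solution (a certificate) for each of them. The function names and the edge-list and set representations are illustrative assumptions; the point is only that each check runs in time polynomial in the size of the instance, which is what places these problems in NP.

```python
from itertools import combinations

def is_vertex_cover(edges, cover, k):
    """VC certificate: at most k vertices, and every edge has an endpoint in the cover."""
    cover = set(cover)
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)

def is_exact_cover(universe, collection, chosen):
    """X3C certificate: chosen sets come from the collection and partition the universe."""
    available = {frozenset(s) for s in collection}
    if any(frozenset(s) not in available for s in chosen):
        return False
    covered = [element for s in chosen for element in s]
    # No element may repeat, and every element of the universe must appear.
    return len(covered) == len(set(covered)) and set(covered) == set(universe)

def is_clique(edges, vertices, j):
    """CLIQUE certificate: at least j vertices, every two of them joined by an edge."""
    adjacent = {frozenset(e) for e in edges}
    vertices = set(vertices)
    return len(vertices) >= j and all(
        frozenset(pair) in adjacent for pair in combinations(vertices, 2)
    )
```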
3.2.1 Phylogenetic Parsimony
Each of these problems is given as input a discrete character matrix for $m$ taxa
and $d$ characters, and operates on an implicit graph $G$ whose vertices are the
set of all $d$-dimensional points defined by the states of the given characters and
whose edges are specified by the allowable transitions between the states in these
characters. Each phylogenetic parsimony problem seeks the evolutionary tree in
$G$ of minimum length that includes the given taxa, subject to the restrictions
on character-state transitions that are particular to that problem's criterion (see
Table 1). The given characters can be restricted in various ways to generate a
family of phylogenetic parsimony problem "schemata" (see Tables 3, 4, and 8);
each phylogenetic parsimony criterion can then be applied to these schemata to
generate problems. The hierarchy of subproblems generated by these schemata
will be useful in later sections of this thesis.
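As a rough illustration of the implicit graph, consider the simplest setting: binary characters with no restriction on the direction of change, so that $G$ is the $d$-dimensional hypercube and two vertices are adjacent when they differ in exactly one character. The Python sketch below checks whether a proposed set of edges is a tree in this hypercube that includes the given taxa and has length at most a given bound. The function names and the tuple representation of vertices are assumptions made for the example; the criterion-specific restrictions of Table 1 are not modelled.

```python
def hamming(u, v):
    """Number of characters in which two vertices differ."""
    return sum(a != b for a, b in zip(u, v))

def is_hypercube_tree_certificate(taxa, edges, bound):
    """Check that `edges` forms a tree in the binary hypercube that
    includes every taxon and has at most `bound` edges."""
    vertices = {v for edge in edges for v in edge}
    if not set(taxa) <= vertices:
        return False                      # some taxon is missing from the tree
    if any(hamming(u, v) != 1 for u, v in edges):
        return False                      # an edge is not a single state change
    if len(edges) != len(vertices) - 1:
        return False                      # wrong edge count for a tree
    # Connectivity check by depth-first search over the undirected edges.
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices and len(edges) <= bound
```

Note that the tree may pass through vertices of $G$ that are not taxa, which is why a candidate solution can be much larger than the instance itself.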
Consider the following restrictions on the given characters:
• Cladistic vs. Ordered vs. Qualitative: A cladistic (C) problem is
given polarized characters, an ordered (O) problem is given ordered characters,
and a qualitative (Q) problem is given unordered characters. Each problem
finds solutions that are consistent with its characters; however, ordered and
qualitative problems must also find character polarizations and orderings
for which solutions exist. The cladistic / qualitative distinction was made
in [DJS86, EM77, EM80] for binary characters; as qualitative and ordered
problems are equivalent for binary characters, the distinction of ordering is
only applicable to unconstrained characters.
Cladistic problems correspond to phylogenetic analysis procedures that produce
explicitly rooted trees, ordered problems correspond to procedures
that produce either rooted or unrooted trees, and qualitative problems correspond
to procedures such as Transformation Series Analysis [Mic82] that
simultaneously produce trees and derive character ordering and polarization
from the given data (cf. [Lip92]).
• Binary vs. Unconstrained: A problem is binary (B) if it is restricted to
binary characters; otherwise, the problem is unconstrained (U).
• Weighted vs. Unweighted: A problem is unweighted (U) if it is restricted
to unweighted characters; otherwise, the problem is weighted (W).
The five schemata generated by the first two of these restrictions are given in
Tables 3 and 4; the remaining restriction yields a total of ten schemata. The validity
of these restrictions for each of the phylogenetic parsimony criteria is shown in
Table 5. Restrictions do not apply to a particular criterion if they conflict with
the restrictions imposed by that criterion, e.g. Dollo criterion characters can only
have three states; Fitch criterion characters are by definition unweighted and
ordered. The application of all phylogenetic parsimony criteria to valid schemata
yields 39 phylogenetic parsimony problems (see Tables 6 and 7).
Additional phylogenetic parsimony problems may be generated by allowing
evolutionary trees to include limited amounts of reticulation. Consider the
problem schemata in Table 8 defined for each non-reticulate phylogenetic parsimony
problem X. These schemata restrict the amounts of available (the SHX and SRX
schemata) or allowable (the k-HX and k-RX schemata) reticulation that can occur
in a tree. See Appendix A for further discussion of these schemata. Each can
be applied to all phylogenetic parsimony problems created so far, giving a total
of 156 phylogenetic parsimony problems. One such problem is k-RUUOWL, the
k-Recombination under Unweighted Unconstrained Ordered Wagner Linear parsimony
problem. Note that as reticulation is always directed, the trees produced
by these problems are rooted.

BINARY CLADISTIC X (BCX)
Instance: Positive integer $d$; a subset $S$ of $\{0, 1\}^d$; and a positive integer $B$.
Question: Is there a phylogeny satisfying criterion X that includes $S$, is rooted at the root-type vertex, and has length at most $B$?
BINARY QUALITATIVE X (BQX)
Instance: Same as BCX, except that no character is required to be directed.
Question: Is there a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?
Table 3: Phylogenetic parsimony decision problem schemata (non-reticulate trees) (adapted from [DJS86]). These schemata are stated relative to a phylogenetic parsimony criterion X. If X ∈ {CI, CS}, root-type is "all-ancestral"; if X = Do, root-type is "all-derived".
Note that the statements of problems given above differ from [DJS86, DS87] in that the bound $B$ is on the number of edges rather than the number of vertices in the tree. The two formulations are equivalent; however, the former allows a more natural interpretation of weighted problems.

UNCONSTRAINED CLADISTIC X (UCX)
Instance: Positive integer $d$; sets $A_1, \ldots, A_d$ of character-states, and directed character-state graphs $G_1, \ldots, G_d$ specifying allowable transitions among these states; a subset $S$ of $A_1 \times \ldots \times A_d$; and a positive integer $B$.
Question: Is there a phylogeny satisfying criterion X and the given directed character-state graphs that includes $S$, is rooted at the root-type vertex, and has length at most $B$?
UNCONSTRAINED ORDERED X (UOX)
Instance: Same as UCX, except that none of the character-state graphs are directed.
Question: Is there some polarization of the given character-state graphs that allows a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?
UNCONSTRAINED QUALITATIVE X (UQX)
Instance: Same as UCX, except that none of the character-state graphs are ordered.
Question: Is there some ordering and polarization of the given character-state graphs that allows a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?
Table 4: Phylogenetic parsimony decision problem schemata (non-reticulate trees) (cont'd from Table 3).

Criterion                    Unweighted /   Binary /         Cladistic / Ordered /   # Prob.
                             Weighted       Unconstrained    Qualitative
Wagner Linear          WL    √              √                O,Q                     8
Wagner General         WG    √              √                O,Q                     8
Fitch                  Fi                   √                O                       2
Camin-Sokal            CS    √              √                C,O,Q                   12
Dollo                  Do    √              √                C,O,Q                   12
Chromosome Inversion   CI                                    C,O,Q                   3
Generalized            Ge    √              √                O                       4
Total                                                                                39
Table 5: Applicability of input character restrictions to phylogenetic parsimony criteria. The given total number of problems is smaller than expected because some of these problems are equivalent; see Tables 6 and 7 for details.
It is not obvious at first glance that the problems above are in NP. Conventional
tree-traversal algorithms can be modified to check all parsimony criteria
for both non-reticulate and reticulate trees in time polynomial in the size of the
candidate solution [StaT80, Section 3], but such solutions are not guaranteed to
be of size polynomial in the size of the instance because they might be as large as
the implicit graph on $O(2^d)$ vertices from which they are taken. However, under
Acronym      Problem
UBW          Unweighted Binary Wagner
    UBOWL    Unweighted Binary Ordered Wagner Linear
    UBQWL    Unweighted Binary Qualitative Wagner Linear
    UBOWG    Unweighted Binary Ordered Wagner General
    UBQWG    Unweighted Binary Qualitative Wagner General
    BFi      Binary Fitch
UUW          Unweighted Unconstrained Wagner
    UUQWG    Unweighted Unconstrained Qualitative Wagner General
    UFi      Unconstrained Fitch
WBW          Weighted Binary Wagner
    WBOWL    Weighted Binary Ordered Wagner Linear
    WBQWL    Weighted Binary Qualitative Wagner Linear
    WBOWG    Weighted Binary Ordered Wagner General
    WBQWG    Weighted Binary Qualitative Wagner General
UUOWL        Unweighted Unconstrained Ordered Wagner Linear
UUQWL        Unweighted Unconstrained Qualitative Wagner Linear
WUOWL        Weighted Unconstrained Ordered Wagner Linear
WUQWL        Weighted Unconstrained Qualitative Wagner Linear
UUOWG        Unweighted Unconstrained Ordered Wagner General
WUOWG        Weighted Unconstrained Ordered Wagner General
WUQWG        Weighted Unconstrained Qualitative Wagner General
Table 6: Phylogenetic parsimony decision problems (non-reticulate trees). Each group of equivalent problems is indented, and appears after the acronym for that group.
Acronym      Problem
UBCCS        Unweighted Binary Cladistic Camin-Sokal
UBQCS        Unweighted Binary Qualitative Camin-Sokal
    UBOCS    Unweighted Binary Ordered Camin-Sokal
    UBQCS    Unweighted Binary Qualitative Camin-Sokal
UUCCS        Unweighted Unconstrained Cladistic Camin-Sokal
UUOCS        Unweighted Unconstrained Ordered Camin-Sokal
UUQCS        Unweighted Unconstrained Qualitative Camin-Sokal
WBCCS        Weighted Binary Cladistic Camin-Sokal
WBOCS        Weighted Binary Ordered Camin-Sokal
WBQCS        Weighted Binary Qualitative Camin-Sokal
WUCCS        Weighted Unconstrained Cladistic Camin-Sokal
WUOCS        Weighted Unconstrained Ordered Camin-Sokal
WUQCS        Weighted Unconstrained Qualitative Camin-Sokal
UBCDo        Unweighted Binary Cladistic Dollo
UBQDo        Unweighted Binary Qualitative Dollo
    UBODo    Unweighted Binary Ordered Dollo
    UBQDo    Unweighted Binary Qualitative Dollo
UUCDo        Unweighted Unconstrained Cladistic Dollo
UUODo        Unweighted Unconstrained Ordered Dollo
UUQDo        Unweighted Unconstrained Qualitative Dollo
WBCDo        Weighted Binary Cladistic Dollo
WBODo        Weighted Binary Ordered Dollo
WBQDo        Weighted Binary Qualitative Dollo
WUCDo        Weighted Unconstrained Cladistic Dollo
WUODo        Weighted Unconstrained Ordered Dollo
WUQDo        Weighted Unconstrained Qualitative Dollo
CCI          Cladistic Chromosome Inversion
OCI          Ordered Chromosome Inversion
QCI          Qualitative Chromosome Inversion
UBGe         Unweighted Binary Generalized
UUGe         Unweighted Unconstrained Generalized
WBGe         Weighted Binary Generalized
WUGe         Weighted Unconstrained Generalized
Table 7: Phylogenetic parsimony decision problems (non-reticulate trees) (cont'd from Table 6).
SELECT HYBRIDIZATION UNDER X (SHX)
Instance: Same as for problem X, with an additional parameter $H$, a given polynomial-sized (in the parameters of X) set of 3-hyperarcs.
Question: Same as for X, with the additional condition that the phylogeny can include any subset of the 3-hyperarcs in $H$.
k-HYBRIDIZATION UNDER X (k-HX)
Instance: Same as for problem X, except that the implicit graph incorporates a fixed type of 3-hyperarc, and there is an additional parameter $k$, a positive integer.
Question: Same as for X, with the additional condition that the phylogeny can include $\leq k$ 3-hyperarcs of the fixed type.
SELECT RECOMBINATION UNDER X (SRX)
Instance: Same as for problem X, with an additional parameter $R$, a given polynomial-sized (in the parameters of X) set of 4-hyperarcs.
Question: Same as for X, with the additional condition that the phylogeny can include any subset of the 4-hyperarcs in $R$.
k-RECOMBINATION UNDER X (k-RX)
Instance: Same as for problem X, except that the implicit graph incorporates a fixed type of 4-hyperarc, and there is an additional parameter $k$, a positive integer.
Question: Same as for X, with the additional condition that the phylogeny can include $\leq k$ 4-hyperarcs of the fixed type.
Table 8: Phylogenetic parsimony decision problem schemata (reticulate trees). These schemata are stated relative to a non-reticulate phylogenetic parsimony problem X.
certain additional restrictions, the problems defined above can be shown to be in
NP. Consider the relationship between solution cost and size.

Lemma 3 A polynomial-time nondeterministic computation is guaranteed to find
all solutions $Y$ to an instance $I$ of an unweighted (weighted) parsimony problem
X such that $b_X(Y) \leq p(|I|)$ ($b_X(Y) \leq p(|I|) W_{min}(I)$) for some polynomial $p$.

Proof: Observe that the largest solution of cost $k$ to an instance $I$ of an unweighted
parsimony problem is a tree on $k + 1$ vertices, and that the largest solution of
cost $k$ to an instance $I$ of a weighted parsimony problem is a tree on $k / W_{min}(I)$
vertices with edges of weight $W_{min}(I)$, where $W_{min}(I)$ ($W_{max}(I)$) is the smallest
(largest) character-transition weight in the given instance. ∎
Solutions satisfying these bounds exist for each non-reticulate phylogenetic parsimony
problem defined above. For Wagner Linear, Wagner General, Fitch,
Camin-Sokal, and Generalized problems, this solution is a tree rooted at the
all-ancestral vertex which has paths to each taxon that use the appropriate
character-state transitions to generate the states for that taxon; the solution for
the Dollo (Chromosome Inversion) problem is a tree rooted at the all-ancestral
(all-A) vertex that has a path to the all-derived (all-H) vertex, which then has
paths to each taxon that use the appropriate "reversal" transitions to generate
the states for that taxon. Each of these solutions is of size $O(mdl(\log W_{max}))$
and cost $O(mdl(W_{max}))$, where $l$ is the length of the longest path between two
states in any character-state graph in the instance and $W_{max} = 1$ if the problem
is unweighted. As the cost of a solution is proportional to its size in unweighted
problems, any solutions (including optimal solutions) better than those given
above have costs that satisfy the bound in Lemma 3.
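The trivial solution described above can be sketched for the simplest case, an unweighted binary problem in which every character changes only from 0 (ancestral) to 1 (derived). The following Python fragment is an illustration under that assumption; the function name and the convention of switching characters on in increasing index order are choices made here so that shared path prefixes merge into a single tree.

```python
def trivial_rooted_tree(taxa):
    """Build a tree rooted at the all-ancestral (all-0) vertex with a path to
    each taxon, changing one character from 0 to 1 per edge."""
    d = len(taxa[0])
    root = (0,) * d
    edges = set()
    for taxon in taxa:
        current = root
        # Switch on the derived characters in increasing index order, so every
        # non-root vertex has exactly one parent and no cycle can arise.
        for i, state in enumerate(taxon):
            if state == 1:
                successor = current[:i] + (1,) + current[i + 1:]
                edges.add((current, successor))
                current = successor
    return root, edges
```

Each path has at most $d$ edges, so the tree has at most $md$ edges for $m$ taxa, which agrees with the $O(mdl(W_{max}))$ cost bound when $l = 1$ and $W_{max} = 1$.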
Corollary 4 A polynomial-time nondeterministic computation is guaranteed to
find all optimal solutions of any instance of a non-reticulate unweighted phylogenetic
parsimony problem.
This relationship does not hold for weighted problems; solutions of lower cost
may exist that are larger than solutions of higher cost. However, if the problem
is restricted to those instances $I$ such that $W_{max}(I) \leq p(|I|) W_{min}(I)$ for some
polynomial $p$, any solutions (including optimal solutions) better than those given
above must have cost $k \leq O(mdl W_{max}) \leq p'(|I|) W_{max}(I) \leq p''(|I|) W_{min}(I)$ for
some polynomials $p', p''$, and thus have costs that satisfy the bound in Lemma 3.

Corollary 5 A polynomial-time nondeterministic computation is guaranteed to
find all optimal solutions of any instance $I$ of a non-reticulate weighted phylogenetic
parsimony problem such that $W_{max}(I) \leq p(|I|) W_{min}(I)$ for some polynomial
$p$.
Thus, all non-reticulate unweighted and weighted phylogenetic parsimony problems
defined above whose weights are so restricted are in NP. As solutions to
cladistic problems are also solutions to ordered and qualitative problems, and as
each reticulate problem can incorporate a number of reticulations at most polynomial
in the parameters of its non-reticulate counterpart, all parsimony problems
defined above are in NP.
The reductions given in [DJS86, DS87] that establish the NP-hardness of the
non-reticulate unweighted binary Camin-Sokal, Dollo, and Chromosome Inversion
phylogenetic parsimony problems are given in Tables 9 and 10. These reductions
use the basic idea of Karp's reduction from EXACT COVER to STEINER TREE
IN GRAPHS [Kar72], namely, reduce some problem involving the selection of a
subcollection of a collection of subsets on a set of items onto a three-level tree in
which the leaves correspond to the items, the root to the selected subcollection,
and the remaining internal vertices to the subsets in this subcollection.² In the
reductions in Tables 9 and 10, the items are the edges of $G$ and the subsets in the
collection are the sets of edges adjacent to each of the vertices in $G$. The trees that
are solutions in each of the reduced instances contain subtrees that have three
levels ([DJS86, Lemma 1]; [DS87, Lemma 2]), where the internal vertices selected
on the second level of each tree correspond to satisfying vertex covers for the
original instances. In the case of the Dollo and Chromosome Inversion problems,
each solution tree has a "tail" composed of the vertices in $Y$ which ensures that
the tree has a root that is consistent with its problem's criterion. Moreover, one
can construct trees from satisfying vertex covers that correspond to satisfying
² The reduction as given in [Kar72] is flawed, as the reader can verify for items $T = \{a, b, c, d, e, f, g\}$ and collection of subsets $S = \{\{a, b, c\}, \{c, d, e\}, \{e, f, g\}\}$; the edges of weight 0 allow a solution tree of length 3 to the reduced instance, even though the original instance has no exact cover. Krentel has fixed this problem by a variant on [Kar72] that yields a reduction from SET COVER to STEINER TREE IN GRAPHS [GKR92, Theorem 3.4].
VC $\leq_m^p$ UBCCS / UBQCS [DJS86]
$d = |V|$, where each character corresponds to a particular vertex $v \in V$.
$S = O \cup X$,
where $O$ is the all-ancestral vertex, and $X$ is the set of vertices corresponding to the edges in $E$ (for $e = \{u, v\}$ in $E$, there are 1's in the characters corresponding to $u$ and $v$ and 0's elsewhere).
$B = K + |E|$
VC $\leq_m^p$ UBCDo / UBQDo (adapted from [DJS86])
$d = 3|V|$, where characters $2|V| + 1$ to $d$ correspond to the vertices in $V$.
$S = O \cup X \cup Y$,
where $O$ is the all-ancestral vertex, $X$ is the set of vertices corresponding to the edges in $E$, and $Y$ is the set of vertices $y_i$, $1 \leq i \leq d$, such that $y_i$ has 1's in characters 1 to $i$ and 0's elsewhere.
Table 9: Reductions for phylogenetic parsimony decision problems.
Note that the reductions given for the Dollo and Chromosome Inversion problems differ from [DJS86, DS87] in that $d = 3|V|$ instead of $2K + |V|$. The proofs given in [DJS86, DS87] still work for these modified reductions; moreover, these modifications simplify the transformation of these many-one reductions to metric reductions in Section 4.2.
VC $\leq_m^p$ UCCI / UQCI (adapted from [DS87])
$d = 3|V|$, where characters $2|V| + 1$ to $d$ correspond to the vertices in $V$.
$S = H \cup X \cup Y$,
where $H$ is the all-H vertex, $X$ is the set of vertices corresponding to the edges in $E$ (for $e = \{u, v\} \in E$, there are D's in the characters corresponding to $u$ and $v$ and H's elsewhere), and $Y$ is the set of vertices $y_i$, $1 \leq i \leq d$, such that $y_i$ has A's in characters 1 to $i$ and H's elsewhere.
$B = K + 3|V| + |E|$
Table 10: Reductions for phylogenetic parsimony decision problems (cont'd from Table 9).
trees for the reduced instances ([DJS86, Theorems 2 and 3]; [DS87, Theorem
3]). The trees will have the three-level structure as long as the all-ancestral
(or all-H) vertex is included in $S$; hence, these reductions simultaneously show
that the cladistic and qualitative versions of each problem are NP-hard. For the
same reason, the reduction for the Camin-Sokal problems also shows that the
unweighted binary Wagner problem is NP-hard [DJS86, p. 41].
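The first reduction in Table 9 (VERTEX COVER to the unweighted binary Camin-Sokal problems) is simple enough to write out directly. The Python sketch below builds the reduced instance $(d, S, B)$ from a graph and a cover bound $K$; the function name and the tuple encoding of vertices of the implicit graph are assumptions made for illustration.

```python
def vc_to_ubccs(vertices, edges, k):
    """Build the reduced instance (d, S, B) from a VERTEX COVER instance."""
    position = {v: i for i, v in enumerate(vertices)}
    d = len(vertices)                  # one binary character per graph vertex
    all_ancestral = (0,) * d           # the vertex O of Table 9
    taxa = {all_ancestral}
    for u, v in edges:
        # One taxon per edge: 1's in the characters for u and v, 0's elsewhere.
        states = [0] * d
        states[position[u]] = 1
        states[position[v]] = 1
        taxa.add(tuple(states))
    bound = k + len(edges)             # B = K + |E|
    return d, taxa, bound
```

For the path a, b, c with $K = 1$, the cover {b} corresponds to the three-level tree 000 → 010 → {110, 011}, whose length of 3 equals the bound $B = K + |E| = 3$.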
The non-reticulate binary unweighted problems are restrictions of all other
non-reticulate and reticulate problems (set $k = 0$ (k-HX, k-RX) and $R = \emptyset$
(SHX, SRX)); thus, all Wagner, Fitch, Camin-Sokal, Dollo, and Chromosome
Inversion problems are NP-hard. As any ordered problem can be solved by an
appropriately structured instance of the Generalized parsimony problem, all Generalized
parsimony problems are also NP-hard. Hence, all phylogenetic parsimony
problems considered above are NP-complete.
A proof that UBW and UUW are NP-complete was given previous to that
in [DJS86] by Graham and Foulds [GF82], using a reduction from X3C. The
elegant reduction from UUW to WUOWL given in [Day83] does not work as
stated there, because Day uses a version of UFi that includes the implicit graph
in the instance and this version has not been shown to be NP-complete (see
Appendix B). However, with slight modifications, this reduction does work for
UUW as defined above.
The phylogenetic parsimony problems described above are closely related to
the STEINER TREE IN GRAPHS (STG) and RECTILINEAR STEINER TREE
(RST) problems (see Table 11). The phylogenetic parsimony problems are like
STG in that the solution is drawn from a graph, and like RST in that this solution
domain is implicit. The relationship is not exact in either case because none of
the phylogenetic parsimony problems defined above include their implicit graphs
in their instances (cf. Appendix B), and only the simplest phylogenetic parsimony
problems are defined on $d$-dimensional rectilinear spaces. Despite these
differences, certain of the STG and RST solution and approximation algorithms
[BR91, Ric89, Sny92, Win87] can be modified to solve particular phylogenetic
parsimony problems; see Section 5.4 and Appendix B.
STEINER TREE IN GRAPHS (STG) [ND12]
Instance: Graph $G = (V, E)$, subset $S \subseteq V$, positive integer $K \leq |V| - 1$.
Question: Is there a Steiner tree $T$ for $S$ in $G$ with length $\leq K$, that is, a subtree $T$ of $G$ that includes all vertices in $S$ and contains no more than $K$ edges?
RECTILINEAR STEINER TREE (RST) [ND13]
Instance: Set $P = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of integer co-ordinates in the plane; positive integer $L$.
Question: Is there a rectilinear Steiner tree $T$ with length $\leq L$, that is, a tree $T$ composed of horizontal and vertical line segments linking the points in $P$ such that the sum of the lengths of all line segments in that tree is no more than $L$?
Table 11: Steiner Tree decision problems (taken from [GJ79]).
3.2.2 Character Compatibility
Each of these problems is given as input a set of $d$ characters defined on a set of
$m$ objects, and seeks the largest pairwise compatible subset of the given characters.
The cladistic / ordered / qualitative and binary / unconstrained character
restrictions made in the last section are also applicable to character compatibility
problems. A collection of ordered (qualitative) characters is compatible if its
character-state sets can be polarized (ordered and polarized) to make the collection
a compatible set of cladistic characters [DS86, p. 225]. The five character
compatibility problems so defined are given in Tables 12 and 13.

Each of these character compatibility problems is obviously in NP, as solution
BINARY CLADISTIC COMPATIBILITY (BCC)
Instance: Set of $m$ objects; a collection $C$ of $d$ binary cladistic characters, as described by a $d$-by-$m$ character-by-object matrix $X$; and a positive integer $B \leq d$.
Question: Does the collection of characters $C$ have a compatible collection $C' \subseteq C$ such that $|C'| \geq B$?
BINARY QUALITATIVE COMPATIBILITY (BQC)
Instance: Set of $m$ objects; a collection $C$ of $d$ binary qualitative characters, as described by a $d$-by-$m$ character-by-object matrix $X$; and a positive integer $B \leq d$.
Question: Does the collection of characters $C$ have a polarization such that there is a compatible collection $C' \subseteq C$ such that $|C'| \geq B$?
Table 12: Character compatibility decision problems (adapted from [DS86]).
sets of characters are subsets of the given set of characters. The reductions given
in [DS86] which establish that BCC and BQC are NP-hard are given in Table 14.
The problems CLIQUE and BCC are very similar: both problems are looking
for the subset of largest size such that a particular relation holds between every
pair of elements in that subset. Let $K$ be the character-pattern for a particular
character, and $K(x)$ be the character-state in $K$ of taxon $x$; for two binary
characters $K_i$ and $K_j$ on the set of taxa $S$, $K_i$ and $K_j$ are incompatible if and
only if all three of the elements (1,0), (0,1), and (1,1) are in $(K_i \times K_j)(S)$ [EJM76,
Theorem 2.3]. By this result, pairs of characters in the reduced instance that
correspond to vertices not joined by an edge in $G$ are incompatible. Hence, any
UNCONSTRAINED CLADISTIC COMPATIBILITY (UCC)
Instance: Collection $C$ of $d$ cladistic characters defined on a set of $m$ objects; a positive integer $B \leq d$.
Question: Does the collection of characters $C$ have a compatible collection $C' \subseteq C$ such that $|C'| \geq B$?
UNCONSTRAINED ORDERED COMPATIBILITY (UOC)
Instance: Collection $C$ of $d$ ordered characters defined on a set of $m$ objects; a positive integer $B \leq d$.
Question: Does the collection of characters $C$ have a polarization such that there is a compatible collection $C' \subseteq C$ such that $|C'| \geq B$?
UNCONSTRAINED QUALITATIVE COMPATIBILITY (UQC)
Instance: Collection $C$ of $d$ qualitative characters defined on a set of $m$ objects; a positive integer $B \leq d$.
Question: Does the collection of characters $C$ have a polarization and an ordering such that there is a compatible collection $C' \subseteq C$ such that $|C'| \geq B$?
Table 13: Character compatibility decision problems (cont'd from Table 12).
CLIQUE $\leq_m^p$ BCC [DS86]
$d = |V|$
$m = 3d(d - 1)/2$
$X = [x_{i,j}]$, $1 \leq i \leq d$, $1 \leq j \leq m$
$X$ has a character-column for each vertex in $V$, and three taxon-rows for each unordered pair of vertices in $V$. For each edge $\{u, v\}$ not in $E$, set the row-entries in column $u$ for that edge to 011, and the row-entries in column $v$ to 101. All other entries in $X$ are 0.
$B = J$

BCC $\leq_m^p$ BQC [DS86]
$d' = d$
$m' = 2m$
$X' = [x'_{i,j}]$, $1 \leq i \leq d'$, $1 \leq j \leq m'$
where the taxa corresponding to rows $(m + 1) \leq i \leq m'$ exhibit the ancestral character-states of the characters in $X$.
$B' = B$

BQC $\leq_m^p$ BCC [DS86]
$d' = d$
$m' = m$
$X' = [x'_{i,j}]$, $1 \leq i \leq d'$, $1 \leq j \leq m'$
where a character's most frequently occurring state becomes that character's ancestral state in $X'$.
$B' = B$
Table 14: Reductions for character compatibility decision problems.
collection of pairwise compatible characters must correspond to a set of vertices
in $G$ that form a clique [DS86, Proposition ·1], completing the proof. The key
to the reduction from BCC to BQC is that binary qualitative characters behave
like binary cladistic characters in which the state occurring most frequently has
been set to ancestral [McM77, Lemma and Theorem 1]. This can be forced by
adding taxa [DS86, Proposition ~]. The reduction from BQC to BCC holds by
similar reasoning [DS86, Proposition 4]. As binary characters are restrictions of
unconstrained characters, problems UCC, UOC, and UQC are also NP-complete.
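The pairwise incompatibility test for binary cladistic characters and the CLIQUE to BCC construction can be sketched together in Python. This is an illustration only: the function names and the column-per-character dictionary are assumptions, and the pattern 101 used for the second column of a non-adjacent pair is also an assumption (any pattern that produces the pairs (0,1), (1,0) and (1,1) against 011 would serve).

```python
from itertools import combinations

def incompatible(k_i, k_j):
    """Two binary cladistic characters are incompatible iff the state pairs
    (1,0), (0,1) and (1,1) all occur among the taxa."""
    observed = set(zip(k_i, k_j))
    return {(1, 0), (0, 1), (1, 1)} <= observed

def clique_to_bcc(vertices, edges, j):
    """Build one character-column per vertex with three taxon-rows per
    unordered vertex pair; non-adjacent pairs receive 011 and 101 (assumed)."""
    adjacent = {frozenset(e) for e in edges}
    columns = {v: [] for v in vertices}
    for u, v in combinations(vertices, 2):
        is_non_edge = frozenset((u, v)) not in adjacent
        for w in vertices:
            if is_non_edge and w == u:
                columns[w].extend([0, 1, 1])
            elif is_non_edge and w == v:
                columns[w].extend([1, 0, 1])
            else:
                columns[w].extend([0, 0, 0])
    return columns, j   # the bound B equals the clique size J
```

In the reduced instance, two characters are compatible exactly when their vertices are joined by an edge, so a pairwise compatible collection of $B = J$ characters corresponds to a clique of size $J$.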
a.2.3 Distance Matrix Fitting
Each of these problems is given as input a semimetric on $m$ taxa. Some problems
seek either the ultrametric or additive tree that has the closest fit to this
semimetric according to that problem's statistic; others seek the ultrametric or
additive tree of shortest length that is dominant to this semimetric. The distance
matrix fitting problems defined in [Day83, Day87, Kri88, KM86] are given in
Table 15. Many of these problems were shown to be NP-complete via reductions
from certain of their subproblems given in Table 16.
As with the phylogenetic parsimony problems, the distance matrix fitting
problems are in NP subject to certain restrictions on instance weights. For an
instance $I$ and associated solution $Y$, let $W_{maxdiff}(I, Y)$ be the maximum difference
between any weight in $I$ and any weight in $Y$.
FITTING UNCONSTRAINED MATRICES TO ULTRAMETRIC TREES VIA STATISTIC X (FUUT[X]) [X ∈ {$F_1$, $F_2$}]
Instance: Set $S$ of $n$ taxa; semimetric $D \in M_n$; and a positive integer $B$.
Question: Does there exist an ultrametric tree $U \in U_n$ such that $X(D, \pi_U(U)) \leq B$?
FITTING UNCONSTRAINED MATRICES TO DOMINANT ULTRAMETRIC TREES VIA STATISTIC X (FUUT[X,≥]) [X ∈ {$F_1$, $F_2$}]
Instance: Set $S$ of $n$ taxa; semimetric $D \in M_n$; and a positive integer $B$.
Question: Does there exist an ultrametric tree $U \in U_n$ such that $X(D, \pi_U(U)) \leq B$ and $\pi_U(U) \geq D$?
FITTING UNCONSTRAINED MATRICES TO DISCRETIZED ADDITIVE TREES VIA STATISTIC X (FUDT[X]) [X ∈ {$F_1$, $F_2$, $F$}]
Instance: Set $S$ of $n$ taxa; semimetric $D \in M_n$; and a positive integer $B$.
Question: Does there exist an additive tree $T \in A_n$ such that $X(D, \pi_A(T)) \leq B$?
FITTING UNCONSTRAINED MATRICES TO GRAPH-BASED DOMINANT ADDITIVE TREES (FUGT[≥])
Instance: Complete graph $G = (V, E)$, $|V| = n$; semimetric $D \in M_n$ defined on all pairs of vertices in $G$; set of taxa $S \subseteq V$; and a positive integer $B$.
Question: Is there a subtree $T$ of $G$ that includes $S$ such that $\sum_{\{x,y\} \in T} D(x, y) \leq B$ and $[\pi_A(T)]_S \geq D_S$?
Table 15: Distance matrix fitting decision problems (adapted from [Day83, KM86, Day87, Kri88]).
Lemma 6 A polynomial-time nondeterministic computation is guaranteed to find all solutions Y to an instance I of a distance matrix fitting problem X such that $\log W_{maxdiff}(I, Y) \le p(|I|)$, for some polynomial p.
Proof: Observe that all of these problems have solutions in which the number of elements in the solution is polynomial in the size of the instance, i.e. ultrametrics of size $|S|^2$ (FUUT[X], FUUT[X,≥]), trees with at most 2|S| - 1 vertices (FUDT[X]), trees with at most |V| vertices (FUGT[≥]). To complete the proof, observe that the costs of solutions Y to instances I of distance matrix fitting problems whose statistics are based on $L_1$ and $L_2$ are $O(|S|^2 (W_{maxdiff}(I, Y)))$ and $O(|S|^2 (W_{maxdiff}(I, Y))^2)$, respectively. ∎
Corollary 7 A polynomial-time nondeterministic computation is guaranteed to find all solutions Y to an instance I of a distance matrix fitting problem X whose cost is at most $O(2^{p(|I|)})$, for some polynomial p.
Each distance matrix fitting problem defined above has solutions satisfying this bound, i.e. ultrametrics with off-diagonal entries = $D_{max}$ (FUUT[X], FUUT[X,≥]), any tree containing S such that every edge has weight $D_{max}$ (FUDT[X]), any valid solution tree (FUGT[≥]). As solution size is proportional to solution cost, all optimal solutions satisfy the bound in Corollary 7.
Corollary 8 A polynomial-time nondeterministic computation is guaranteed to find all optimal solutions of any instance of a distance matrix fitting problem.
FITTING BINARY MATRICES TO ULTRAMETRIC TREES OF HEIGHT 2 VIA STATISTIC X (FBUT2[X]) [X ∈ {$F_1$, $F_2$}]
Instance: Set S of n taxa; semimetric $D \in B_n$; and a positive integer B.
Question: Does there exist an ultrametric tree $U \in U_{n,2}$ such that $X(D, \pi_U(U)) \le B$?

FITTING BINARY MATRICES TO DOMINANT ULTRAMETRIC TREES OF HEIGHT 2 VIA STATISTIC X (FBUT2[X,≥]) [X ∈ {$F_1$, $F_2$}]
Instance: Set S of n taxa; semimetric $D \in B_n$; and a positive integer B.
Question: Does there exist an ultrametric tree $U \in U_{n,2}$ such that $X(D, \pi_U(U)) \le B$ and $\pi_U(U) \ge D$?

Table 16: Auxiliary decision problems for NP-hardness proofs of distance matrix fitting decision problems (adapted from [KM86, Day87, Kri88]).
Hence, all distance matrix fitting problems defined above are in NP.
Problem FUUT[$F_1$] was shown to be NP-hard via a reduction from FBUT2[$F_1$]. A reduction which establishes that FBUT2[$F_1$] is NP-hard is given in Table 17. This reduction is adapted from a Turing reduction from X3C to SOL-MIN-FBUT2[$F_1$] given in [KM86]. An instance of X3C has a solution if and only if the graph G created by this reduction has a vertex-partition into 3-vertex triangles [KM86]. It can be shown that the ultrametric of required cost for the reduced instance of FBUT2[$F_1$] will always have such a partition if and only if there is an exact cover for the original instance of X3C; moreover, the maximum nonoverlapping (not necessarily exact) cover for the original instance of X3C can be easily derived from this ultrametric. An optimal tree for any instance of FBUT2[$F_1$] will always have off-diagonal entries ∈ {1, 2} [KM86, Lemma 3]. For such an ultrametric tree $U = \{\{\{\{s_1\}, \ldots, \{s_{|S|}\}\}, 0\}, \{\{I_1, \ldots, I_r\}, 1\}, \{\{S\}, 2\}\}$, let $i_p = |I_p|$ and $j_p = |\{\{i,j\} \subseteq I_p \mid d_{i,j} = 1\}|$, 1 ≤ p ≤ r. By [KM86, Lemma 4],
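$$
\begin{aligned}
F_1(D,U) &= \sum_{p=1}^{r} \sum_{\{i,j\} \subseteq I_p} |d_{i,j} - 1| + \sum_{1 \le p' < p'' \le r} \sum_{i \in I_{p'}} \sum_{j \in I_{p''}} |d_{i,j} - 2| \\
&= \sum_{p=1}^{r} \sum_{\{i,j\} \subseteq I_p} |d_{i,j} - 1| + |\{\{i,j\} \mid d_{i,j} = 1\}| - \sum_{p=1}^{r} j_p
\end{aligned}
\qquad (5)
$$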
No subpartition in the second partition of an ultrametric U that is minimal under $F_1$ can group together vertices from different subgraphs $G_i$ and $G_j$, as the tree in which these vertices are grouped separately by subgraph would have lower cost by equation 5 above. Hence, each subpartition in the second partition must be based on vertices from the same subgraph $G_\alpha$. Note that $F_1(D,U)$ is minimal when the subpartitions in the second partition specify a partition of G (and thus individual $G_\alpha$) into the largest possible complete subgraphs. The reader can verify that the optimal partition of any subgraph $G_\alpha$ into complete subgraphs under $F_1$ is either into the four triangles
(6)
{ ·~'n ,:1, !In ,:l,l' l/" ,3,'l}' { ,1' o ,l,:l, !/o ,'l,3• Yo ,3,:J}
or the three triangles,
plus single vertices drawn from the set $\{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$, depending on whether the single vertices in this set have or have not been partitioned into $G_\alpha$. Let these two sets of $G_\alpha$ be denoted by $G_M$ and $G_m$; note that $|G_M| + |G_m| = |C|$.
As single-vertex groupings do not affect the cost of U under $F_1$, the cost of a minimal ultrametric U is
$$
\begin{aligned}
F_1(D,U) &= 0 + |\{\{i,j\} \mid d_{i,j} = 1\}| - \sum_{p=1}^{r} j_p \\
&= |E| - 3(4|G_M| + 3|G_m|) \\
&= |E| - 3(|G_M| + 3|C|)
\end{aligned}
\qquad (8)
$$
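The last step of equation (8) uses only the identity $|G_M| + |G_m| = |C|$. Taking $G_M$ and $G_m$ to name the subgraphs split into four and three triangles respectively (an interpretation of the notation, with each triangle contributing three unit-distance pairs), the substitution can be written out as:

```latex
\begin{aligned}
3\,(4|G_M| + 3|G_m|)
  &= 3\,\bigl(4|G_M| + 3(|C| - |G_M|)\bigr) \\
  &= 3\,\bigl(|G_M| + 3|C|\bigr).
\end{aligned}
```

Under this reading the cost depends on the partition only through $|G_M|$, so it is smallest when as many subgraphs as possible are split into four triangles.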
A subgraph $G_\alpha$ is partitioned into four triangles if and only if the corresponding 3-set is in the maximal nonoverlapping cover of the original instance of X3C, i.e. the elements $\{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$ are partitioned into that $G_\alpha$. Hence, the original instance of X3C has an exact cover if and only if $|G_M| = q$, i.e. $F_1(D,U) = |E| - 3(q + 3|C|)$, completing the proof. As FBUT2[$F_1$] is computationally equivalent to FBUT[$F_1$], the corresponding problem with no restrictions on the height of the ultrametric [KM86, Lemma 2], and as FBUT[$F_1$] is a restriction of FUUT[$F_1$], these problems are also NP-hard, and thus NP-complete.
Problem FUUT[$F_2$] can be shown NP-complete in a similar fashion. By a variant of [KM86, Lemma 3], the optimal trees for any instance of FBUT2[$F_2$] will always have off-diagonal entries ∈ {1, 2}. As $|d - p| = |d - p|^2$ when d, p ∈ {1, 2}, $F_1(D, \pi_U(U)) = F_2(D, \pi_U(U))$ for any $D \in B_n$ and $U \in U_{n,2}$. Thus FBUT2[$F_1$] and FBUT2[$F_2$] are arithmetically equivalent. As FBUT2[$F_2$] and FBUT[$F_2$] are also computationally equivalent by a variant on [KM86, Lemma 2], and as FBUT[$F_2$] is a restriction of FUUT[$F_2$], these problems are NP-complete as well.

Figure 6: Structure of subgraph $G_\alpha$ used in reduction from X3C to FBUT2[$F_1$] (Figure 2 from [KM86]).
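The arithmetic equivalence of the two statistics on height-2 instances rests on a small fact: when every entry is 1 or 2, each difference is 0 or 1, and such a number equals its own square. A minimal sketch of that check follows; the function name is hypothetical.

```python
def abs_equals_square_on(values):
    """True if |d - p| == |d - p|**2 for every pair (d, p) drawn from values."""
    return all(abs(d - p) == abs(d - p) ** 2 for d in values for p in values)


print(abs_equals_square_on((1, 2)))     # entries allowed in a height-2 instance
print(abs_equals_square_on((0, 1, 2)))  # fails: |0 - 2| = 2 but 2**2 = 4
```

The second call illustrates why the argument is restricted to off-diagonal entries, which take values in {1, 2} only.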
Problem FUUT[$F_1$,≥] was originally shown to be NP-hard via a reduction from a restricted version of VERTEX PARTITION INTO TRIANGLES [Kri88]. The NP-hardness of FBUT2[$F_1$,≥] can also be established by a reduction from X3C analogous to that given above for FBUT2[$F_1$], whose proof is more intuitive because dominance forces the partition of individual $G_\alpha$ into complete subgraphs. The latter reduction will be used in later sections of this thesis. As [KM86, Lemmas 2 and 3] can be modified to work for the corresponding dominant ultrametric problems, the reasoning above by which FUUT[X], X ∈ {$F_1$, $F_2$}, was shown NP-complete also shows that FUUT[X,≥], X ∈ {$F_1$, $F_2$}, are NP-complete.
X3C $\le_m^p$ FBUT2[$F_1$] (adapted from [KM86])
n = 3(q + 3|C|)
S = X ∪ { $y_{\alpha,\beta,\gamma}$ | α ∈ {1, 2, ..., |C|}, β, γ ∈ {1, 2, 3} }
D = [$d_{i,j}$], where D is defined relative to a graph G = (S, E) composed of the union of the graphs $G_\alpha = (V_\alpha, E_\alpha)$, 1 ≤ α ≤ |C|. Each subgraph $G_\alpha$ corresponds to an element $c_\alpha \in C$, $c_\alpha = \{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$, $x_{\alpha,\{1,2,3\}} \in X$, and has the structure shown in Figure 6. Given G, define D as
  $d_{i,j} = 0$ if i = j,
  $d_{i,j} = 1$ if {i, j} ∈ E,
  $d_{i,j} = 2$ otherwise.
B = |E| - 3(q + 3|C|)
FBUT2[X] $\le_m^p$ FUDT[X] (X ∈ {$F_1$, $F_2$}) [Day87]
n' = n + c.p,
where c.p = t/J'ln-1 ruul t/' = I .!'in - I.
S' = S + Yi, 1 $ i $ c.p
D' = [di,j} = [ ~' ~~ l, where M = [mi,j]· mi,j = .,;, for all I $ i. $ n nud I $ j :S c.p, M' is the trauspose of M, and 1 is a S<JIIill'l! matrix with zeros on, b11t ones off, the main cliaguual.
B' = B
Table 17: Reductions for distance matrix fitting decision problems.
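The first reduction in Table 17 turns a graph into a binary dissimilarity matrix: 0 on the diagonal, 1 for adjacent vertices, 2 otherwise. A minimal sketch of that construction and of the cost bound B is given below. The function names are hypothetical, and the example graph is an arbitrary toy graph, not the subgraph $G_\alpha$ of Figure 6.

```python
def distance_matrix_from_graph(vertices, edges):
    """Build D with d_ij = 0 if i == j, 1 if {i, j} is an edge, 2 otherwise."""
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    d = [[0 if i == j else 2 for j in range(n)] for i in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        d[i][j] = d[j][i] = 1
    return d


def cost_bound(num_edges, q, num_triples):
    """B = |E| - 3(q + 3|C|), with q and |C| as in the X3C instance."""
    return num_edges - 3 * (q + 3 * num_triples)


# Toy graph: a triangle on a, b, c plus an isolated vertex d.
toy_vertices = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c")]
for row in distance_matrix_from_graph(toy_vertices, toy_edges):
    print(row)
```

Both functions run in time polynomial in the size of the graph, as a many-one reduction requires.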
The reduction given in [Day87] which establishes that FUDT[X], X ∈ {$F_1$, $F_2$}, is NP-hard is given in Table 17. Day requires that |S| be an even integer ≥ 4 in his version of FBUT2[X], which can be ensured by replicating some $c_i \in C$ in the given instance of X3C. An optimal discretized additive tree T for the reduced instance of FUDT[X] can be transformed in polynomial time into a tree consisting of two subtrees, an ultrametric tree U of height 2 on S attached by an edge of length 0.5 to a subtree rooted at vertex v that is attached to all vertices $y_i$, 1 ≤ i ≤ φ, by edges of length 0.5, such that $X(D, \pi_U(U)) = X(D, \pi_A(T))$ [Day87, Proposition 3]; moreover, an optimal ultrametric for the original instance of FBUT2[X] can be similarly transformed into an optimal solution for the reduced instance of FUDT[X] [Day87, Proposition 5]. Hence, FUDT[X] is NP-complete, and by the arithmetic equivalence of the $F_1$ and $F$ statistics, FUDT[$F$] is NP-complete as well.
A reduction which establishes that FUGT[≥] is NP-complete is given in Table 18. This reduction is based on the reduction from VERTEX COVER to UBQCS and UBCCS given in Section 3.2.1. For graph G in an instance of FUGT[≥] created by this reduction, define a canonical tree as a subtree T of G that contains S and is composed of edges of the types $\{*, v_i\}$ and $\{v_i, e_j\}$, $e_j = \{v_i, x\} \in E_{VC}$.
Lemma 9 Every instance of FUGT[≥] created by the reduction in Table 18 has a minimal-length tree that is canonical.
Proof: Let T be a minimal-length tree of length B for a reduced instance I of
VC $\le_m^p$ FUGT[≥]
V = {*} ∪ { $v_i$ | 1 ≤ i ≤ $|V_{VC}|$ } ∪ { $e_j$ | 1 ≤ j ≤ $|E_{VC}|$ }
D = [$d_{i,j}$],
$d(*, v_i) = 1$, $d(*, e_j) = 2$
where• d(r,;,l'j) =·I d(v;,cj) =I if fj = {u;,.r} E /~\'(', d( v;, Cj) = :l ot.lwrwi:w d(c;,Cj) =2
S = {*} ∪ { $e_j$ | 1 ≤ j ≤ $|E_{VC}|$ }
B = K + $|E_{VC}|$
Table 18: Reductions for distance matrix fitting decision problems (continued from Table 17).
FUGT[≥]. If T is not a canonical tree, it contains one or more edges of the types
T' from T by replacing each non-canonical edge X as follows:
that $e_i = \{v_k, z\} \in E_{VC}$. The former case cannot occur because it would create a tree with a length B' < B; the latter case produces a tree T' of equal length.
2. X = $\{v_i, v_j\}$: Assume without loss of generality that there is already an edge $\{*, v_i\}$ or $\{*, v_j\}$ in T, and replace X by the edge of this pair that is not in T. This cannot occur because it would create a tree with a length B' < B.
3. X = $\{e_i, e_j\}$: Assume without loss of generality that there is an edge $\{v_j, e_i\}$ in T. If there is a vertex $v_t \in T$ such that $e_j = \{v_t, z\} \in E_{VC}$, replace X by the edge $\{v_t, e_j\}$; else, replace X by the edges $\{*, v_t\}$ and $\{v_t, e_j\}$ such that $e_j = \{v_t, z\} \in E_{VC}$. The former case cannot occur because it would create a tree with a length B' < B; the latter case produces a tree T' of equal length.
4. X = $\{v_i, e_j\}$ such that $v_i$ is not a vertex of $e_j$: If there exists a vertex $v_k$ in T such that $e_j = \{v_k, z\} \in E_{VC}$, replace X by edge $\{v_k, e_j\}$; else, replace X by the pair of edges $\{*, v_k\}$ and $\{v_k, e_j\}$. Neither case can occur because each would create a tree with length B' < B.
The created tree T' has the same length as T and still connects all vertices in S; moreover, as T' contains no non-canonical edges, it is a canonical tree. ∎
Canonical trees have several useful properties. The path lengths of a canonical tree T are such that $[\pi_A(T)]_S \ge D_S$. Moreover, the vertices in the second level of a canonical tree T correspond to a satisfying vertex cover for the original instance.
Theorem 10 FUGT[≥] is NP-complete.
Proof: By Corollary 8, FUGT[≥] is in NP. Consider the reduction from VERTEX COVER to FUGT[≥] given in Table 18. This reduction is polynomial time. Moreover, optimal solutions for an original instance of VERTEX COVER and its reduced instance of FUGT[≥] can be created from each other. If the original instance of VERTEX COVER has a satisfying vertex cover $V^* \subseteq V_{VC}$ of size K' ≤ K, construct the canonical tree linking the vertices of $V^*$ with the vertices $\{e_j\}$ and *; this tree has length K' + $|E_{VC}|$ ≤ B, and is thus a solution to the reduced instance. If the reduced instance of FUGT[≥] has a solution tree T of length B ≤ K + $|E_{VC}|$, construct the canonical tree T' corresponding to T of length B' ≤ B. The vertex cover $V^*$ defined by the vertices in the second level of T' has size $|V^*| = B' - |E_{VC}| \le B - |E_{VC}| \le K$; thus, $V^*$ is a satisfying vertex cover for the original instance. ∎
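The correspondence used in this proof can be sketched in code: a vertex cover yields a canonical tree of length |cover| + |E|, and the second-level vertices of a canonical tree give back a cover. The sketch below uses only the two edge weights that canonical trees need, d(*, v) = 1 and d(v, e) = 1 when v is an endpoint of e. Function names and the tuple-based tree representation are illustrative assumptions.

```python
def canonical_tree(graph_edges, cover):
    """Return weighted tree edges (u, v, weight) linking *, the cover, and every graph edge."""
    cover = set(cover)
    tree = [("*", v, 1) for v in sorted(cover)]      # edges {*, v_i}, weight 1
    for edge in graph_edges:
        endpoint = next((v for v in edge if v in cover), None)
        if endpoint is None:
            raise ValueError(f"edge {edge} is not covered")
        tree.append((endpoint, edge, 1))             # edges {v_i, e_j}, weight 1
    return tree


def tree_length(tree):
    """Total weight of the tree."""
    return sum(weight for _, _, weight in tree)


def cover_from_tree(tree):
    """Second-level vertices of a canonical tree, i.e. the neighbours of *."""
    return {v for u, v, _ in tree if u == "*"}


# Hypothetical VERTEX COVER instance: the path 1 - 2 - 3 - 4 with cover {2, 3}.
path_edges = [(1, 2), (2, 3), (3, 4)]
tree = canonical_tree(path_edges, {2, 3})
print(tree_length(tree), cover_from_tree(tree))
```

For this toy instance the tree length is 2 + 3 = 5, matching K' + $|E_{VC}|$ with K' = 2.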
A Turing reduction from SOL-MIN-FBUT[$F_2$] to SOL-MIN-FBUT2[$F_2$] is given in [Kri86, Theorem 2]; however, unlike the reduction from X3C to SOL-MIN-FBUT2[$F_1$] given in [KM86], it is not obvious how to convert this Turing reduction into a many-one reduction. Several problems that involve fitting semimetrics to dominant and subdominant ultrametrics using statistics $F_1$, $F_2$, and $L_\infty$ are shown to be solvable in polynomial time in [Kri86, Kri88]; related problems involving other statistics are examined in [Day92]. The reduction from UUW to FUGT[≥] given in [Day83] does not work for the same reasons as Day's reduction from UUW to WUOWL (see Section 3.2.1); however, the former reduction cannot be fixed by using the implicit-graph version of UUW defined above because the reduction uses an intermediate problem (CONSENSUS PROBLEM IN CLASSIFICATION) which requires that the implicit graph be included in the
3.2.4 Summary
Figures 7 and 8 show the various reductions described in this section, and Table 19 gives the correspondence between problems in these figures and problems described in the literature. Note that all but ten of these reductions are either by restriction or by arithmetic equivalence. All of these problems are interreducible by virtue of being NP-complete; however, the pattern of reductions in this diagram will be significant in later sections of this thesis. Note that each of these reductions requires only that a solution exist that has a given cost, not that a solution have a cost above or below a given limit; moreover, the proofs of each of these reductions $\Pi \le_m^p \Pi'$ give algorithms for converting solutions of cost c to instances of Π into solutions of cost c' for reduced instances of Π' and vice versa, such that these c and c' are related arithmetically. The former property, along with the reductions for CLIQUE and VERTEX COVER given in [GJ79, Section 3.1], establishes that all corresponding given-cost phylogenetic inference decision problems are NP-complete. Both of these properties will also be useful in later sections of this thesis.
The decision problems given in this section do not answer questions typically asked by systematic biologists, and are thus not relevant in themselves. However, the NP-completeness results for these problems do suggest that fast, i.e. polynomial time, algorithms do not exist for these problems, and that efforts should be
Figure 7: Reductions among phylogenetic inference decision problems. Reductions $\Pi \le_m^p \Pi'$ are denoted by arrows from Π to Π'. Arrows marked by a and r correspond to reductions by arithmetic equivalence and restriction, respectively.
I Problt•Jll
Tlll'sis Lit.l'I'Htllrt•
l'l1ylop,c•npt il' t!B{C.Q}CS {C,Q}CS (D.JS80] I' a rsi tnouy l!B{ C,Q} Do {C,Q}DO [D.JSH6]
IJB{C.Q}CI {C,Q}CI (D.JSHfi) l!BW SPQ [GF82, DayS:Jj lltr\V SPP (GF82, Day8:J] Wl TO\VL WTP [DayS:Jj
( ~haradPJ' B{Q.C}C B{ Q,C}C [DS8fi] ( ~umpat.ihility lf{Q.C}C U{Q,C}C [DS86J l>ist.anr·r· !\latrix FB UT[Ft] biJJCt [I\M86] Fit tin~ FBl1T2[Ft] 1'IIIC:d [1\M86J, Llft (KriS(i].
FUT[I] 1Day8i] F B t 1'1':! [ /·~] ~H [1\riSfi], FtTT[2] [Day87) Ftl t JT[Ft] ~~ t [1\ri86], 1//Ct [1\l\'186] FlllJT[F2] 6.2 t [I\ ri86J Ftrt!T[J.' >] "- P·l [Kf'i88] FtJDT[o], o E {F" F1} FAT[n], a E { l, 2} [Day87] FUGT[>) AET [DayS:J]
Table 19: Correspondence between phylogenetic inference problems in this thesis and problems in the literature. All solution problems are marked with daggers (†); all other problems are decision problems.
Figure 8: Restriction reductions among phylogenetic parsimony decision problems. These problems are stated relative to a phylogenetic parsimony criterion X. Note that each problem above is also linked by restriction reductions to each of its four corresponding reticulate problems (see Table 14).
focused on the design of polynomial-time approximation algorithms which guarantee solutions that are close to optimal [Day83, GF82]. These reductions can
also he used to ddenni nc• t.he C'Otll pul.at.ional hard nc•ss of mort' nu 11 ple•x prohlc•rns
(Section 'l) and to plaw limits on 1.1H~ kinds of polynornial-t.ittll' approximations
that can exist for pllylog<~llt'l.ir. iuferetH'I' prohl<·ms (Sed.ion G).
4 The Computational Complexity of Phylogenetic Inference Functions
In the last section, certain decision problems associated with various phylogenetic inference criteria were shown to be NP-complete. By a folklore result in theoretical computer science, each of the corresponding solution problems is solved by a function in $FP^{NP}$ [GJ79, Chapter 5]. However, this says little about the hardness of more complex problems based on these criteria. In this section, I will derive various bounds on the complexities of the evaluation, solution, spanning, enumeration, and random-generation functions associated with the optimal-cost, given-cost, and given-limit versions of the phylogenetic inference problems.
The reader should remember that results given below for phylogenetic parsimony and distance matrix fitting given-cost and given-limit problems apply only
to t.hos<' prohlc•ms in which tire cost-parameter ~: satisfies the restrictions given
111 L<'lllllla :J awl Corollary 7.
It will often be convenient below to have a single binary encoded representation (canonical representation) of each solution to a problem, so that individual solutions are not output more than once. Such representations exist for all problems examined in this thesis:
• Character Compatibility: Represent characters by character-state adjacency matrices whose character-states are in instance input order, and represent sets of characters by such matrices in instance input order.
• Phylogenetic Parsimony: Represent characters as above, and represent trees by vertex-adjacency matrices whose vertices are in lexicographic order relative to character-state instance input order. Reticulations are stored in a separate list by source-vertex set lexicographic order relative to character-state instance order.
• Distance Matrix Fitting: Represent ultrametrics by their corresponding ultrametric matrices whose vertices are in input instance order. Represent additive trees by their inorder traversal sequences [Sta80, Section 3], where the tree is rooted at the least vertex in input instance order and the left-right ordering of subtrees is replaced by an ordering on the basis of the least vertex in a subtree.
These canonical representations can be encoded and decoded in polynomial time using slightly modified standard algorithms [Sta80, Section 3]. All TM solving problems examined in this section will be assumed to operate on and to output canonical representations.
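The rule for additive trees (root at the least vertex, order subtrees by their least vertex) can be illustrated with a short sketch. This is not the inorder encoding of the cited reference; it is a simplified nested-tuple form, assumed here only to show that the ordering rule makes the representation independent of how the tree was stored. The function name and adjacency-dictionary input are hypothetical.

```python
def canonical_form(adjacency):
    """Canonical nested-tuple form of an unrooted tree given as {vertex: neighbours}.

    The tree is rooted at its least vertex, and the children of every vertex
    are ordered by the least vertex contained in their subtree.
    """
    root = min(adjacency)

    def build(vertex, parent):
        children = [build(c, vertex) for c in adjacency[vertex] if c != parent]
        children.sort(key=lambda sub: sub[0])        # sub[0] = least vertex in subtree
        least = min([vertex] + [sub[0] for sub in children])
        return (least, vertex, tuple(children))

    def strip(node):
        # Drop the bookkeeping field, keeping (vertex, ordered children).
        return (node[1], tuple(strip(c) for c in node[2]))

    return strip(build(root, None))


# The same tree (edges 1-3, 2-3, 3-4) stored with two different neighbour orders.
tree_a = {1: [3], 3: [4, 2, 1], 2: [3], 4: [3]}
tree_b = {3: [1, 2, 4], 4: [3], 2: [3], 1: [3]}
print(canonical_form(tree_a) == canonical_form(tree_b))
```

Because sibling subtrees are vertex-disjoint, their least vertices are distinct, so the sort order is unambiguous and each tree has exactly one such form.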
4.1 Function Complexity Classes
This section will give a brief overview of some function classes that will be used below. These classes fall mainly into two regions: within $FP^{NP}$ and within FPSPACE(poly). The relations between all classes defined in this section are shown in Figures 9 and 10.
4.1.1 Classes Within $FP^{NP}$
There are two hierarchies of interest within $FP^{NP}$:
2. The OptP[f(n)] hierarchy, where f is smooth and f ∈ O(poly) [GKR92, Kre88]: For polynomial-time NTM N ∈ $FNP_g$, let $opt_N(x)$ be the optimal value (largest for a maximization problem, smallest for a minimization problem) computed by N for input x.
Definition 11 (adapted from [Kre88], p. 493) A function $f : \Sigma^* \to Z$ is in OptP (optimization polynomial time) if there is a polynomial-time NTM N ∈ $FNP_g$ such that $f(x) = opt_N(x)$ for all $x \in \Sigma^*$. We say that f is in OptP[z(n)] if f ∈ OptP and the length of f(x) in binary is bounded by z(|x|) for all $x \in \Sigma^*$.
'l'lllillgh all Opi.P[f(n)] fuuct.ious arc contained in F pNP(_((n)J, each function
i 11 F JJN /'[J(u)J lll<'trically l'<'d uccs to some Opt P[f( n )] function [I< t'e88, Thc
Ort'lll :J.:!(i)]. Thus, all OptP[l(n)]-romplcte functions are also ppNP[f(n)]_
romp I 1'1 l'.
Note that metric reductions can stretch input by a polynomial amount. One consequence of this stretching is that a function that is complete for OptP[f(n)] is also complete for OptP[$f(n^{O(1)})$] [GKR92, p. 4]. It is most noticeable in the names for certain classes defined using big-O notation, e.g. OptP[O(log log n)] is more properly written as OptP[log log n + O(1)].
The following class relations are known:
• For PVt'l')' smooth fuuct.iou J,
- For j(11) $ ~ logu, FfJNI'[f(u)-l) C /t'J'N/'(J("llunlt·ss I'= NP [1\rc•XX,
Tht'ort•m '1.:>.].
- For /(n) $(I- t)logu, c E (0, I], Ff'N/'[f(u)-l) C FfJN/'[fl,llunlt•ss
P =--: NP [Hei88, Tlll'or<•m :ll J.
For /(u) E O(logu), ppNI'(J(n)-t] C FJJNI'If(,)J uukss ~!; = II!;
[ABCWI, ThPorem '12].
1 ppNP(O(Ingn)) C FfJNl' lllll<'SS P = NP [J\rt•XH, Tltt•orc•flt'l.l] .
., ppNl'(O(Iogn)] C FP1f~' uuless R = NP [Sd!JI, Tltc•orc•nt l:l] aud Fc·wl' =I'
[Scl91, Corollary 4(i)].
• FP1fP C ppNP ifaud ouly if pN/J(O(IugH)] C J'NI' [Sd!JI, Tlt1~orPIII 1].
Other separations hold under more exotic assumptions [Bei88, Bei91]. Research to date has focused on all classes below $FP^{NP[O(\log n)]}$ and the class $FP^{NP}$. Though many results have been imported directly from language-based to function-based classes, there have been some notable surprises, in particular the non-equivalence of $FP_{\|}^{NP}$ and $FP^{NP[O(\log n)]}$ and the separation of $FP^{NP[O(\log n)]}$, $FP_{\|}^{NP}$, and $FP^{NP}$.
Metric reducibility suffices to show hardness for most single-valued function classes. Functions are shown $FP_{\|}^{NP}$-hard via the property of paddability [CT91, Gas86]. Recall that all problems are based on relations R : I × S on instances I and solutions S. Let SOL-Π(x) be the set of solutions for an instance x of a problem Π.
Definition 12 ([CT91], Definition 4.2) A problem Π is paddable if there is a pair of polynomial-time functions $h_1 : 2^I \to I$ and $h_2 : 2^I \times S \to 2^S$ such that for all finite sets $\{x_1, x_2, \ldots, x_m\} \in 2^I$ and all single-valued functions f that solve Π, if $x = h_1((x_1, x_2, \ldots, x_m))$ then $h_2((x_1, x_2, \ldots, x_m), f(x)) = (y_1, y_2, \ldots, y_m)$, where $y_i \in$ SOL-Π($x_i$) for 1 ≤ i ≤ m.
Paddability was defined implicitly in [Gas86]. Gasarch realized that if instances of a paddable problem Π can encode an X-hard problem then Π is $FP^{X[O(\log n)]}$-hard. If X = NP, paddable problems are $FP^{NP[O(\log n)]}$-hard [Gas86, Theorem 8] (the essential idea is that a binary O(log n)-depth NP query tree contains at most a polynomial number of NP queries). Chen and Toda defined and named paddability independent of Gasarch's work, and stated their results in terms of $FP_{\|}^{NP}$ hardness.
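The counting behind the parenthetical remark can be made explicit. Assuming the query tree is a complete binary tree of depth $d = c \log_2 n$ for some constant $c$ (the constant and the base of the logarithm are assumptions made for illustration), the number of distinct queries it can contain is:

```latex
\sum_{i=0}^{d-1} 2^{i} \;=\; 2^{d} - 1 \;=\; 2^{c \log_2 n} - 1 \;=\; n^{c} - 1 .
```

So all possible adaptive queries can be listed in advance and combined into a single padded instance, at a cost that is polynomial in n.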
Theorem 13 ([CT91], Lemma 4.1) Let Π be a paddable problem whose associated decision problem $L_\Pi$ is NP-hard. Then Π is $FP_{\|}^{NP}$-hard.
Proof (sketch): Define function $Q_\Pi(x_1, x_2, \ldots, x_m)$ =
NP-hard, any function in $FP_{\|}^{NP}$ can be solved using a single call to $Q_\Pi$; however, as Π is paddable, any instance of $Q_\Pi$ can be solved using a single call to any solution function for Π. ∎
Their interpretation is the more powerful in light of Selman's results showing that $FP_{\|}^{NP}$ is intermediate in hardness between $FP^{NP[O(\log n)]}$ and $FP^{NP}$ (see above).
The following variant of paddability will be useful below:
Definition 14 A problem Π is paddable with respect to a problem Π' if there is a pair of polynomial-time functions $h_1 : 2^{I'} \to I$ and $h_2 : 2^{I'} \times S \to 2^{S'}$ such that for all finite sets $\{x'_1, x'_2, \ldots, x'_m\} \in 2^{I'}$ and all single-valued functions
Theorem 15 If problem Π is paddable with respect to an NP-hard problem Π' then Π is $FP_{\|}^{NP}$-hard.
Proof: The proofs by which the analogous result holds for paddable problems ([Gas86, Theorem 8]; Theorem 13 above) require only that $Q_\Pi$ be based on some NP-hard problem, which need not be $L_\Pi$. ∎
Theorem 16 If problem Π metrically reduces to problem Π' and problem Π is paddable with respect to a problem Π'', then problem Π' is paddable with respect to Π''.
Proof: By definition, there exist polynomial-time functions $h_1$ and $h_2$ such that for $x = \{x_1, \ldots, x_m\} \in 2^{I''}$, $y = \{y_1, \ldots, y_m\} \in 2^{S''}$, and any single-valued function f that solves Π, $(y) = h_2((x), f(h_1((x))))$, and functions $T_1$, $T_2$, such that for every single-valued function g that solves Π', there exists a function f that solves Π such that $f(x) = T_2(x, g(T_1(x)))$. Define functions $h'_1 : 2^{I''} \to I'$ as $h'_1((x)) = T_1(h_1((x)))$ and $h'_2 : 2^{I''} \times S' \to 2^{S''}$ as $h'_2((x), y) = h_2((x), T_2(h_1((x)), y))$ such that $(y) = h'_2((x), g(h'_1((x))))$. Functions $h'_1$ and $h'_2$ are polynomial time and show that Π' is paddable with respect to Π''. ∎
Note that paddability as defined here is distinct from paddability as traditionally defined in computational complexity theory ([BDG88, pp. 74-75]; [BDG90, pp. 122-123]).
Four classes of multivalued functions will also be used below:
• NPMV = FNP and $NPMV_g$ = $FNP_g$ [Sel91].
• $NPMV \circ FP^{NP}$ is the set of all partial, multivalued functions that are computed by polynomial-time NTM transducers that are allowed to ask up to a polynomial number of adaptive NP queries before nondeterminism is invoked.
• $NPMV_g \circ FP^{NP}$ is the set of all functions f ∈ $NPMV \circ FP^{NP}$ such that the nondeterministic phase of the computation is restricted to $NPMV_g$.
• $(NPMV \circ FP^{NP})_g$ is the set of all functions f ∈ $NPMV \circ FP^{NP}$ such that graph(f) ∈ P.
The NPMV-composition class notation is adapted from that in [FHOS92]. Valiant noticed that all solution problems associated with NP decision problems are in $NPMV_g$ ([Sel91, p. 4]; [Val76]). Class $NPMV_g \circ FP^{NP}$ is useful because it corresponds to those solution problems whose associated decision and evaluation problems are in NP and OptP, respectively. The following class relations are known:
Lemma 17 ([Sel91], Proposition 7) If f ∈ $FP^{NP[O(\log n)]}$ and graph(f) ∈ P then f ∈ FP.
Proof: Implicit in the proof of [Kre88, Theorem 4.1]. ∎
Corollary 18 The following hold:
1. (NPMVoFPN~').'1=NPMVy.
2. NPMV9 ~ NPMV, NPMVy o I•,PN1' ~ NI'MV o /•'/'N 1', N/'M~, C
NPMVy o ppNP, and NPMV ~ NPMVo l"f'N1' .
.9. N PM V ~c F pN I' [.'ielrJ 11 fJ. 1 0}.
4- Ff'N1'1°11"11."ll C N PMV if and only if NP = co~NP {.S'e/91, Theorem 4,
/'art lil} .
. '"}. F f'N /'(O\ln,u)J C N J> M Vy if aud ouly if P = N P {.S'c/.91, Theorem 3, Parl
27}.
fi. N flM~1 ~r F/'N/'(O(Iogn)) if and only if P = NP {Scl.91, Theorem 3, Parl
W}.
7. F/'NI' =c N P MV,, 0 ppNI',
H. NPMV,, o ppNI' c NPMV if and only if NP = co~NP.
Proof:
Proof of (1): The leftwards inclusion is trivial. The rightwards inclusion follows by this simulation: for any machine M corresponding to a function f in $(NPMV \circ FP^{NP})_g$, nondeterministically guess all possible sequences of NP query answers, compute nondeterministically relative to these queries, and accept a computed output if it is valid (which can be checked in polynomial time, as graph(f) ∈ P).
Proof of (3): Follows from the prefix-search technique (see Section 4.3).
Proof of (4): The leftwards implication follows from the collapse of the Polynomial Hierarchy. The rightwards implication follows because the characteristic function for any language in co-NP is in $FP^{NP[1]}$.
Proof of (5): The leftwards implication follows from the collapse of the Polynomial Hierarchy. The rightwards implication follows from Lemma 17.
Proof of (6): Similar to proof for (5).
Proof of (7): Follows from definitions and proof for (3).
Proof of (8): Follows from parts (4) and (7). ∎
The major relations that are still OJH'Il an• N I'M \';1 ~ •• /•' lj\'11' [Sp)!)J, p. '2:1J,
NPMV ~ NPMV" o ppNP, and NI'MVo FfJNI' C NfJf\J\~1 o Ff'N1'.
Note that any optimal-cost solution problem can be simulated by asking the number of NP queries required to determine the optimal cost (see Section 4.1) and then using this cost as the input to the corresponding given-cost problem. Hence, if enough computational power is available, any function in $NPMV_g \circ FP^{NP}$ can be reduced to a function in $NPMV_g$, i.e. the given-cost solution problem. This simulation will be used in many of the proofs given below.
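The simulation described here can be sketched for a minimization problem: binary search over the cost range using a decision oracle, then a single call to a given-cost solver. In the sketch below the oracle and solver are ordinary Python callables standing in for the NP queries and the given-cost function; their names, and the toy instance, are illustrative assumptions.

```python
def optimal_cost(instance, max_cost, has_solution_within):
    """Smallest k in [0, max_cost] with a solution of cost <= k.

    Assumes at least one solution of cost <= max_cost exists. Uses about
    log2(max_cost) oracle calls, each standing in for one NP query.
    """
    low, high = 0, max_cost
    while low < high:
        mid = (low + high) // 2
        if has_solution_within(instance, mid):
            high = mid
        else:
            low = mid + 1
    return low


def optimal_solution(instance, max_cost, has_solution_within, solve_given_cost):
    """Find the optimal cost first, then hand it to the given-cost solver."""
    best = optimal_cost(instance, max_cost, has_solution_within)
    return solve_given_cost(instance, best)


# Toy stand-in: the "instance" is a list of candidate solution costs.
def toy_oracle(costs, k):
    return any(c <= k for c in costs)


def toy_solver(costs, k):
    return next(c for c in costs if c == k)


print(optimal_solution([7, 3, 9], 10, toy_oracle, toy_solver))
```

When the maximum cost is at most exponential in the instance size, as in Corollary 7, the number of oracle calls in the search is polynomial.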
4.1.2 Classes Within FPSPACE(poly): Counting Classes
The classes of interest within FPSPACE(poly) belong to three hierarchies of counting classes, which are based on two different modes of counting solutions.
For a polynomial-time NTM transducer N ∈ FPH, let #N(x) be the number
of accepting paths i.e. the total number of solutions, and Span_N(x) be the number
of different solutions computed by N on input x.
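
The difference between the two modes of counting can be seen on a small example.
The sketch below is illustrative only: the list `paths` is a made-up record of
the outputs of the computation paths of some transducer on one input, with
`None` marking a rejecting path. Counting accepting paths gives the # value,
while counting distinct outputs gives the Span value.

```python
from typing import List, Optional, Tuple


def count_paths_and_span(outputs: List[Optional[str]]) -> Tuple[int, int]:
    """Return (#N(x), Span_N(x)) for a list with one entry per computation path.

    An entry of None marks a rejecting path; any other entry is the output
    (solution) written on an accepting path.
    """
    accepting = [y for y in outputs if y is not None]
    return len(accepting), len(set(accepting))


# Hypothetical outputs of six computation paths on one input.
paths = ["tree_a", "tree_b", "tree_a", None, "tree_b", "tree_c"]
num_accepting, span = count_paths_and_span(paths)
```

Here five paths accept but only three different solutions are produced, so the
two counts differ whenever some solution is reached along more than one path.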

Figure 9: Function classes within FP^NP (adapted from Figure 1 of [Sel91]). Inclusion relations are denoted by unmarked arrows and refinement relations by arrows marked with c. Certain relationships that are not marked are possible; see main text.

1. The #-Hierarchy, #PH, whose kth level, #(FΣ_k^p), k ≥ 1, is the class of
functions f(x) such that f(x) = #N(x) for some N ∈ FΣ_k^p. This hierarchy
is equivalent to that defined in [Val79b] on classes in PH instead of FPH.
2. The Span-Hierarchy, SpanPH [KST89], whose kth level, Span(FΣ_k^p), k ≥ 1,
is the class of functions f(x) such that f(x) = Span_N(x) for some N ∈ FΣ_k^p.
3. The function bounded #P query hierarchy, FP^#P[f(n)].
Let the first and second levels of #PH (SpanPH) be written #P and #NP (SpanP
and SpanNP), and define hardness of functions in the classes of these hierarchies
relative to metric reducibility. The following class relations are known:
Corollary 19 The following hold:
1. #PH, SpanPH, and FP^#P ⊆ FPSPACE(poly).
2. FPH ⊆ FP^#P[1] [TW92, Theorem 5.1].
3. If either #P ⊆ FPH or FPH ⊆ #P then PH collapses to a finite level
[TW92, Corollary 5.7, Part 1].
4. #PH ⊆ FP^#P[1] [TW92, Theorem 4.1].
5. For k ≥ 1, #(FΣ_k^p) ⊆ Span(FΣ_k^p) [KST89, Generalization of Proposition
4.7].
6. For k ≥ 1, Span(FΣ_k^p) ⊆ #(FΣ_{k+1}^p) [KST89, Generalization of Proposition
4.[illegible]].
7. For k ≥ 1, #(FΣ_k^p) = Span(FΣ_k^p) if and only if UΣ_k^p = Σ_k^p [KST89,
Generalization of Theorem 4.9].
8. For k ≥ 1, Span(FΣ_k^p) = #(FΣ_{k+1}^p) if and only if Σ_k^p = Π_k^p [KST89,
Generalization of Theorem 4.11].
9. #PH = SpanPH.
Proof:
Proof of (1): Any polynomial-time NTM acceptor or transducer can be
simulated in PSPACE [BDG88, Theorem 2.8(b)]; hence, by reserving space for
constant k + 1 such simulations, any FΣ_k^p computation can be simulated in
FPSPACE. A counter of accepting paths can be attached to any such
simulation to calculate #N(x); to calculate Span_N(x), count only those accepting paths
whose output values have not been encountered before in the simulation i.e.
resimulate N up to the current accepting path. As the output of any function
in these hierarchies is polynomially bounded in the length of the input, these
hierarchies are in FPSPACE(poly).
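
The resimulation idea in the proof of (1) can be sketched as follows. This is an
illustration under stated assumptions, not the construction of the thesis:
`run_path(i)` is a hypothetical function that replays the i-th computation path
of the transducer and returns its output, or `None` if the path rejects. Rather
than storing every output seen so far, the sketch replays all earlier paths for
each accepting path, so only a counter and two path indices are kept.

```python
from typing import Callable, Optional


def span_by_resimulation(run_path: Callable[[int], Optional[str]],
                         num_paths: int) -> int:
    """Count distinct outputs without storing the set of outputs seen so far."""
    count = 0
    for i in range(num_paths):
        y = run_path(i)
        if y is None:
            continue  # rejecting path: contributes nothing
        # Replay every earlier path; count y only on its first appearance.
        first_occurrence = all(run_path(j) != y for j in range(i))
        if first_occurrence:
            count += 1
    return count


# Hypothetical path outputs, used only to exercise the sketch.
_outputs = ["tree_a", "tree_b", "tree_a", None, "tree_b", "tree_c"]
span_value = span_by_resimulation(lambda i: _outputs[i], len(_outputs))
```

The trade is time for space: the number of replays is quadratic in the number of
paths, but the working storage does not grow with the number of distinct
outputs.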
Proof of (5 - 8): The proof for (5) is a straightforward modification of that in
[KST89]. Lemma 4.3 in [KST89] can be restated in terms of oracles A, A' ∈ Σ_k^p if
line 4 in the algorithm on page 367 is deleted, and the condition "or (∃i (q_i, z_i) ∉
B)" is added to the definition of ORACLE on page 368. Using this lemma, it is
easy to prove generalized versions of Proposition 4.5 and Corollary 4.6 in [KST89],
from which (6), (7), and (8) follow. As Σ_k^p = Π_k^p implies that #PH = #(FΣ_{k+1}^p),
the leftwards portion of (8) implies the stronger result that Span(FΣ_k^p) = #PH
[Köb89].
Proof of (9): Follows from (5) and (6). ∎
The counting functions of interest in this thesis are all in class Span(NPMV_g o
FP^NP), which is in the low end of SpanPH. The following class relations are
known:
Corollary 20 The following hold:
1. SpanP ⊆ Span(NPMV o FP^NP).
3. Span(NPMV o FP^NP) ⊆ #NP.
4. Span(NPMV_g o FP^NP) ⊆ SpanP if and only if NP = co-NP.
Proof:
Proofs of (1 - 2): By definition. As SpanP = Span(NPMV), relation SpanP
⊆ Span(NPMV_g o FP^NP) is open in part because relation NPMV ⊆ NPMV_g o
FP^NP is open (see Section 4.1.1).
Proof of (3): Consider a NOTM M which computes a function f ∈ NPMV o
FP^NP, and let M_NPMV be the machine in NPMV invoked in the second phase
of the computation of M. Define the following oracle on input x and output y
for M_NPMV:
A(x,y) = {There is a computation path of M_NPMV on input x that produces
output y}.
Oracle A is in NP. Consider the NOTM M' which computes a function g in
FΣ_2^p: M' guesses an output y of M_NPMV, performs the initial FP^NP phase of
the computation of M, formulates input x to M_NPMV, and uses a single call to
oracle A to see if M_NPMV on input x outputs y. If the answer to oracle A is
"yes", M' outputs y; else, M' rejects. Each distinct output of M is produced by
M' exactly once; hence, Span(M) = #(M').
Proof of (4): The proof of the rightwards part is a variant of that for the
rightwards part of [KST89, Theorem 4.11]. Let L be a language in NP. Define
machine M in NPMV_g o FP^NP which asks a single question to the oracle in NP
for membership in L, and outputs "1" on all computation paths if the oracle
rejects i.e. input x ∉ L, and otherwise has no accepting computation. Let f =
Span(M); note that f(x) > 0 if and only if x ∉ L. However, by hypothesis, f is
also the Span function of some machine in NPMV. As this machine computes
co-L, co-L ∈ NP and NP = co-NP. To prove the leftwards part, note that if NP =
co-NP, then SpanP = #NP by Part (8) of Corollary 19 above; the wanted result
then follows from (1), (2), and (3). ∎
Note that by the results of Corollary 19, even though #P is contained in SpanP,
the two are of equal computational hardness, i.e. every function in SpanP
metrically reduces to a function in #P; this parallels the relationship between
OptP[f(n)] and FP^NP[f(n)].

Figure 10: Function classes within FPSPACE(poly): counting classes.

4.2 Evaluation Functions
Much of the early work on evaluation problems focused on decision problems
that approximate evaluation problems; see [WagK87, WagK88, WagK90] for a
review of this work. Two approaches to directly determining the complexity
of evaluation problems involve using paddability and the OptP hierarchy (see
Section 4.1.1). Upper bounds on problem complexity within the function bounded
NP query hierarchy are easily established using the OptP hierarchy. As many-one
reductions often correspond to metric reductions, the same also holds for
completeness results; indeed, Gasarch, Krentel, and Rappoport [GKR92, p. 1]
conjecture that OptP[f(n)]-completeness is the normal behavior of evaluation
problems corresponding to NP-complete decision problems. However, paddability
is still useful in those cases when the transformation from many-one to metric
reduction is not obvious.
Consider upper bounds on the complexity of the evaluation problems for the
phylogenetic inference criteria examined in this thesis. By Corollaries 1, 5, and
8, all character compatibility problems and unweighted phylogenetic parsimony
and distance matrix fitting problems have optimal costs that are polynomially
bounded, and all weighted problems have optimal costs that are exponentially
bounded.
Corollary 21 All character compatibility and unweighted phylogenetic parsimony
and distance matrix fitting evaluation problems examined in this thesis are in
OptP[O(log n)]. All weighted phylogenetic parsimony and distance matrix fitting
evaluation problems examined in this thesis are in OptP.
By definition, weighted problems in which the magnitude of the largest weight
is polynomially bounded are also in OptP[O(log n)]; hence, let "unweighted"
also refer to such problems. By results from [Kre88] cited above, one can read
"OptP[f(n)]" as "FP^NP[f(n)]" in the remainder of this section.
Consider now completeness results, starting with the unweighted problems.
MAX-CLIQUE and MIN-VERTEX COVER are both OptP[O(log n)]-complete
([Kre88, Theorem 2.2]; [GKR92, Theorem 3.3]). Define MAX-X3C as the size of
the largest non-overlapping, rather than exact, cover by a subset of the given
3-sets. As X3C is a generalization of 3DM [GJ79, p. 53], MAX-X3C is a generalization
of MAX-3DM; as MAX-3DM is OptP[O(log n)]-complete [GKR92, Theorem 3.7],
so is MAX-X3C. The reductions from these problems to character compatibility
and unweighted phylogenetic parsimony and distance matrix fitting problems
given in Sections 3.2.1, 3.2.2, and 3.2.3 give arithmetic relations between the
costs of optimal solutions, and thus yield the following metric reductions:
• MIN-VERTEX COVER(x) = MIN-X(x) - |E|
(X ∈ {UBCCS, UBQCS, UBW, UBGe})
• MIN-VERTEX COVER(x) = MIN-X(x) - (3|V| + |E|)
(X ∈ {UBCDo, UBQDo, UCCI, UQCI})
• MAX-CLIQUE(x) = MAX-BCC(x)
• MAX-BCC(x) = MAX-BQC(x)
• MAX-X3C(x) = ((|E| - MIN-FBUT2[F_1](x))/3) - 3|C|
• MIN-FBUT2[F_1](x) = MIN-FUDT[F_1](x)
• ~~~~-FBUT~[/·~. ~j(x) = ~lli\-FBPT~[/·\. 2](x)
• MIN-VERTEX COVER(x) = MIN-FUGT[≥](x) - |E|
Theorem 22 All character compatibility and unweighted phylogenetic parsimony
and distance-matrix fitting evaluation problems examined in this thesis are
OptP[O(log n)]-complete.
Consider completeness results for the weighted problems. Ordinarily, a
weighted evaluation problem is shown OptP-complete by a variant of the
reductions used to show their unweighted versions to be OptP[O(log n)]-complete
[GKR92, p. 8]. However, the required modifications are not obvious for
either phylogenetic parsimony or distance matrix fitting problems. For example,
define weighted VERTEX COVER as the problem that associates weights with the
vertices of a graph and returns the sum of the weights of the minimum weight
vertex cover. This problem is OptP-complete [GKR92, Theorem 3.3]; however, the
difficulty with modifying the many-one reductions to phylogenetic parsimony
problems given in Section 3.2.1 is that optimal solutions to the reduced instance
neither consistently minimize nor maximize the weights of the vertices of the
candidate cover, but instead minimize the weight of the whole tree (see Figure 11).
This complicates the extraction of the cost of the useful portions of the solution
from the cost of the whole solution. This difficulty can be resolved in the same
way as for weighted MIN-STEINER TREE IN GRAPHS [GKR92, Theorem 3.4], by
including in the instance an explicit weighting function for all edges in the
implicit graph. This version of each weighted phylogenetic parsimony problem is
OptP-complete; however, it violates the spirit of the original biological problem,
and thus will not be considered here further. Similar difficulties occur in attempts
to modify the many-one reductions for distance matrix fitting problems given in
Section 3.2.3.
By the restriction reductions from all unweighted to weighted phylogenetic
parsimony and distance matrix fitting problems, the evaluation problems
corresponding to the latter are OptP[O(log n)]-hard. However, it is possible to do
better using paddability:
Theorem 23 The following hold:
1. MIN-WBCCS and MIN-WBQCS are paddable with respect to VERTEX
COVER.
2. MIN-WBCDo and MIN-WBQDo are paddable with respect to VERTEX
COVER.
3. MIN-WBW is paddable with respect to VERTEX COVER.
4. MIN-WBCCI and MIN-WBQCI are paddable with respect to VERTEX
COVER.
5. MIN-WBGe is paddable with respect to VERTEX COVER.

Figure 11: Difficulties with the reduction from weighted MIN-VERTEX COVER to weighted phylogenetic parsimony evaluation problems. Graphs are shown on top, and all possible trees for each graph under the reduction in Table 9 and the costs of these trees (C) and their corresponding vertex covers (VC) are given below. The numbers in parentheses in the graphs denote the weights associated with particular vertices. Note that for the graph in (a), the minimal tree in the reduced instance also yields a minimal vertex cover, which is not the case for the graph in (b).

Proof:
Proof of (1): Assume without loss of generality that all given instances
x_1, x_2, ..., x_k of VERTEX COVER have the same number of vertices, and can
thus be mapped by the reduction f in Table 9 into k instances of UBCCS, each
of which has d characters and m_i taxa, 1 ≤ i ≤ k. Let m* = max m_i. Construct
an instance x' of MIN-WBCCS on d' = kd characters c_1, ..., c_d' split into zones
z_i = c_((i-1)d+1), ..., c_(id), 1 ≤ i ≤ k, with each zone corresponding to one of the
given instances of UBCCS. Let S' = ∪_{i=1}^{k} S_i, with each s ∈ S_i being mapped into
its appropriate zone as in f(x_i), with zeroes in the characters of all other zones.
Give each character in zone z_i weight w_(z_i) = (m*d + 1)^(i-1). Note that the maximum
weight in x' has a number of bits polynomial in k, m*, and d. Hence, function
h_1 is polynomial time.
No path in an optimal tree T for instance x' can include a vertex v such
that v has characters with state 1 in two different zones. Suppose that such a
path p exists, and assume without loss of generality that there are no vertices
from S' on this path. Denote the two zones by z' and z'', the first two vertices
surrounding v on p that have 1-states totally within one zone by x and y, and
the number of 1-states in x and y by l_x and l_y. Suppose x and y are in the
same zone; assume this zone is z'. Create a path p' by taking each vertex in
p and retaining only those edges wholly in zone z' i.e. project path p onto the
characters in zone z'. Path p' still connects x and y, and is shorter than p, which
contradicts the optimality of T. Alternatively, suppose x and y are in zones z'
and z'', respectively. Assume without loss of generality that there is a path from
x to 0 in z'. Any path from x to y must contain at least l_x + l_y edges and have
length at least (l_x)w_z' + (l_y)w_z''. Consider tree T' that replaces the path p with
a path from y to 0 of length (l_y)w_z''. Tree T' is shorter than tree T, which is a
contradiction. Hence, all edges in the optimal tree are between vertices in their
own zones, and the cost of the optimal tree for x' corresponds to the summed
costs of an optimal tree for each zone times the weight for that zone. Recall
from Section 3.2.1 that an unweighted binary Camin-Sokal tree on m taxa and
d characters has optimal length not greater than md; thus, the costs of optimal
trees for each zone cannot overflow into the costs for trees in other zones, and
the cost of the tree corresponding to any x_i can be easily extracted from the cost
for x'. Hence, function h_2 is also polynomial time, establishing paddability.
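
The "cannot overflow" argument in the proof of (1) is positional arithmetic in
base m*d + 1. The sketch below illustrates only that arithmetic and is not the
thesis construction itself: `bound` plays the role of m*d (the largest cost any
single zone can contribute), zone i (counting from 1) is given weight
(bound + 1)^(i-1), and the function names are made up for the example. Since
every per-zone cost is at most `bound`, the combined cost can be decoded digit
by digit.

```python
from typing import List


def pack_costs(costs: List[int], bound: int) -> int:
    """Combine per-zone costs into one number; zone i+1 has weight (bound+1)**i."""
    if any(c < 0 or c > bound for c in costs):
        raise ValueError("each per-zone cost must lie in [0, bound]")
    base = bound + 1
    return sum(c * base ** i for i, c in enumerate(costs))


def unpack_cost(total: int, zone: int, bound: int) -> int:
    """Recover the cost of the given zone (1-indexed) from the combined cost."""
    base = bound + 1
    return (total // base ** (zone - 1)) % base


# Hypothetical per-zone optimal costs, each at most bound = m*d = 12.
zone_costs = [5, 0, 12]
combined = pack_costs(zone_costs, bound=12)
recovered = [unpack_cost(combined, z, bound=12) for z in (1, 2, 3)]
```

The number of bits in the combined cost grows linearly with the number of zones,
which matches the remark that the maximum weight has a polynomial number of bits.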
Proof of (2): Given a set of instances x_1, x_2, ..., x_k of VERTEX COVER,
construct an instance x' as in (1) above with two additions: (1) there are (k + 1)
zones, and the (k+1)-th zone of maximum weight is designated z+, and (2) S' is
augmented by y_i, 1 ≤ i ≤ (k + 1)d, such that y_i has 1's in positions i to (k + 1)d.
Consider an optimal tree T for x'. All edges {{y_j, y_(j+1)}, 1 ≤ j < (k + 1)d} ∪
{{y_((k+1)d), 0}} are in T by the reasoning given in [DJS86, Theorem 3], and by
reasoning similar to that for (1) above, there are no paths in T between vertices
in different zones. Moreover, there is no path p from any vertex u in a zone
z' to any y_j. Suppose such a path p exists; project p onto characters in z', i.e.
create a path from u to 0. As y_j and 0 are already connected, this yields a tree
shorter than T, which is a contradiction. Hence, the cost of T is the cost of edges
{{y_j, y_(j+1)}, 1 ≤ j < (k + 1)d} ∪ {{y_((k+1)d), 0}} plus the summed costs of an optimal
tree for each zone times the weight of that zone. By reasoning similar to that for
(1) above, functions h_1 and h_2 are polynomial time, establishing paddability.
Proofs of (3 - 5): The proofs for (3) and (4) are variants of those for (1)
and (2), respectively. As any ordered phylogenetic parsimony problem can be
simulated by an appropriately-structured instance of the Generalized parsimony
problem, (5) can be proved by a variant on any of these other proofs. ∎
Corollary 24 :Ill wrf.qhlrd phylogrnrlir par,r;im.ony cvalunliou J1roblcms exam-
Simil<l!' n•sult.s hold for Sl'\'Pral of the distanc(' matrix fitting problems.
Theorem 25 The following hold:
1. MIN-FUUT[F_1] is paddable with respect to X3C.
2. MIN-FUUT[F_1, ≥] is paddable with respect to X3C.
3. MIN-FUGT[≥] is paddable with respect to VERTEX COVER.
Proof:
Proof of (1): Assume without loss of generality that all given instances
x_1, x_2, ..., x_k of X3C have the same number of vertices, and can thus be mapped
by the reduction f in Table 17 into k instances of FUUT[F_1], each of which has
graphs G_i with e_i edges. Let e* = max e_i. Construct the instance x' of
MIN-FUUT[F_1] as a distance matrix D' based on an underlying graph G' = ∪_{i=1}^{k} G_i
such that d'_(i,j) = 0 if i = j, d'_(i,j) = (e* + 1)^k - (e* + 1)^(m-1) if {i, j} ∈ E_m, the
edge set of G_m, and d'_(i,j) = (e* + 1)^k otherwise. The maximum weight in x' has a
number of bits polynomial in e* and k. Hence, function h_1 is polynomial time.
No partition in an optimal ultrametric tree T for instance x' can join vertices
from different component graphs. Assume that two such vertices u and v are
joined at level l. Consider T' that instead joins u and v at level (e* + 1)^k. As
d_(u,v) = (e* + 1)^k, T' is of lower cost than T, which is a contradiction. Hence,
all partitions must join vertices within individual G_i. A partition of G_i into
either three or four triangles in the manner described in Section 3.2.3 is optimal
at level (e* + 1)^k - (e* + 1)^(i-1). Moreover, there can be no other joining of
vertices in G_i until level (e* + 1)^k. Suppose there was a partition of G_i at level
l, (e* + 1)^k - (e* + 1)^(i-1) < l < (e* + 1)^k, which joined two previously separate
groups of vertices X and Y. Let G_X, G_Y, and G_(X∪Y) be the subgraphs of G_i
induced by the vertex-sets X, Y, and X ∪ Y, respectively. Let e_p be the number
of edges in G_X and G_Y, e_u be the number of edges in G_(X∪Y) less e_p, and e_t be
(|X| + |Y| - 1)(|X| + |Y|)/2 - (e_p + e_u). Note that e_p is the number of previously
used edges, e_u is the number of unused edges, and e_t is the number of possible
edges. Figure 12 shows that the partition at level l can exist if and only if e_t ≤ e_u.
As G_X and G_Y are complete subgraphs, e_p = |X|(|X| - 1)/2 + |Y|(|Y| - 1)/2,
and e_t = |X||Y| - e_u; hence, this condition can be rewritten as |X||Y|/2 ≤ e_u.
Consider the following three cases for the simplest possible partitions at level l:
• [...] of these vertices in G_i, e_u = 0. Hence, the condition becomes 1/2 ≤ 0, which
is a contradiction.
• [...] As there is at most one edge joining X and Y in G_i, e_u = 1. Hence, the
condition becomes 3/2 ≤ 1, which is a contradiction.
• X and Y are triangles in either equation 6 or equation 7: As there are
at most two edges joining X and Y in G_i, e_u = 2. Hence, the condition
becomes 9/2 ≤ 2, which is a contradiction.
Using the argument above for the joining of two groups, the reader can verify
that no joining of three or more groups at level l can occur in an optimal tree.
Therefore, no partition at level l can exist in an optimal tree.
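
The edge-counting step in the argument above can be checked mechanically. The
sketch below is an illustration, with names chosen here: for groups of sizes
|X| = x and |Y| = y that are each complete subgraphs, e_p counts the edges
already inside the two groups, e_u the edges between them that are present in
G_i, and e_t the remaining possible edges. It checks that e_t = |X||Y| - e_u and
that the condition e_t ≤ e_u is the same as |X||Y|/2 ≤ e_u.

```python
from math import comb


def possible_edges(x: int, y: int, e_u: int) -> int:
    """e_t: pairs in X ∪ Y that are neither inside X or Y nor counted in e_u."""
    e_p = comb(x, 2) + comb(y, 2)      # edges inside the complete groups
    return comb(x + y, 2) - (e_p + e_u)


def level_can_exist(x: int, y: int, e_u: int) -> bool:
    """The condition e_t <= e_u for an intermediate partition level."""
    return possible_edges(x, y, e_u) <= e_u


def identity_holds(max_size: int = 6) -> bool:
    """Check e_t = x*y - e_u and the rewritten condition over small sizes."""
    for x in range(1, max_size + 1):
        for y in range(1, max_size + 1):
            for e_u in range(x * y + 1):
                if possible_edges(x, y, e_u) != x * y - e_u:
                    return False
                if level_can_exist(x, y, e_u) != (x * y <= 2 * e_u):
                    return False
    return True


checked = identity_holds()
two_triangles = level_can_exist(3, 3, 2)   # two triangles, two joining edges
```

For two triangles joined by at most two edges the condition fails, which is the
contradiction used in the last of the three cases.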

Figure 12: Conditions for multiple partition levels on subgraph G_i. If level l is not used in the optimal tree, the cost of groups X and Y is e_u(d_1 + d_2); else, the cost is e_t d_1 + e_u d_2. Thus, level l can exist in an optimal tree only if e_t d_1 + e_u d_2 ≤ e_u(d_1 + d_2) i.e. e_t ≤ e_u.

Hence, an optimal ultrametric tree T' for instance x' will consist of k nontrivial
partitions at levels (e* + 1)^i, 0 ≤ i ≤ (k - 1), corresponding to solutions to the k
instances of X3C, and will have cost equal to the summed values of F_1 for these
solutions. Recall from Section 3.2.3 that an optimal solution for x_i in T' will have
[...] in G_i that are induced by that solution; thus, the costs of optimal ultrametric
trees for each x_i cannot overflow into the costs of optimal ultrametric trees for
other x_j, and the cost of the solution for any x_i can be easily extracted from the
cost for x'. Hence, function h_2 is polynomial time, establishing paddability.
Proof of (2): A variant of that given above for (1), made less complex by
dominance.
Proof of (3): Assume without loss of generality that all given instances
x_1, x_2, ..., x_k of VERTEX COVER have the same number of vertices v and edges
e. Construct the instance x' of MIN-FUGT[≥] as V' = {*} ∪ {∪_{i=1}^{k} V_i - {*}},
S' = {*} ∪ {∪_{i=1}^{k} S_i - {*}}, and D' such that all distances between pairs of
vertices in the same V_i are those given in the reduction in Table 18 multiplied by
(v + e + 1)^(i-1), and all distances between pairs of vertices in different instances
are sums of the edges on the path between those vertices in a canonical tree for
x'. The reader can verify that in an optimal tree for x', there will be no edges
between vertices in different V_i, as these will be forbidden by the constraint of
dominance. Hence, the cost of the optimal tree will be the sum of the weights
of all edges in optimal trees for each V_i. Note that the sum of weights for each
V_i will be less than (v + e)(v + e + 1)^(i-1); hence, the costs of optimal trees for
each x_i cannot overflow into the costs for optimal trees for other x_j, and the cost
of the solution for any x_i can be easily extracted from the cost for x'. Hence,
function h_2 is polynomial time, establishing paddability. ∎
It is unfortunate that most of the weighted distance matrix fitting evaluation
problems do not yield to paddability proofs of the style above. The exponential
increase in the length of the weights required to separate optimal solutions for
each instance under the F_2 statistic complicates proofs for MIN-FUUT[F_2] and
MIN-FUUT[F_2, ≥], and it is not obvious how one could show paddability for
FUDT[F_1], FUDT[F], or FUDT[F_2].
Corollary 26 The following hold:
1. MIN-FUUT[F_1], MIN-FUUT[F_1, ≥], and MIN-FUGT[≥] are FP^NP-hard.
2. MIN-FUDT[F_1], MIN-FUDT[F], MIN-FUUT[F_2], MIN-FUDT[F_2], and
MIN-FUUT[F_2, ≥] are properly FP^NP[O(log n)]-hard.
4.3 Solution Functions
Solution problems have been studied indirectly via their approximation by
decision problems [GJ79] and evaluation problems [GKR92, Kre88]. More recently,
these problems have been studied directly using paddability [CT91, Gas86] and
multivalued function classes such as NPMV_g [Sel91]. The techniques developed
in this latter work will be used in this section.
There are several types of solution functions.
1. A function that computes a single solution [GJ79, Chapter 5].
2. A function that computes but cannot enumerate all solutions i.e. a function
in NPMV [Sel91].
3. An index-driven function g(i, x) that computes the i-th solution for instance
x under some polynomial-time ordering on binary strings.
Definitions (1) and (2) will be discussed in this section; definition (3) describes
the enumeration functions in Section 4.5 and will be discussed there.
Consider functions of the type in definition (1) above. Following [CT91],
the focus will be on bounds on the complexity of SOL-X_s, the class of
single-valued functions that compute solutions to problem X. Certain properties are
known to imply upper bounds on SOL-X_s: problems that have a polynomial
number of feasible solutions for any instance are in FP_tt^NP [Sel91, Proposition
5], and problems that are polynomial-invertible in the sense of [WagK87] i.e.
all solutions of cost k can be enumerated in polynomial time, are of complexity
equivalent to their cost functions. However, none of the problems examined in
this thesis exhibit either of these properties. Consider instead lower bounds.
By the prefix-search technique, which builds an optimal solution bit by bit by
consulting an NP solution-prefix oracle ([BDG88, p. 61]; [GJ79, Chapter 5]),
every problem X has at least one member of SOL-X_s in FP^NP; hence, the lower
bound can be no harder than FP^NP. As no optimal-cost solution function can
be easier than its associated evaluation function, lower bounds can be derived
from the complexity of the associated evaluation functions. Such bounds can be
improved for phylogenetic inference problems by applying Theorem 16.
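
The prefix-search technique mentioned above can be sketched as follows. This is
an illustration under stated assumptions rather than the construction in the
cited sources: solutions are encoded as bit strings of a known fixed length,
and `has_extension(p)` is a hypothetical stand-in for the NP solution-prefix
oracle, answering whether some (optimal) solution begins with the prefix p. The
solution is then built one bit at a time, using one oracle query per bit plus
one initial query.

```python
from typing import Callable, Optional


def prefix_search(has_extension: Callable[[str], bool], length: int) -> Optional[str]:
    """Build a solution bit by bit by querying a solution-prefix oracle."""
    prefix = ""
    if not has_extension(prefix):
        return None  # no solution at all
    for _ in range(length):
        if has_extension(prefix + "0"):
            prefix += "0"
        else:
            # Some solution extends prefix, and none extends prefix + "0",
            # so one must extend prefix + "1".
            prefix += "1"
    return prefix


# Hypothetical set of optimal solutions, used only to build a toy oracle.
SOLUTIONS = {"0110", "1011"}


def toy_oracle(p: str) -> bool:
    return any(s.startswith(p) for s in SOLUTIONS)


found = prefix_search(toy_oracle, 4)
```

Because the sketch always tries the bit 0 first, it returns the lexicographically
smallest solution, so the function it computes is single-valued even when many
solutions exist.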
Theorem 27 All single-valued functions solving all phylogenetic inference
optimal-cost solution problems examined in this thesis are FP^NP-hard.
Proof: As noted in Section 3.2.4, all reductions in Section 3.2 give algorithms
for transforming optimal solutions for original and reduced instances into one
another. Hence, the metric reductions from MAX-X3C, MAX-CLIQUE, and
MIN-VERTEX COVER to unweighted phylogenetic inference evaluation problems
given in Section 4.2 can be modified to give metric reductions between the
corresponding optimal-cost solution problems. SOL-MAX-CLIQUE is paddable
[CT91, Theorem 4.2], and SOL-MAX-X3C and SOL-MIN-VERTEX COVER can
be shown paddable via functions that simply combine the given instances into
one instance without adding any new components. ∎
Consider now functions of the type in definition (2) above. All phylogenetic
inference optimal-cost solution problems defined in this thesis are in NPMV_g o
FP^NP, and all corresponding given-cost and given-limit solution problems are in
NPMV_g. This definition is useful primarily for visualizing the set of solutions
associated with particular instances of a problem, and highlighting the
computational structures for different types of solution functions e.g. the two-phase nature
of NPMV_g o FP^NP computations (see Section 4.1.1). However, it is also possible
to derive results using this definition, such as the following lower bound on the
complexity of single-valued functions for the phylogenetic inference given-cost
and given-limit solution problems.
Theorem 28 All single-valued functions solving all phylogenetic inference
given-cost and given-limit solution problems examined in this thesis are properly
FP^NP[O(log n)]-hard unless P = NP.
Proof: Cook's generic reduction from decision problems in NP to SAT [GJ79,
Section 2.6] (see Section !).J) is a generic metric reduction from every solution
problem in NPMV_g to SOL-SAT. The reader can verify that the reductions
from SAT to CLIQUE, VERTEX COVER, and X3C [GJ79, Section 3.1], and
from these problems to the phylogenetic inference decision problems (see Section
3.2) are also metric reductions between the corresponding given-cost and
given-limit solution problems. Hence, any single-valued function that solves any of
the phylogenetic inference given-cost and given-limit solution problems can be
used to construct a single-valued function that solves any problem in NPMV_g.
To complete the proof, recall that NPMV_g ⊆_c FP^NP[O(log n)] implies P = NP
[Sel91, Theorem 3]. ∎
4.4 Spanning Functions
Counting problems were first defined and studied in [Val79a, Val79b, Sim77].
This early work considered the number of (not necessarily distinct) solutions
encoded by a nondeterministic computation, and has led via threshold-acceptance
mechanisms to the work on probabilistic computation [Joh90, Section 4]. There
has been a recent resurgence of interest in counting for counting's sake [Schö90,
Tor91, WagK86a, WagK86b], including the counting of distinct solutions [KST89],
which will be the focus in this section.
All phylogenetic inference given-cost and given-limit problems examined in
this thesis are in SpanP, and all corresponding optimal-cost spanning problems
are in Span(NPMV_g o FP^NP). At present, there are no lower bounds known on
the complexities of any of these problems. Only the reduction from CLIQUE to
BCC gives a one-to-one solution mapping, which yields a metric reduction
between the corresponding optimal-cost, given-cost, and given-limit spanning
problems. Hence, all character compatibility spanning problems are harder than the
corresponding problems for CLIQUE. Unfortunately, none of these problems for
CLIQUE are known to be hard for either #P or SpanP. It is interesting that
only the versions of CLIQUE and VERTEX COVER that count locally optimal
solutions have been shown to be #P-complete [Val79a, Theorem 1].
Several trivial but intriguing bounds emerge for any spanning problem X by applying binary search arguments. The following hold for unweighted problems,
• SPAN-SOL-OPT-X $\in FP^{SPAN-SOL-VAL.EQ-X}$
• SPAN-SOL-OPT-X $\in FP^{SPAN-SOL-VAL.LE-X[O(\log |x|)]}$
• SPAN-SOL-VAL.LE-X $\in FP^{SPAN-SOL-VAL.EQ-X}$
weighted problems,
• SPAN-SOL-OPT-X $\in FP^{SPAN-SOL-VAL.LE-X}$
and for all problems:
• SPAN-SOL-VAL.EQ-X $\in FP^{SPAN-SOL-VAL.LE-X[1]}$
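The binary search argument behind the bound on SPAN-SOL-OPT-X relative to SPAN-SOL-VAL.LE-X can be sketched as follows. This is only an illustration under stated assumptions: `make_count_le` is a hypothetical stand-in for the SPAN-SOL-VAL.LE-X oracle, built here from an explicit toy list of integer solution costs, and `max_cost` is an assumed known upper bound on the cost of a solution for a minimization problem.

```python
from typing import Callable, Sequence, Tuple


def make_count_le(costs: Sequence[int]) -> Callable[[int], int]:
    """Hypothetical oracle: number of solutions whose cost is at most k."""
    def count_le(k: int) -> int:
        return sum(1 for c in costs if c <= k)
    return count_le


def span_opt(count_le: Callable[[int], int], max_cost: int) -> Tuple[int, int, int]:
    """Return (optimal cost, number of optimal solutions, oracle queries used).

    Assumes a minimization problem with at least one solution of cost
    at most max_cost.
    """
    queries = 0
    lo, hi = 0, max_cost
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if count_le(mid) > 0:   # some solution costs at most mid
            hi = mid
        else:                   # every solution costs more than mid
            lo = mid + 1
    queries += 1                # one last query counts the optimal solutions
    return lo, count_le(lo), queries
```

The number of queries grows with the logarithm of the cost range, so when costs are polynomially bounded in the instance size (the unweighted case) the query count is logarithmic in the instance size.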
Note that if either of the given-cost or given-limit spanning problems is in FPH, all three problems are in FPH; however, the optimal-cost spanning problem can be in FPH without implying anything about the complexity of the other two problems (see Section 4.7).
4.5 Enumeration Functions
All existing definitions of enumerability in complexity theory (see [HHSY91, Section t!J] for a review) are concerned with enumerating languages rather than the ranges of functions for particular inputs. The enumeration problem considered here was defined more for the convenience of its users than theoretical tractability; however, it may still be of some use in pure complexity-theoretic investigations. Though any function in FPH can be simulated in FPSPACE(poly), it is not obvious that any such function can be enumerated in FPSPACE(poly).
Theorem 29 Given a problem $\Pi$ in $F\Sigma_2^p$ and a polynomial-time ordering P on binary strings, the problem of computing the i-th optimal solution under P for an instance x of $\Pi$ is in FPSPACE(poly).
Proof: Let $N \in F\Sigma_2^p$ be a polynomial-time NOTM transducer that computes the solutions of $\Pi$. Let p(n) be the polynomial bounding the running time of N and assume that all solutions have length p(n). Define the following function:
RANK(N,P,x,y) = $|\{w \mid w$ is a solution to N on input x and $w < y$ under ordering P$\}|$.
RANK can be computed in FPSPACE(poly) in the same way as functions in SpanPH (Corollary 19, Part (1)). Using RANK, a binary search can determine which of the $2^{p(n)}$ possible solutions has exactly (i - 1) solutions preceding it under P, i.e., the i-th solution under P. Note that this binary search is conducted on the ordering of possible solution strings under P. This procedure is in $FP^{FPSPACE(poly)}$ = FPSPACE(poly). ∎
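The search used in this proof can be sketched on a toy instance. The sketch makes several assumptions that are not in the thesis: solutions are given as an explicit set of p-bit integers, the ordering P is taken to be ordinary numeric order, and `rank` is computed by direct counting rather than by a space-bounded machine.

```python
from typing import Iterable


def rank(solutions: Iterable[int], y: int) -> int:
    """RANK on a toy instance: number of solutions w with w < y."""
    return sum(1 for w in solutions if w < y)


def ith_solution(solutions: Iterable[int], i: int, p: int) -> int:
    """Return the i-th (1-based) solution in increasing order.

    Binary search over all 2**p candidate strings, using only RANK.
    Assumes 1 <= i <= number of solutions.
    """
    solutions = list(solutions)
    lo, hi = 0, (1 << p) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank(solutions, mid + 1) >= i:   # at least i solutions are <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The loop makes at most p calls to `rank`, which mirrors the polynomial number of RANK evaluations in the proof.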
Theorem 30 Given a problem $\Pi$ in $F\Sigma_2^p$ and a polynomial-time ordering P on binary strings, the problem of computing the i-th optimal solution under P for an instance x of $\Pi$ is in $FP^{\#P}$.
Proof: Modify NTM N in the preceding proof to take as additional input a binary string y and produce only those solutions w such that w < y under ordering P; call the resulting machine N'. The proof follows by observing that RANK(N,P,x,y) = SPAN(N'(x,y)) and that $SpanP \subseteq FP^{\#P[1]}$ by Corollary 18. ∎
Corollary 31 All phylogenetic inference optimal-cost, given-cost, and given-limit enumeration problems examined in this thesis are in $FP^{\#P}$.
The only known lower bound for these problems is implied by the observation that, for a problem X, SPAN-X $\in FP^{ENUM-X}$, i.e., no enumeration problem can be easier than its associated spanning problem.
4.6 Random Generation Functions
Though there are many papers on the random generation of particular types of graphs, general random-generation problems have been formulated and studied as a class only in [JVV86]. The results of the previous section suggest that random generation problems are in $FRP^{\#P}$; however, Jerrum, Valiant, and Vazirani have given a procedure of complexity $FRP^{\Sigma_2^p}$ which uses Stockmeyer's $FP^{\Sigma_2^p}$ approximation procedure for functions in #P (see Section 5.6) to generate outputs of any $NPMV_g$ machine at random under a uniform distribution. Recall that an NP query can be simulated by an appropriate $\Sigma_2^p$ query.
Corollary 32 All phylogenetic inference optimal-cost, given-cost, and given-limit random-generation problems examined in this thesis are in $FRP^{\Sigma_2^p}$.
As any random-generation function is also a solution function, the lower bounds on the complexities of solution functions given in Section 4.3 also apply to random-generation functions.
These results have an irrevocably academic flavor because they depend on access to a source of truly random bits. Though it is impossible to obtain random bits by purely arithmetical methods, there are techniques for generating near-random bit sequences and for expanding random "seed" sequences into longer random sequences. The interested reader is referred to [LV90b, Section 1] for an introduction to mathematical definitions of randomness, and [Riv90, Section 7] for a summary of methods for generating random sequences.
-Optimal-Cost Unwcight.ed J Wl'ight.t.'tl G i \'t'll- ( ~o~t ( H\'t'n- Limit.
Decision - N p -COlli plt'l.l' Evaluation ppNt'IO\Iof!;n)LCJ FPI\"''-hanl t -Solution F flt'-hard, pt'OJ>l'rl)' F pN I'(O(lu~rtll.h ;ml.
E NPA/l' o ppNJ' ,'1 E N J> 1\1\';,
C:panning E Span(N Pl\IV, o ppNr) E Spall P 1-- E J.'J>#I-Enumeration
Random E F!fJ>'f-~
Generation
Table 20: Computational complexities of phylogenetic inference functions.
† Most weighted distance matrix fitting evaluation problems are only known to be properly $FP^{NP[O(\log n)]}$-hard (see Corollary 26).
4.7 Summary
All complexity results obtained in this section for phylogenetic inference problems are given in Table 20. Optimal-cost solution problems are provably harder than the corresponding given-cost and given-limit solution problems because of the NP queries allowed to optimal-cost problems. However, this difference seems to disappear for more complex versions of these problems. Though this difference may re-assert itself when completeness results are available, the relations between the spanning versions of these problems suggest otherwise (see Section 4.4). I conjecture that for problems more complex than computing solutions, optimal-cost problems are easier than their corresponding given-cost and given-limit problems.
Solution problems are of greatest interest to biologists, as these problems are concerned with the trees that define evolutionary hypotheses. Moreover, they are the only problems that have been investigated in the literature, albeit by assessing particular algorithms solving these problems [LP85, Pla89]. Several of the other problems treated above also have biological applications. For instance, spanning results give lower bounds on the running time of branch-and-bound algorithms that solve the corresponding solution problems [Sto85, Val79a]. Also, as each phylogeny incorporates a different hypothesis of character change, all such hypotheses should be considered to get an accurate idea of what is implied about phylogeny by a particular data set [Mad91, MHS92], which could be done by enumerating all phylogenies.
The results given in this section do not directly put upper or lower bounds on the time complexities of algorithms solving these problems; at present, it is only known that these problems, by virtue of being in FPSPACE, can be solved in exponential time. However, these results do give the relative hardnesses of these problems, and may suggest guidelines for algorithm designers about which approaches may not be useful for solving these problems.
5 The Approximability of Phylogenetic Inference Functions
The results in previous sections suggest that polynomial-time algorithms providing exact solutions for phylogenetic inference optimal-cost problems probably do not exist. However, fast algorithms may exist if one is willing to settle for approximate solutions whose cost is within some fixed interval or ratio of the optimal cost. In this section, I will derive some limits on the types of approximations that are available to phylogenetic inference problems.
5.1 Types of Approximability
This section gives a brief overview of types of approximation algorithms and some class-based approaches to proving that various of these approximations cannot exist for a given problem. For in-depth reviews of topics in this section, see [BJY89, GJ79, HS78, Mot92].
Given a problem X, an instance I of X, and an approximation algorithm $A_X$ for X, let $OPT_X(I)$ be the cost of the optimal solution for I, $A_X(I)$ be the cost of the solution for I found by $A_X$, and $MAX_X(I)$ be the largest of the costs of all solutions of I; further, let $Y = OPT_X(I)$ if X is a minimization problem, and $Y = A_X(I)$ if X is a maximization problem. There are several measures of the quality of an approximation [OM90, p. 6]:
• Absolute Error Measure: $|OPT_X(I) - A_X(I)|$
The different Y are applied to map the error-measure values for minimization and maximization problems into the same interval, namely $[0, +\infty)$, for easier comparison [GJ79, p. 128]. There are several types of approximation algorithms defined by various bounds on the quality of the resulting approximations:
1. Absolute (Additive) Approximation
• $|OPT_X(I) - A_X(I)| \leq f(|I|)$
2. Polynomial-Time Approximation Schemes
• Polynomial-Time Approximation Scheme (PTAS):
For all $\epsilon > 0$, there exists an algorithm $A_X$ such that $|OPT_X(I) - A_X(I)| \leq \epsilon Y$ and the runtime of $A_X$ is polynomial in $|I|$ for each $\epsilon$.
• Fully Polynomial-Time Approximation Scheme (FPTAS):
For all $\epsilon > 0$, there exists an algorithm $A_X$ such that $|OPT_X(I) - A_X(I)| \leq \epsilon Y$ and the runtime of $A_X$ is polynomial in $|I|$ and $1/\epsilon$.
The algorithm $A_X$ can be either a single algorithm (uniform PTAS) or a family of algorithms (non-uniform PTAS).
3. Relative (Multiplicative) Approximation
• $|OPT_X(I) - A_X(I)| \leq cY$, $c > 0$
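The bounds above can be checked mechanically for a single instance. The sketch below is an illustration only; the function names are hypothetical, costs are assumed to be non-negative numbers, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def absolute_error(opt, approx):
    """Absolute error measure |OPT_X(I) - A_X(I)|."""
    return abs(opt - approx)


def scale_y(opt, approx, minimization: bool):
    """Y = OPT_X(I) for minimization problems, A_X(I) for maximization."""
    return opt if minimization else approx


def within_absolute(opt, approx, bound) -> bool:
    """Absolute (additive) approximation: error at most the given bound."""
    return absolute_error(opt, approx) <= bound


def within_relative(opt, approx, c, minimization: bool) -> bool:
    """Relative (multiplicative) approximation: error at most c * Y."""
    return absolute_error(opt, approx) <= c * scale_y(opt, approx, minimization)
```

For example, a minimization instance with optimal cost 100 and approximate cost 120 satisfies the relative bound for c = 1/5 but not for c = 1/10.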
In the following, "a polynomial-time algorithm with a relative (an absolute) approximation c" will be abbreviated as "a relative (an absolute) approximation c". There are several variants on these definitions in the literature that are created by using different error measures or implying asymptotic rather than absolute error bounds. One such variant (indeed, the first [Joh74] and preferred notation) represents relative approximations as straightforward ratios $A_X(I)/OPT_X(I)$ (minimization problems) and $OPT_X(I)/A_X(I)$ (maximization problems). As some of these definitions are not equivalent and it is not always clear which definition is being used, the reader must exercise caution in comparing results from different sources. In this thesis, all approximability definitions and results will be phrased as above in terms of the absolute error measure, because (1) this measure unifies the three types of approximation algorithms described above, and (2) this measure is the formulation of choice in the proof techniques [ALMSS92, Kre88, PY91] used in this section.
Traditionally, the theory of approximation algorithms has been concerned with proofs that certain types of approximability did not exist for particular problems, with approximation-preserving reductions, and with necessary and sufficient conditions for the existence of various approximation algorithms for a given problem; see [BJY89, HS78, GJ79, Mot92] for a review of this work. Within the last four years, two approaches have emerged that are based on hierarchies of approximability classes:
1. The Algorithmic Approximability Hierarchy [CP91, OM90]: Define the class NPO of all NP optimization problems, and its subclasses FPTAS, PTAS, and APX consisting of all problems that have FPTAS, PTAS, and relative approximation algorithms, respectively. Orponen and Mannila [OM90] defined NPO, and showed that several problems are NPO-complete under a relative-approximation preserving reduction. Crescenzi and Panconesi [CP91] defined FPTAS, PTAS, and APX, and gave artificial problems that are complete for PTAS and APX. It is known that $FPTAS \subset PTAS \subset APX \subset NPO$ unless P = NP [CP91, Theorem 6], and that a problem that is hard for a particular class cannot have an approximation algorithm from a lower class unless P = NP.
2. The Logical-Form Approximability Hierarchy [KT90, KT91, PR90, PY91]: The algorithmic approach to defining approximability does not give insight into why problems are approximable [BJY89, p. 220]; moreover, it is not clear how one defines a notion of "approximate computation", let alone classes of such computations, using the Turing Machine encoding of problems [PY91, p. 426]. Building on the work of Fagin [Fag74], Papadimitriou and Yannakakis [PY91] initiated the study of approximability classes that do not involve computation, that is, classes of approximable problems defined by the syntactic structure of the logic formulas that describe the solutions of those problems. Many classes have been defined in this framework [KT90, KT91, PR90]; only MAX NP and MAX SNP will be described here. Fagin showed that the class NP could be represented in logic as the class of problems whose solutions S can be expressed by formulas with structure $\exists S \forall x \exists y\, \phi(x, y, S)$ where $\phi$ is quantifier free. Given this formulation, Papadimitriou and Yannakakis defined MAX NP as the class of problems whose solutions have the form $\max_S |\{x \mid \exists y\, \phi(x, y, S)\}|$, that is, problems whose solutions S satisfy the maximum number of different x rather than all of them. Papadimitriou and Yannakakis also define subclass SNP of NP of the form $\exists S \forall x\, \phi(x, S)$ and subclass MAX SNP of MAX NP. The formulation of SAT, the boolean formula satisfiability problem, in each class is given in Tables 21 and 22.
Class MAX SNP will be important later in this section, as will the following reducibility.
Definition 33 ([PY91], p. 427) Let $\Pi$ and $\Pi'$ be two optimization (maximization or minimization) problems. We say that $\Pi$ L-reduces to $\Pi'$ ($\Pi \leq_L \Pi'$) if there are two polynomial-time algorithms f, g and constants $\alpha, \beta > 0$ such that for each instance I of $\Pi$:
(L1) Algorithm f produces an instance I' = f(I) of $\Pi'$, such that the optima of I and I', OPT(I) and OPT(I'), respectively, satisfy $OPT(I') \leq \alpha\, OPT(I)$
SAT $\in$ NP
Instance: Boolean formula S in conjunctive normal form, i.e., clauses composed of variables linked by disjunctions (logical OR), which are linked by conjunctions (logical AND).
Formula: $\exists T\, \forall c\, \exists x\, [(P(c, x) \wedge x \in T) \vee (N(c, x) \wedge \neg(x \in T))]$,
where P and N encode the instance S (P(c, x) means that variable x appears unnegated in clause c of S; N(c, x) means that variable x appears negated in clause c of S) and T is the set of true variables corresponding to a particular assignment for S.
MAX SAT $\in$ MAX NP
Instance: Boolean formula S in conjunctive normal form.
Formula: $\max_T |\{c \mid \exists x\, [(P(c, x) \wedge x \in T) \vee (N(c, x) \wedge \neg(x \in T))]\}|$
where P, N, and T are as defined for SAT.
Table 21: Formulations of SAT in first-order logic (adapted from [KT90, PY91]).
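The MAX SAT formula in Table 21 can be evaluated directly on a small instance by trying every candidate set T of true variables. The following brute-force sketch is purely illustrative and exponential in the number of variables; the encoding of P and N as dictionaries `pos` and `neg` from clause names to sets of variables is an assumption made for the example.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set


def max_sat(variables: Iterable[Hashable],
            pos: Dict[Hashable, Set[Hashable]],
            neg: Dict[Hashable, Set[Hashable]]) -> int:
    """Evaluate max_T |{c : exists x [(P(c,x) and x in T) or (N(c,x) and x not in T)]}|.

    pos[c] holds the variables appearing unnegated in clause c (relation P),
    neg[c] holds the variables appearing negated in clause c (relation N).
    """
    variables = list(variables)
    best = 0
    for r in range(len(variables) + 1):
        for true_vars in combinations(variables, r):
            T = set(true_vars)
            satisfied = sum(
                1 for c in pos
                if any(x in T for x in pos[c]) or any(x not in T for x in neg[c])
            )
            best = max(best, satisfied)
    return best
```

For the clauses (a), (not a), (a or b), (not b), no assignment satisfies all four because the first two conflict, and the maximum is three.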
3SAT $\in$ SNP
Instance: Boolean formula S in conjunctive normal form, in which each clause has at most 3 variables.
Formula: $\exists T\, \forall (x_1, x_2, x_3)\, [$
$((x_1, x_2, x_3) \in C_0 \rightarrow x_1 \in T \vee x_2 \in T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_1 \rightarrow x_1 \notin T \vee x_2 \in T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_2 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_3 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \notin T)\, ]$,
where $C_0$, $C_1$, $C_2$, and $C_3$ encode the instance S ($c = (x_1, x_2, x_3) \in C_i$ means that variables $x_1, \ldots, x_i$ are negated and variables $x_{i+1}, \ldots, x_3$ are unnegated in clause c of S) and T is the set of true variables corresponding to a particular assignment for S.
MAX-3SAT $\in$ MAX SNP
Instance: Boolean formula S in conjunctive normal form, in which each clause has at most 3 variables.
Formula: $\max_T |\{(x_1, x_2, x_3) \mid [$
$((x_1, x_2, x_3) \in C_0 \rightarrow x_1 \in T \vee x_2 \in T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_1 \rightarrow x_1 \notin T \vee x_2 \in T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_2 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \in T) \wedge$
$((x_1, x_2, x_3) \in C_3 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \notin T)\, ]\}|$
Table 22: Formulations of SAT in first-order logic (cont'd from Table 21).
(L2) Given any solution of I' with cost c', algorithm g produces a solution of I with cost c such that $|c - OPT(I)| \leq \beta\, |c' - OPT(I')|$.
L-reducibility is closely related to most of the other defined approximation-preserving reducibilities [CP91, OM90]; indeed, a constrained L-reducibility applicable to pairs of maximization or minimization problems was defined independently by H. Simon [Simii~H]. Following Simon, $r = \alpha\beta$ will be called the expansion of a given L-reduction.
Lemma 34 ([PY91], Proposition 1) L-reductions compose.
Lemma 35 ([PY91], Proposition 2) If $\Pi \leq_L \Pi'$ with expansion r, and $\Pi'$ has a relative approximation $\epsilon$, then $\Pi$ has a relative approximation $r\epsilon$.
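The arithmetic behind Lemma 35 follows from chaining (L2) and then (L1): the error on I is at most $\beta$ times the error on I', and dividing by OPT(I), which is at least OPT(I')/$\alpha$, gives a factor of $\alpha\beta$. The sketch below checks this on concrete numbers. It is an illustration only, under the assumption that relative error is measured against the optimum; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def check_l_conditions(opt_i: int, opt_ip: int, c: int, c_p: int,
                       alpha: Fraction, beta: Fraction) -> Tuple[bool, bool]:
    """Return (L1 holds, L2 holds) for one instance pair and one solution pair."""
    l1 = opt_ip <= alpha * opt_i
    l2 = abs(c - opt_i) <= beta * abs(c_p - opt_ip)
    return l1, l2


def relative_errors(opt_i: int, opt_ip: int, c: int, c_p: int) -> Tuple[Fraction, Fraction]:
    """Relative errors of the solutions of I and I', measured against the optima."""
    return Fraction(abs(c - opt_i), opt_i), Fraction(abs(c_p - opt_ip), opt_ip)


def expansion(alpha: Fraction, beta: Fraction) -> Fraction:
    """Expansion r = alpha * beta of an L-reduction."""
    return alpha * beta
```

Whenever both conditions hold, the first relative error is at most the expansion times the second.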
Corollary 36 If $\Pi \leq_L \Pi'$ with expansion r, and "$\Pi$ has a relative approximation $r\epsilon$" $\Rightarrow$ X, then "$\Pi'$ has a relative approximation $\epsilon$" $\Rightarrow$ X.
Note that restriction reductions are trivial L-reductions in which $\alpha = \beta = 1$.
The following two theorems give bounds on approximability using the relationships among classes in the function and language bounded NP query hierarchies:
Theorem 37 ([WagK89], Corollary 16) No evaluation problem A such that
lilt' mrrr.o;;po11din,q decision problem Aodd i.e. ''Is OPTA(X) odd?", is pNP{O(Iogn)L
ltfl1'd ('lUI ltn!lf 1111 nbsolttff' np]H'o.r.imafion ~ O(log n) unless pH= e~.
Theorem 38 ([Kre88], Theorem 4.3) For· n•n·y OJIII'{.f(u )j-lwrd n•altwlifllr
problem, .f(u) i,~ .'Wwolh and .f(u) E O(logu). lhrrr r·.ri.-;f.-. t '2: (} ,-;ul'h that 1'1'1'1'!1
absolute appl'orimafion 11111.'~1 hat'f Mlur ~ 41/(u') injinildy o/fru .
The proofs of each of these theorems note that an absolute approximation algorithm reduces the range in which the cost of the optimal solution lies, and that a sufficiently reduced range may be searched with fewer NP queries than are required to solve the evaluation problem. Krentel's theorem will be used in the following sections. This theorem implies almost all known connections between approximability and the function bounded NP query hierarchy.
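The range-reduction idea in these proofs can be illustrated by counting queries. In the sketch below, `is_opt_at_most` is a hypothetical stand-in for an NP query of the form "is the optimal cost at most t?", costs are assumed to be integers, and an absolute approximation within k of a value a is assumed to confine the optimum to the interval [a - k, a + k].

```python
from typing import Callable, Tuple


def ceil_log2(n: int) -> int:
    """Smallest q with 2**q >= n, for n >= 1."""
    return (n - 1).bit_length()


def find_opt(is_opt_at_most: Callable[[int], bool], lo: int, hi: int) -> Tuple[int, int]:
    """Binary search for the optimal cost in [lo, hi].

    Returns (optimal cost, number of oracle queries used). Assumes the
    optimum lies in the interval.
    """
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if is_opt_at_most(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, queries
```

Searching a range of 1024 possible costs takes 10 queries, whereas an approximation within 5 of the optimum leaves 11 candidates and at most `ceil_log2(11)` = 4 queries.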
5.2 Absolute Approximability
There are many elegant proofs which show that absolute approximations do not exist for particular problems [GJ79, HS78, WW86]. However, such results can also be derived for classes of problems.
Theorem 39 The following hold:
1. No $OptP[c \log \log n + O(1)]$-hard evaluation problem can have an absolute approximation $c \leq o(\log n)$ unless P = NP.
2. No $OptP[O(\log n)]$-hard evaluation problem can have an absolute approximation $c \leq o(poly)$ unless P = NP.
3. No $FP^{NP}$-hard evaluation problem can have an absolute approximation $c \leq O(poly)$ unless R = NP and FewP = NP.
4. No OptP-complete evaluation problem can have an absolute approximation $c \leq O(poly)$ unless P = NP.
Proof:
Proofs of (1 - 2): Follows from Theorem 38.
Proof of (3): Follows from [Sel91, Theorem 12] and [Sel91, Corollary 4(ii)].
Proof of (4): Follows from [Kre88, Theorem 4.1]. ∎
Corollary 40 The following hold:
1. No character compatibility, unweighted phylogenetic parsimony, or unweighted distance-matrix fitting optimal-cost solution problem examined in this thesis has an absolute approximation $c \leq o(poly)$ unless P = NP.
2. No weighted phylogenetic parsimony optimal-cost solution problem examined in this thesis has an absolute approximation $c \leq O(poly)$ unless R = NP and P = FewP.
:1. Nour of SOI.~-MIN-FU/!7[FJ], SOL-MIN-FUUT[Ft, ?:.}, or
.WJI--MIN-Fl!U1[?:.} have au ab.~olulc approrimalion c $ O(poly) unless R
= NP and P = FrwP.
4. Nonr of SOL-M IN-FUDJ/FJ]. SOL-.\1/X-Ffi 1>'/[f.'j. ,I.,'OI.-.\IIN-I·'l:I'T{F1}.
SOL-MIN-FUDT{f2}. or SOL-.\1/N-/·'(i{!'!fl•;. ?.} IHII't' till llb.·mlrth llflfll'o.r
imafion c $ o(poly) unlr .... -. I' == N I'·
Note that results 2 and 3 in Corollary 40 can also be derived using the paddability of the associated evaluation problems and theorems from [Nig75] (see also [WW86, Section 9.1.2.1]).
5.3 Fully Polynomial and Polynomial Time Approximation Schemes
Consider fully polynomial time approximation schemes. Garey and Johnson derived sufficient conditions for FPTAS non-approximability using the notion of strong NP-completeness. An NP-complete decision problem is strongly NP-complete if it has an NP-complete subproblem in which all numbers are bounded by some polynomial of the instance length [GJ79, p. 95]. No solution problem whose corresponding decision problem is strongly NP-complete and whose optimal cost is polynomially bounded can have an FPTAS unless P = NP [GJ79, Theorem 6.8 and Corollary]. These conditions are satisfied by all unweighted phylogenetic inference optimal-cost solution problems examined in this thesis. Garey and Johnson also defined pseudo-polynomial reductions, which preserve strong NP-completeness [GJ79, p. 101]. The reader can verify that all reductions given in Section 3.2 from unweighted to weighted problems are also pseudo-polynomial reductions; hence, all weighted decision problems examined in this thesis are also strongly NP-complete. These same reductions also map into subproblems of the solution problems corresponding to these weighted problems whose optimal costs are polynomially bounded.
Theorem 41 No phylogenetic inference optimal-cost solution problem examined in this thesis has an FPTAS unless P = NP.
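The standard argument behind the Garey and Johnson condition can be made concrete. If costs are integers and the optimal cost of a minimization problem is at most a polynomial bound q, then running an FPTAS with $\epsilon = 1/(q + 1)$ forces an absolute error below 1, hence an exact answer, in time polynomial in the instance size because $1/\epsilon = q + 1$ is polynomially bounded. The sketch below only checks that arithmetic; the function names and the bound `cost_bound` are hypothetical.

```python
from fractions import Fraction


def exactness_epsilon(cost_bound: int) -> Fraction:
    """Error parameter small enough to force an exact answer."""
    return Fraction(1, cost_bound + 1)


def forces_exact(opt: int, cost_bound: int) -> bool:
    """True if |OPT - A| <= eps * OPT implies |OPT - A| < 1.

    Uses Y = OPT, i.e. the minimization case, and assumes opt <= cost_bound.
    """
    eps = exactness_epsilon(cost_bound)
    return eps * opt < 1
```

With integer costs, an error strictly below 1 means the approximate cost equals the optimal cost.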
Consider polynomial time approximation schemes. The traditional approach to PTAS non-approximability involves proving that the given problem cannot have a relative approximation c, c > 0 unless P = NP [HS78, GJ79, WW86]. Such proofs typically derive contradictions by using polynomial-time graph expansions to "amplify" relative approximation algorithms such that cost-restricted NP-complete subproblems can be solved in polynomial time. However, few problems have the cost-restricted NP-complete subproblems required by this approach. A more widely-applicable technique has recently emerged from the study of interactive proof systems (see [Joh92] for an insightful review of the results described below). In what was initially thought to be an isolated result, Feige et al. [FGLSS91] showed that no constant relative approximation c, c > 0 exists for SOL-MAX-CLIQUE unless $NP \subseteq DTIME(n^{\log \log n})$. This result has been dramatically improved by Arora et al. [ALMSS92]:
Theorem 42 ([ALMSS92], Theorem 3) If there is a PTAS for SOL-MAX-3SAT then P = NP.
This result is significant because SOL-MAX-3SAT is complete for MAX SNP [PY91, Theorem 2], and many problems can be shown MAX SNP-hard via L-reductions.
Theorem 43 (Proposition 2, [ALMSS92]) There does not exist a PTAS for any MAX SNP-hard problem unless P = NP.
Several MAX SNP-hard problems of particular interest are:
• SOL-MIN-STEINER TREE IN GRAPHS [BP89, Theorem 4.2],
• SOL-MIN-VERTEX COVER-B, in which each vertex in the given graph has degree $\leq B$, and SOL-MIN-VERTEX COVER [PY91, Theorem 2(d)],
• SOL-MAX-CLIQUE [CFS91, Theorem 6], and
• SOL-MAX-X3C-B, in which each element in the given set occurs in $\leq B$ 3-sets, and SOL-MAX-X3C [Kan91, Corollary tl].
Using these problems, it is possible to show many of the problems examined in this thesis to be MAX SNP-hard. In the following, define a canonical solution for a problem X as a solution to an instance of X produced by the reductions given in Section 3.2, e.g. the canonical trees for FUGT[$\geq$].
Lemma 44 The following hold:
1. Given a solution W of cost c to an instance of SOL-MIN-UBCCS or SOL-MIN-UBQCS derived by the reduction from VERTEX COVER given in [DJS86] (see Table 9), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.
2. Given a solution W of cost c to an instance of SOL-MIN-UBCDo or SOL-MIN-UBQDo derived by the reduction from VERTEX COVER given in [DJS86] (see Table 9), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.
3. Given a solution W of cost c to an instance of SOL-MIN-UBCCI or SOL-MIN-UBQCI derived by the reduction from VERTEX COVER given in [DS87] (see Table 10), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.
4. Given a solution W of cost c to an instance of SOL-MIN-FBUT2[$F_1$] derived by the reduction from X3C given in [KM86] (see Table 17), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.
5. Given a solution W of cost c to an instance of SOL-MIN-FUDT[$F_\alpha$] ($\alpha \in \{1, 2\}$) derived by the reductions from FBUT2[$F_\alpha$] given in [Day87] (see Table 17), in polynomial time we can find a canonical solution W' of cost $c' \leq c$.
Proof:
Proof of (1): If $c > 2|X|$, then replace W by the tree W' in which each member $x \in X$, $x = \{v_i, v_j\}$, is connected to 0 by edges $\{\{v_i, v_j\}, \{v_i\}\}$ and $\{\{v_i\}, 0\}$; this tree is canonical, and has cost $c' < c$. Otherwise, create W' by trimming W to remove all leaf vertices not in X, and applying the tree transformations in [DJS86, Lemma 1] to the leaf farthest from 0 until the tree is canonical. These tree transformations do not increase tree cost; moreover, as each such transformation removes at least one non-canonical vertex and there can be at most $c - (|X| + 1)$ such vertices, this algorithm is polynomial time.
Proofs of (2 - 3): Analogous to that for (1), using the tree transformations in [DJS86, Theorem 3] (Dollo) and [DS87, Lemma 2 and Theorem 3] (Chromosome Inversion).
Proof of (4): Create W' as follows: if there are partitions that group vertices not connected by edges in the created graph G, then break all such partitions into partitions that only group vertices connected by edges in G. For each group of vertices corresponding to a subgraph $G_c$, if the "corners" $\{x_{c,1}, x_{c,2}, x_{c,3}\}$ are all included in partitions of $G_c$, then replace the set of partitions for the vertices of $G_c$ with the four triangle-partitions in equation 6; else, replace with the three triangle-partitions in equation 7 and the appropriate subset of single-vertex partitions drawn from the set $\{x_{c,1}, x_{c,2}, x_{c,3}\}$. These transformations do not increase the cost, as these triangle-partitions are optimal under the $F_1$ statistic (see Section 3.2.3); moreover, as there are only $|C|$ such groups to transform, the algorithm is polynomial time.
Proof of (5): Create tree W' by applying the tree transformations in [Day87, Proposition 3] to W to create a discretized additive tree W'' with an ultrametric subtree $W_U$, and then applying the transformation in [Day87, Proposition 4] to reduce $W_U$ to an ultrametric subtree of height 2. These transformations do not increase the cost of the tree, and can be performed in polynomial time. ∎
Theorem 45 The following hold:
1. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCCS and SOL-MIN-UBQCS.
2. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCDo and SOL-MIN-UBQDo.
3. SOL-MIN-VERTEX COVER-B $\leq_L$ UBW.
4. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCCI and SOL-MIN-UBQCI.
5. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBGc.
6. SOL-MAX-CLIQUE $\leq_L$ SOL-MAX-BCC.
7. SOL-MAX-BCC $\leq_L$ SOL-MAX-BQC.
8. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-FUGT[$\geq$].
9. SOL-MAX-X3C-B $\leq_L$ SOL-MIN-FBUT2[$F_1$].
10. SOL-MIN-FBUT2[$F_\alpha$] $\leq_L$ SOL-MIN-FUDT[$F_\alpha$] ($\alpha \in \{1, 2\}$).
Proof:
Proof of (1): Consider the reduction from VC to CS (= UBCCS and UBQCS) given in [DJS86] (see Table 9). As $OPT_{VC_B} \geq |E|/B$ and $OPT_{CS} \leq 2|E|$, then $OPT_{CS} \leq 2B \cdot OPT_{VC_B}$, satisfying condition (L1) with $\alpha = 2B$. As both problems are minimization problems, condition (L2) can be rewritten as $c_{VC_B} \leq OPT_{VC_B} + \beta(c_{CS} - OPT_{CS})$. For any canonical solution for UBCCS, $c_{VC_B} = c_{CS} - |E|$; moreover, such a solution is guaranteed by Part 1 of Lemma 44. Setting $\beta = 1$ makes condition (L2) equivalent to $c_{VC_B} \leq c_{CS} - |E|$. Hence, this reduction is an L-reduction.
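The inequality $OPT_{VC_B} \geq |E|/B$ used for condition (L1) holds because each vertex of a graph with maximum degree B can cover at most B edges. The brute-force sketch below checks it, together with the resulting bound $2|E| \leq 2B \cdot OPT_{VC_B}$, on a small graph. It is an illustration only and exponential in the number of vertices; the function names are hypothetical and the graph is assumed to have at least one edge.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def min_vertex_cover_size(vertices: Sequence[int], edges: List[Edge]) -> int:
    """Size of a smallest vertex cover, found by exhaustive search."""
    for r in range(len(vertices) + 1):
        for cover in combinations(vertices, r):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return r
    return len(vertices)


def max_degree(vertices: Sequence[int], edges: List[Edge]) -> int:
    """Largest number of edges incident to a single vertex."""
    return max(sum(1 for e in edges if v in e) for v in vertices)


def l1_bound_holds(vertices: Sequence[int], edges: List[Edge]) -> bool:
    """Check |E| <= B * OPT_VC, and hence 2|E| <= 2B * OPT_VC."""
    b = max_degree(vertices, edges)
    opt = min_vertex_cover_size(vertices, edges)
    return len(edges) <= b * opt and 2 * len(edges) <= 2 * b * opt
```

On a 5-cycle, B = 2, |E| = 5, and the smallest vertex cover has 3 vertices, so 5 <= 6 and the bound holds.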
Proof of (2): Consider the reduction from VC to Do (= UBCDo and UBQDo) given in [DJS86] (see Table 9). As $OPT_{VC_B} \geq |E|/B$, $OPT_{Do} \leq 3|V| + 2|E|$, and $|V| \leq |E|$, then $OPT_{Do} \leq 5B \cdot OPT_{VC_B}$, satisfying condition (L1) with $\alpha = 5B$. The remainder of the proof follows that for (1), substituting the appropriate part of Lemma 44 to obtain $\beta = 1$.
Proofs of (3 - 5): The proof for (3) is identical to (1). The proof of (4) is a variant of that for (2) which uses the reduction given in Table 10 to yield an L-reduction with $\alpha = 5B$ and $\beta = 1$. As the Generalized parsimony criterion can simulate any ordered phylogenetic parsimony problem, (5) can be proved by a variant on any of the proofs for (1 - 4).
Proofs of (6 - 7): By the reductions given in Table 14, solutions to SOL-MAX-BCC (SOL-MAX-BQC) yield solutions to SOL-MAX-CLIQUE (SOL-MAX-BCC) of the same cost. Hence, these reductions yield L-reductions with $\alpha = \beta = 1$.
Proof of (8): The reduction given in [BP89] which shows the MAX SNP-hardness of SOL-MIN-STEINER TREE IN GRAPHS is actually from SOL-MIN-VERTEX COVER-B to SOL-MIN-STEINER(1,2), a version of SOL-MIN-STEINER TREE IN GRAPHS whose input is complete graphs with edge-lengths $\in \{1, 2\}$. However, SOL-MIN-STEINER(1,2) is a subproblem of SOL-MIN-FUGT[$\geq$]. As all solutions to any instance of SOL-MIN-STEINER(1,2) will satisfy the dominance condition, SOL-MIN-VERTEX COVER-B L-reduces to SOL-MIN-FUGT[$\geq$] with $\alpha = 2B$ and $\beta = 1$.
Proof of (9): Consider the reduction from X3C to FBUT2[$F_1$] given in [KM86] (see Table 17). Note that in a $B$-bounded instance of X3C, $3(B - 1) + 1$ is the maximum number of 3-sets that can share one of the values of a particular 3-set. Hence, the selection of any 3-set can prevent the selection of at most $3(B - 1)$ other 3-sets in $C$; thus, $OPT_{X3C_B} \ge |C|/(3(B - 1) + 1)$. As $OPT_{FBUT2[F_1]} \le |E|$, and $|E| = 21|C|$, then $OPT_{FBUT2[F_1]} \le 21(3(B - 1) + 1) \cdot OPT_{X3C_B}$, satisfying condition (L1) with $\alpha = 21(3(B - 1) + 1)$. As X3C is a maximization problem and FBUT2[$F_1$] is a minimization problem, condition (L2) can be rewritten as $c_{X3C_B} \ge OPT_{X3C_B} - \beta(c_{FBUT2[F_1]} - OPT_{FBUT2[F_1]})$. For any canonical solution for FBUT2[$F_1$], $c_{X3C_B} = (|E| - c_{FBUT2[F_1]})/3 - 3|C|$; moreover, such a solution is guaranteed by Part 4 of Lemma 44. Setting $\beta = 1/3$ makes condition (L2) equivalent to $c_{X3C_B} \ge (|E| - c_{FBUT2[F_1]})/3 - 3|C|$. Hence, this reduction is an L-reduction.
Proof of (10): Consider the reduction from FBUT2[$F_\alpha$] to FUDT[$F_\alpha$], $\alpha \in \{1, 2\}$, given in [Day87] (see Table 17). As $OPT_{FBUT2[F_\alpha]} = OPT_{FUDT[F_\alpha]}$, condition (L1) is satisfied with $\alpha = 1$. As both problems are minimization problems, condition (L2) can be rewritten as $c_{FBUT2[F_\alpha]} \le OPT_{FBUT2[F_\alpha]} + \beta(c_{FUDT[F_\alpha]} - OPT_{FUDT[F_\alpha]})$. For any canonical solution to FUDT[$F_\alpha$], $c_{FUDT[F_\alpha]} = c_{FBUT2[F_\alpha]} + Y_\alpha + Z_\alpha$, where $Y_\alpha = 0$ and $Z_\alpha > 0$ [Day87]; moreover, such a solution is guaranteed by Part 5 of Lemma 44. Setting $\beta = 1$ makes condition (L2) equivalent to $c_{FBUT2[F_\alpha]} \le c_{FUDT[F_\alpha]}$. Hence, this reduction is an L-reduction. ∎
The arithmetic equivalence reductions from FBUT[$F_1$] to FBUT2[$F_2$] and FBUT[$F_1, \ge$] to FBUT2[$F_2, \ge$] given in Section 3.2.3 are L-reductions with $\alpha = \beta = 1$. However, the reduction from FUDT[$F_1$] to FUDT[$F$] does not seem to be an L-reduction; though condition (L1) is satisfied ($\alpha = 100$), condition (L2) does not hold under any constant $\beta$.
Corollary 46 No phylogenetic inference optimal-cost solution problem examined in this thesis (excluding SOL-MIN-FUDT[$F$]) has a PTAS unless P = NP.
Less dramatic but nonetheless intriguing PTAS non-approximability results can be derived using Theorem 38. A PTAS for an OptP[$f(n)$]-complete evaluation problem $X$ implies that there exists for each $\epsilon$, $0 < \epsilon \le 1$, and all instances $I$ of $X$,
I ' I . I . h A I tl t > IAxU!-OI'TxCI>I(~) ' a po ynomm -tunc a gont rn sue 1 ta c _ OI'Txll) Ax(l) ~
IAxU>;f/.~;TxUll. By Theorem 38, there exists an c > 0 such that IAxUl;,?,?;Tx(lll ~
~ 2}1\';:/ infiuitdy often for each such A unless P ::: N P. For certain f this lower
bound is a positive-valued function, which implies that polynomial-time algorithms for certain $\epsilon$, and hence PTAS, do not exist for $X$.
Theorem 47 If a smoolh/unclion f(n) E O(log n) is such that g(n) = :~;;<:/- and
$\lim_{n \to \infty} g(n) > 0$ for all $\epsilon > 0$, then no OptP[$f(n)$]-complete problem has a PTAS unless P = NP.
Corollary 48 No OptP[$c \log \log n + O(1)$]-complete problem has a PTAS unless P = NP.
Though the relevant levels of the OptP hierarchy in these results are too low to be of consequence in this thesis, these are the first results which show that specific portions of the bounded NP query hierarchy are PTAS non-approximable.
5.4 Relative Approximability
Several of the phylogenetic inference problems examined in this thesis have relative approximations derived from approximation algorithms for related problems. STEINER TREE IN GRAPHS has a relative approximation of $1 - 2/|L|$, where $|L| \le |S|$ is the number of leaves in the optimal tree [KMB81]. The algorithm guaranteeing this approximation is given in Table 23. Note that the two crucial operations in this algorithm (finding the length of, and producing, a shortest path between two given vertices) can be done in polynomial time using standard shortest-path algorithms [CLR91, Section 26] on a character-by-character basis in an implicit graph, provided there are no restrictions on character-state transitions.
Theorem 49 All non-reticulate Wagner Linear, Wagner General, and Fitch phylogenetic parsimony optimal-cost solution problems examined in this thesis have relative approximations of $1 - 2/|L|$.
The application of this algorithm to phylogenetic parsimony problems was discovered independently by Gusfield [Gus91, Theorem 2.1]. Indeed, the algorithm and result above also apply to the problem of constructing minimal-length trees on molecular sequences, as long as the function computing minimal evolutionary change (edit distance) between pairs of sequences is a metric [Gus93, Section 3]. Unfortunately, this algorithm does not seem applicable to other phylogenetic parsimony problems, as the proof that the ratio above holds depends on the existence of a path between each pair of vertices in $S$ in the implicit graph [KMB81, Theorem 1]. All such paths may not exist in cladistic problems; moreover, it is not obvious how a character-ordering and orientation could be chosen for a qualitative problem in polynomial time such that all required paths existed, let alone how such a character-ordering or orientation could be enforced in subsequent stages of the algorithm. SOL-MAX-CLIQUE has a relative approximation of $O(n / \log^2 n)$ [BH90] which, by the L-reductions in the last section, yields identical approximations for all character compatibility problems.

Algorithm H:
Input: an undirected weighted graph $G = (V, E, d)$ and a set of vertices $S \subseteq V$.
Output: a Steiner tree, $T_H$, for $G$ and $S$.
Steps:
1. Construct the complete undirected weighted graph $G_1 = (V_1, E_1, d_1)$ from $G$ and $S$.
2. Find the minimal spanning tree, $T_1$, of $G_1$; if there are several minimal spanning trees, pick an arbitrary one.
3. Construct the subgraph, $G_S$, of $G$ by replacing each edge in $T_1$ by its corresponding shortest path in $G$; if there are several shortest paths, pick an arbitrary one.
4. Find the minimal spanning tree, $T_S$, of $G_S$; if there are several minimal spanning trees, pick an arbitrary one.
5. Construct a Steiner tree, $T_H$, from $T_S$ by deleting edges in $T_S$, if necessary, so that all leaves in $T_H$ are in $S$.
Table 23: A polynomial-time relative approximation algorithm for STEINER TREE IN GRAPHS (adapted from [KMB81])
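The five steps of Algorithm H (Table 23) can be sketched in Python as follows. This is a minimal illustration rather than the implementation of [KMB81]: it assumes a connected graph given as a nested dictionary `adj[u][v] = weight` with mutually comparable vertex labels, and the names `dijkstra`, `kruskal`, and `steiner_tree_h` are hypothetical helpers introduced here.

```python
import heapq
from itertools import combinations


def dijkstra(adj, source):
    """Shortest-path lengths and predecessors from `source` (assumed helper)."""
    dist, prev, heap = {source: 0}, {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def kruskal(nodes, edges):
    """Minimal spanning tree of `edges`, each given as (weight, u, v)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def steiner_tree_h(adj, terminals):
    """Sketch of Algorithm H; returns a set of (weight, u, v) tree edges."""
    terminals = list(terminals)
    # Step 1: complete graph G1 on S, weighted by shortest-path length in G.
    paths = {s: dijkstra(adj, s) for s in terminals}
    g1 = [(paths[s][0][t], s, t) for s, t in combinations(terminals, 2)]
    # Step 2: minimal spanning tree T1 of G1.
    t1 = kruskal(terminals, g1)
    # Step 3: subgraph GS of G, expanding each T1 edge into a shortest path.
    gs = set()
    for _, s, t in t1:
        prev, v = paths[s][1], t
        while v != s:
            u = prev[v]
            a, b = sorted((u, v))
            gs.add((adj[u][v], a, b))
            v = u
    # Step 4: minimal spanning tree TS of GS.
    ts = set(kruskal({x for _, a, b in gs for x in (a, b)}, gs))
    # Step 5: delete edges ending in a leaf that is not in S, until none remain.
    term = set(terminals)
    while True:
        degree = {}
        for _, a, b in ts:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        prune = {e for e in ts
                 if any(degree[x] == 1 and x not in term for x in e[1:])}
        if not prune:
            return ts
        ts -= prune


# Example: terminals a and b are joined only through the Steiner vertex x.
path_graph = {"a": {"x": 1}, "x": {"a": 1, "b": 1}, "b": {"x": 1}}
print(steiner_tree_h(path_graph, ["a", "b"]))
```

In the phylogenetic setting described above the graph is implicit, so the explicit `adj` dictionary would be replaced by character-by-character shortest-path computations; that substitution is not shown here.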
Theorem 50 All character compatibility optimal-cost solution problems examined in this thesis have relative approximations of $O(n / \log^2 n)$.
No relative approximations are known for any of the distance matrix fitting problems examined in this thesis, though there are relative approximations for related clustering problems; see [Day92] for a review of these results.
Theorem 43 actually states that MAX-3SAT has no relative approximation $\epsilon$ for some $\epsilon > 0$ [ALMSS92, Footnote, p. 7]; thus, by Corollary 36, the L-reductions in the previous section imply bounds on relative approximability as well. Unfortunately, values of $\epsilon$ derived to date using the construction in Theorem 43 imply only trivial lower bounds [Joh92, pp. 519-520]. Other estimates for these bounds may be derived from the best known relative approximations on SOL-MIN-VERTEX COVER-B and SOL-MAX-X3C-B:
• SOL-MIN-VERTEX COVER-B has re!Rt.ive Rppmximett.ioll (: = {O.:.ll'i , O.'l!i,
0.50, 0.56, 0.60, 0.64, 0.67, 0.69 I, 0. 71 } for :J $ H :5 II, aud r: $ 2( u1:~~- 1 ) -
1 for B > II [MS8:J].
• SOL~MAX~X3C-B has relative approximaticm r; = ( IJ- I), /J 2 :J IJ'B!)(),
Theorem :J].
The only nontrivial lower bound on relative approximability is for the character compatibility problems, and is based on a result from [FGLSS91] as improved by Arora et al. [ALMSS92]:
Theorem 51 ([ALMSS92], Theorem 5) There exists an $\epsilon > 0$ such that, if SOL-MAX-CLIQUE has a relative approximation of $n^\epsilon$, then P = NP.
Corollary 52 There exists an $\epsilon > 0$ such that, if any character compatibility problem examined in this thesis has a relative approximation of $n^\epsilon$, then P = NP.
As a final note, relative approximations are known for certain counting problems. By Theorem 3.1 of [Sto85], all #P functions $f(x)$ have approximations $f_{app}(x)$ in $F\Delta_3^p$ such that $(1 - \epsilon) f_{app}(x) < f(x) < (1 + \epsilon) f_{app}(x)$ for all polynomials $p$, where $\epsilon = 1/p(|x|)$; Theorem 7.1 of [KST89] extends this result to SpanP problems. Recall that an NP query can be simulated by an appropriate $\Sigma_2^p$ query.
Theorem 53 All phylogenetic inference optimal-cost, given-cost, and given-limit spanning problems examined in this thesis have relative approximations of $\epsilon$ in $F\Delta_3^p$, where $\epsilon = 1/p(|I|)$ for any polynomial $p$.
5.5 Approximability by Neural Networks
There has been much interest in recent years in computing approximate solutions to optimization problems using instance-specific neural networks [HT85a, HT85b]. In the discrete-time version of this model treated by Bruck and Goodman [BG90], a neural network is described by a set of two-state nodes $V$, a set of arcs with weights $W_{i,j}$ that specify the input from node $i$ to node $j$, and a state-change threshold value $T_i$ for each node. Let the state of node $i$ at time $t$ be $V_i(t)$, and let the state of the network at time $t$ be the vector $V(t)$. At each time $t$ after initialization, the states of each vertex $V_i$ in some subset $S \subseteq V$ are updated by

$$V_i(t + 1) = \begin{cases} 1 & \text{if } \sum_{j=1}^{|V|} W_{j,i} V_j(t) \ge T_i \\ -1 & \text{otherwise} \end{cases} \qquad (13)$$
A state $V(t)$ is stable if $V(t) = V(t + 1)$. Such networks are always guaranteed to get to a stable state [BG90, pp. 130-131] which corresponds to some solution to the associated problem. Consider the following restricted class of such neural networks that have symmetric weights and satisfy the following properties [BG90, p. 132].
• Each stable state corresponds to an optimal solution of the encoded instance $I$ of the associated problem $X$, and that solution can be derived from this state in polynomial time.
• The network's description is of size polynomial in $|I|$.
A problem $X$ is said to be solvable by a neural network if there exists an algorithm $A_X$ which can, for any instance of $X$, generate the corresponding neural network in polynomial time. Note that such a network may potentially take exponential time to reach a stable state.
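The update rule and the notion of a stable state can be illustrated with a small Python sketch. It is not taken from [BG90]: the function names, the list-of-lists weight layout (`weights[j][i]` is the input from node `j` to node `i`), the one-node-at-a-time schedule, and the sweep budget are all assumptions made here for illustration.

```python
def update(weights, thresholds, state, subset):
    """Apply the threshold rule once to every node in `subset`, using `state` as V(t)."""
    new_state = list(state)
    for i in subset:
        total = sum(weights[j][i] * state[j] for j in range(len(state)))
        new_state[i] = 1 if total >= thresholds[i] else -1
    return new_state


def run_serial(weights, thresholds, state, max_sweeps=1000):
    """Update one node at a time; return a stable state, or None if the budget runs out."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            nxt = update(weights, thresholds, state, [i])
            if nxt != state:
                state, changed = nxt, True
        if not changed:  # a full sweep changed nothing, so V(t) = V(t + 1)
            return state
    return None


# Two nodes joined by a symmetric weight of 1, both thresholds 0.
W = [[0, 1], [1, 0]]
T = [0, 0]
print(run_serial(W, T, [1, -1]))  # [-1, -1]
```

The sweep budget is only a safeguard for the sketch; as noted above, the number of updates needed to reach a stable state may be exponential in the size of the network.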
Theorem 54 ([BG90], Proposition 1) If an NP-hard problem is solvable by a neural network then NP = co-NP.
Corollary 55 No phylogenetic inference optimal-cost, given-cost, or given-limit problem examined in this thesis can be solved by a neural network unless NP = co-NP.
Alternatively, each stable state can correspond to an approximate, rather than an optimal, solution. Many traditional proofs of approximability [HS78, GJ79] can be trivially modified to show that certain approximations by neural networks for NP-hard problems are not possible unless NP = co-NP [BG90, Yao92]. Indeed, any proof in which an optimal solution can be derived in polynomial time using a given type of approximate solution can be so modified. By analogy with PTAS, define a Polynomial-Time Neural Approximation Scheme (PTNAS) for a problem $X$ as an algorithm $A$ which, given an instance $I$ of $X$ and an integer $k$, $k > 0$, produces in polynomial time a neural network that produces solutions whose cost is within a factor of $k$ of optimal. The following results stated above can be rephrased in terms of approximability by neural networks:
Corollary 56 The following hold:
1. If there is a PTNAS for SOL-MAX-3SAT then NP = co-NP.
2. There does not exist a PTNAS for any MAX SNP-hard problem unless NP = co-NP.
3. There exists an $\epsilon > 0$ such that, if SOL-MAX-CLIQUE has a relative approximation of $n^\epsilon$ by a neural network, then NP = co-NP.
Proof: Proofs of (1) and (3) (sketch): Construct the exact-solution neural networks for every problem in NP from the assumed PTNAS for SOL-MAX-3SAT and SOL-MAX-CLIQUE by using the generic reductions from all languages in NP to MAX-3SAT and MAX-CLIQUE given in the original proofs in [ALMSS92, FGLSS91] as neural network description-encoding and solution-decoding functions. As NP-complete problems are by definition NP-hard, the results hold by Theorem 54.
Note that unlike the proofs given in [BG90, Yao92], these proofs do not involve deriving optimal-cost solution neural networks for SOL-MAX-3SAT and SOL-MAX-CLIQUE from their respective PTNAS.
Proof of (2): Note that L-reductions preserve PTNAS-approximability as well as PTAS-approximability. ∎
Corollary 57 No phylogenetic inference optimal-cost solution problem examined in this thesis has a PTNAS unless NP = co-NP.
A construction similar to that in Corollary 56 can also be used to show that no MAX SNP-hard problem has a randomized PTAS (RPTAS) [BS92, KL83], i.e. a PTAS which for each $\epsilon$, $0 < \epsilon < 1$, guarantees a solution with the required cost with at least probability $(1 - \epsilon)$, unless R = NP [Joh92, p. 519].
The results in this section apply only to the restricted class of neural networks considered in [BG90]. Less constrained types of neural networks may exist for these problems; for example, Jagota [Jag92] has designed asymmetric-weighted neural networks for MAX-CLIQUE that perform extremely well on average.
5.6 Summary
The known theoretical and algorithmic lower limits on approximability for the phylogenetic inference problems examined in this thesis are given in Table 24. Though the logic-formulation of approximability has produced the most dramatic results, the various theorems derived using the work of [GJ79, Kre88] should not be dismissed, as these theorems establish a tentative connection between various types of approximability and the levels of the function bounded NP query hierarchy. Though the correspondence is not exact ([Kre88, p. 492]; [CP91, p. 243]), there is a pattern of approximability and non-approximability (see Table 25). This pattern may assume greater significance in the light of future discoveries of lower limits on approximability.
The results above imply that polynomial-time algorithms whose approximation bounds hold over all instances do not exist for any phylogenetic inference optimal-cost solution problem for any of the closest types of bounds (i.e. absolute approximation, FPTAS, PTAS). These results do not invalidate either existing phylogenetic inference approximation algorithms or phylogenies produced by these algorithms; other kinds of fast approximation may be possible (e.g. asymptotic approximations, whose bounds hold for all but finitely many
                                                       Approximability
                                            Theoretical Lower Limit            Algorithmic Lower Limit
Phylogenetic Parsimony    WL, WG, Fi        $\epsilon$ rel. app., $\epsilon > 0$        $(1 - 2/|L|)$ rel. app.
                          CS, Do, CI, Ge    $\epsilon$ rel. app., $\epsilon > 0$        .
Character Compatibility                     $n^\epsilon$ rel. app., $\epsilon > 0$      $O(n / \log^2 n)$ rel. app.
Distance Matrix Fitting   FUDT[$F$]         no $O(poly)$ abs. app., no FPTAS   .
                          All Others        $\epsilon$ rel. app., $\epsilon > 0$
Table 24: Approximability of phylogenetic inference optimal-cost solution functions.
                                    Absolute Approximations
                                    $o(\log n)$   $o(poly)$   $O(poly)$   FPTAS   PTAS
$FP^{NP[c \log \log n + O(1)]}$     X             -           -           X       X †
$FP^{NP[O(\log n)]}$                X             X           -           X       √
$FP_{\|}^{NP}$                      X             X           X           √       √
$FP^{NP}$                           X             X           X           √       √
X - Whole class is non-approximable.
√ - Members of class are approximable.
- - Approximability not relevant.
† Applies only to complete problems.
Table 25: Non-approximability of various levels of the Function Bounded NP Query Hierarchy.
instances; subexponential-time approximation), and phylogenies derived from instances encountered in practice may be among those that are close to optimal. However, in any application such as phylogenetic inference in which the degree of optimality of approximate solutions is important (see Section 1), no approximation algorithm or solution produced by such an algorithm should be trusted until analysis has shown exactly how good an approximation that algorithm gives.
6 Conclusion
In this thesis, I have established a framework that incorporates all phylogenetic inference decision problems studied to date. Within this framework, I have derived various bounds on the evaluation, solution, spanning, enumeration, and random-generation versions of the optimal-cost, given-cost, and given-limit phylogenetic inference problems. I have also derived lower bounds on the approximability of phylogenetic inference solution problems. These results are summarized in Tables 20 and 24. These results show yet again that decision problems conceal many facets of the complexity of their underlying optimization problems. The complexity of more complex versions of optimization problems should be investigated not only to better assess the true difficulty of the underlying problems, but also because such complexities may have ramifications for how closely these problems can be approximated by fast algorithms.
Future directions for research are:
• Determining the precise complexity of phylogenetic inference evaluation and optimal-cost solution problems. If these problems are provably easier than $FP^{NP}$, more classes will need to be described between $FP^{NP[O(\log n)]}$ and $FP^{NP}$. Such a set of classes might belong to the function analogue of the hierarchy developed in [CS92].
• Determining the precise complexity of phylogenetic inference spanning and enumeration problems. This may be possible using classes from the hierarchies of functions defined in [Kre92a, Lad89, WagK86a, WagK86b].
• Finding algorithms with guaranteed relative approximations for the distance matrix fitting problems and the remainder of the phylogenetic parsimony problems. The latter may be possible by recently-developed algorithms that improve on that given in [KMB81]; see [BR91] and references.
• Deriving approximability results for phylogenetic inference given-limit and given-cost problems, based not on algorithms that guarantee solutions of a particular cost but algorithms that are either polynomial-time or correct on all but some polynomially-bounded subset of their instances. Such a framework is described in [Schö86, Section 3] and [BDG90, Section 6]. This framework is also applicable to optimal-cost solution functions.
Results from the growing literature on computational learning theory [Kea90, LV90a, M IJ.IHB] and the computational complexity of local search heuristics [JPY88, Yan90] may also be applicable to the further analysis of phylogenetic inference problems.
References
[ABG91] Amir, A., Beigel, R., and Gasarch, W. I. Some Connections between Bounded Query Classes and Non-Uniform Complexity. Manuscript, 1991.
[ALMSS92] Arora, S., Lund, C., Motwani, R., Sudan, M., and Szegedy, M. Proof Verification and Intractability of Approximation Problems. Manuscript, 1992.
[ADS86] Ausiello, G., D'Atri, A., and Saccà, D. Minimal Representation of Directed Hypergraphs. SIAM Journal on Computing, 15(2), 419-431, 1986.
[ANI90] Ausiello, G., Nanni, U., and Italiano, G. F. Dynamic Maintenance of Directed Hypergraphs. Theoretical Computer Science, 72, 97-117, 1990.
[Ax87] Ax, P. The Phylogenetic System: The Systematization of Organisms on the Basis of their Phylogenesis. Translated by R. P. S. Jeffries. John Wiley, New York, 1987.
[BDG88] Balcázar, J., Díaz, J., and Gabarró, J. Structural Complexity I. EATCS Monographs on Theoretical Computer Science no. 11. Springer-Verlag, Berlin, 1988.
[BDG90] Balcázar, J., Díaz, J., and Gabarró, J. Structural Complexity II. EATCS Monographs on Theoretical Computer Science no. 22. Springer-Verlag, Berlin, 1990.
[BFMY83] Beeri, C., Fagin, R., Maier, D., and Yannakakis, M. On the Desirability of Acyclic Database Schemes. Journal of the Association for Computing Machinery, 30(3), 479-513, 1983.
[Bei88] Beigel, R. NP-hard Sets are p-superterse unless R=NP. Technical Report 4, The Johns Hopkins University, Department of Computer Science, 1988.
[Bei91] Beigel, R. Bounded Queries to SAT and the Boolean Hierarchy. Theoretical Computer Science, 84, 199-223, 1991.
[BS92] Berman, P. and Schnitger, G. On the Complexity of Approximating the Independent Set Problem. Information and Computation, 96, 77-94, 1992.
[BP89] Bern, M. and Plassmann, P. The Steiner Problem With Edge Lengths 1 and 2. Information Processing Letters, 32, 171-176, 1989.
[Ber73] Berge, C. Graphs and Hypergraphs. Translated by E. Minieka. North-Holland, Amsterdam, 1973.
[Ber85] Berge, C. Graphs. Second revised edition. North-Holland, Amsterdam, 1985.
[BR91] Berman, P. and Ramaiyer, V. Improved Approximations for the Steiner Tree Problem. Manuscript, 1991.
[BSS89] Blum, L., Shub, M., and Smale, S. On a Theory of Computation and Complexity over the Real Numbers: NP-completeness, Recursive Functions, and Universal Machines. Bulletin of the American Mathematical Society (New Series), 21(1), 1-46, 1989.
[BH90] Boppana, R. and Halldórsson, M. M. Approximating Maximum Independent Sets by Excluding Subgraphs. In J. R. Gilbert and R. Karlsson (eds.) SWAT 90. Lecture Notes in Computer Science no. 447, Springer-Verlag, Berlin, 1990. 13-25.
[BG90] Bruck, J., and Goodman, J. W. On the Power of Neural Networks for Solving Hard Problems. Journal of Complexity, 6, 129-135, 1990.
[BJY89] Bruschi, D., Joseph, D., and Young, P. A Structural Overview of NP Optimization Problems. In H. Djidjev (ed.) Optimal Algorithms: Proceedings of the International Symposium, Lecture Notes in Computer Science no. 401, Springer-Verlag, Berlin, 1989. 205-231.
[CS65] Camin, J. H., and Sokal, R. R. A Method for Deducing Branching Sequences in Phylogeny. Evolution, 19, 311-326, 1965.
[CS92] Castro, J., and Seara, C. Characterizations of Some Complexity Classes between $\Theta_2^p$ and $\Delta_2^p$. In A. Finkel and M. Jantzen (eds.) STACS '92: 9th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science no. 577, Springer-Verlag, Berlin, 1992. 305-318.
[CF87] Cavender, J. A. and Felsenstein, J. S. Invariants of Phylogenies in a Simple Case with Discrete States. Journal of Classification, 4, 57-71, 1987.
[CE67] Cavalli-Sforza, L. L. and Edwards, A. W. F. Phylogenetic Analysis: Models and Estimation Procedures. American Journal of Human Genetics, 19, 233-257, 1967; see also Evolution, 21, 550-570, 1967.
[CT91] Chen, Z.-Z. and Toda, S. On the Complexity of Computing Optimal Solutions. Manuscript, 1991.
[CLR91] Cormen, T. H., Leiserson, C. E., and Rivest, R. L. Introduction to Algorithms. MIT Press, Cambridge, MA, 1991.
[CFS91] Crescenzi, P., Fiorini, C., and Silvestri, R. A Note on the Approximation of the MAX CLIQUE Problem. Information Processing Letters, 40(1), 1-5, 1991.
[CP91] Crescenzi, P. and Panconesi, A. Completeness in Approximation Classes. Information and Computation, 93, 241-262, 1991.
[Day83] Day, W. H. E. Computationally Difficult Parsimony Problems in Phylogenetic Systematics. Journal of Theoretical Biology, 103, 429-438, 1983.
[Day87] Day, W. H. E. Computational Complexity of Inferring Phylogenies from Dissimilarity Matrices. Bulletin of Mathematical Biology, 49(4), 461-467, 1987.
[Day88] Day, W. H. E. Class Notes, Computer Science 6758: Special Topics in Computer Applications, 1988.
[Day92] Day, W. H. E. Complexity Theory: An Introduction for Practitioners of Classification. In P. Arabie, G. De Soete, and L. J. Hubert (eds.) Clustering and Classification, World Scientific Publishing, Teaneck, NJ, 1992.
[DJS86] Day, W. H. E., Johnson, D. S., and Sankoff, D. The Computational Complexity of Inferring Rooted Phylogenies by Parsimony. Mathematical Biosciences, 81, 33-42, 1986.
[DS86] Day, W. H. E. and Sankoff, D. Computational Complexity of Inferring Phylogenies by Compatibility. Systematic Zoology, 35(2), 224-229, 1986.
[DS87] Day, W. H. E. and Sankoff, D. Computational Complexity of Inferring Phylogenies from Chromosome Inversion Data. Journal of Theoretical Biology, 124, 213-218, 1987.
[DT90] Díaz, J. and Torán, J. Classes of Bounded Nondeterminism. Mathematical Systems Theory, 23(1), 21-32, 1990.
[DRA92] Dorado, O., Rieseberg, L. H., and Arias, D. M. Chloroplast DNA Introgression in Southern California Sunflowers. Evolution, 46(2), 566-572, 1992.
[Duk85] Duke, R. Types of Cycles in Hypergraphs. Annals of Discrete Mathematics, 27, 399-418, 1985.
[EC80] Eldredge, N. and Cracraft, J. Phylogenetic Patterns and the Evolutionary Process: Method and Theory in Comparative Biology. Columbia University Press, New York, 1980.
[EJM76] Estabrook, G. F., Johnson, C. S., and McMorris, F. R. An Algebraic Analysis of Discrete Characters. Discrete Mathematics, 16, 141-147, 1976.
[EM77] Estabrook, G. F. and McMorris, F. R. When are Two Qualitative Taxonomic Characters Compatible? Journal of Mathematical Biology, 4, 195-200, 1977.
[EM80] Estabrook, G. F. and McMorris, F. R. When is One Estimate of Evolutionary Relationships a Refinement of Another? Journal of Mathematical Biology, 10, 367-374, 1980.
[Fag74] Fagin, R. Generalized First-order Spectra and Polynomial-Time Recognizable Sets. In R. M. Karp (ed.) Complexity of Computations, SIAM-AMS Proceedings no. 7. American Mathematical Society, Providence, RI, 1974. 43-73.
[Fag83] Fagin, R. Degrees of Acyclicity for Hypergraphs and Relational Database Schemes. Journal of the Association for Computing Machinery, 30(3), 514-550, 1983.
[Far72] Farris, J. S. Estimating Phylogenetic Trees from Distance Matrices. American Naturalist, 106, 645-668, 1972.
[Far77] Farris, J. S. Phylogenetic Analysis under Dollo's Law. Systematic Zoology, 26, 77-88, 1977.
[Far78] Farris, J. S. Inferring Phylogenetic Trees from Chromosome Inversion Data. Systematic Zoology, 27, 275-284, 1978.
[Far83] Farris, J. S. The Logical Basis of Phylogenetic Analysis. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 7-36.
[FGLSS91] Feige, U., Goldwasser, S., Lovász, L., Safra, S., and Szegedy, M. Approximating Clique is Almost NP-Complete. In Proceedings of the Thirty-Second Annual IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Washington, D.C., 1991. 2-14.
[Fel81] Felsenstein, J. S. Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach. Journal of Molecular Evolution, 17, 368-376, 1981.
[Fel82] Felsenstein, J. S. How Can We Infer Geography and History from Gene Frequencies? Journal of Theoretical Biology, 96, 9-20, 1982.
[Fel88] Felsenstein, J. S. Phylogenies from Molecular Sequences: Inference and Reliability. Annual Review of Genetics, 22, 521-565, 1988.
[FHOS92] Fenner, S., Homer, S., Ogiwara, M., and Selman, A. L. On Using Oracles That Compute Functions. Manuscript, 1992. To appear in STACS '93: 10th Annual Symposium on Theoretical Aspects of Computer Science.
[Fit71] Fitch, W. M. Toward Defining the Course of Evolution: Minimal Change for a Specific Tree Topology. Systematic Zoology, 20, 406-416, 1971.
[Fun85] Funk, V. A. Phylogenetic Patterns and Hybridization. Annals of the Missouri Botanical Garden, 72, 681-715, 1985.
[FB90] Funk, V. A. and Brooks, D. R. Phylogenetic Systematics as the Basis of Comparative Biology. Smithsonian Institution Press, Washington, D.C., 1990.
[GW83] Galperin, H. and Wigderson, A. Succinct Representations of Graphs. Information and Control, 56, 183-198, 1983.
[GGJ77] Garey, M. R., Graham, R. L., and Johnson, D. S. The Complexity of Computing Steiner Minimal Trees. SIAM Journal on Applied Mathematics, 32(4), 835-859, 1977.
[GJ79] Garey, M. R. and Johnson, D. S. Computers and Intractability. W. H. Freeman, New York, 1979.
[Gas86] Gasarch, W. I. The Complexity of Optimization Functions. Technical Report no. 1652, University of Maryland, Department of Computer Science, 1986.
[Gas92] Gasarch, W. I. Personal communication, July 11, 1992.
[GKR91] Gasarch, W. I., Krentel, M. W., and Rappoport, K. J. OptP as the Normal Behavior of NP-Complete Problems. Manuscript, 1991. To appear in Mathematical Systems Theory.
Graham, R. L. and Foulds, L. R. Unlikelihood That Minimal Phylogenies for a Realistic Biological Study Can Be Constructed in Reasonable Computational Time. Mathematical Biosciences,
[Gra81] Grant, V. Plant Speciation. Second edition. Columbia University Press, New York, 1981.
[Gus91] Gusfield, D. The Steiner Tree Problem, Historical Reconstruction, and Phylogeny. Manuscript, 1991. This is a revised version of Technical Report 332, Department of Computer Science, Yale University, 1984, by the same author.
[Gus93] Gusfield, D. Efficient Methods for Multiple Sequence Alignment with Guaranteed Error Bounds. Bulletin of Mathematical Biology, 55(1), 141-154, 1993.
[Hei90] Hein, J. Reconstructing Evolution of Sequences Subject to Recombination Using Parsimony. Mathematical Biosciences, 98, 185-200, 1990.
[HHSY91] Hemachandra, L., Hoene, A., Siefkes, D., and Young, P. On Sets Polynomially Enumerable by Iteration. Theoretical Computer Science, 80, 203-225, 1991.
[HP84] Hendy, M. D. and Penny, D. Cladograms Should Be Called Trees. Systematic Zoology, 33(2), 245-247, 1984.
[Hen66] Hennig, W. Phylogenetic Systematics. Translated by D. D. Davis and R. Zangerl. University of Illinois Press, Urbana, IL, 1966.
[HT85a] Hopfield, J. J., and Tank, D. W. Neural Computations of Decisions in Optimization Problems. Biological Cybernetics, 52, 141-152, 1985.
[HT85b] Hopfield, J. J., and Tank, D. W. Computing with Neural Circuits: A Model. Science, 233, 625-633, 1986.
[HS78] Horowitz, E. and Sahni, S. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, 1978.
[Hum83] Humphries, C. J. Primary Data in Hybrid Analysis. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 89-103.
Jagota, A. Efficiently Approximating MAX-CLIQUE in a Hopfield-style Network. To appear in International Joint Conference on Neural Networks, IEEE Computer Society Press, Washington, D. C., 1992.
[JS71] Jardine, N. and Sibson, R. Mathematical Taxonomy. John Wiley, London, 1971.
[JVV86] Jerrum, M. R., Valiant, L. G., and Vazirani, V. V. Random Generation of Combinatorial Structures from a Uniform Distribution. Theoretical Computer Science, 43, 169-188, 1986.
[Joh74] Johnson, D. S. Approximation Algorithms for Combinatorial Problems. Journal of Computer and System Sciences, 9, 256-278, 1974.
[Joh90] Johnson, D. S. A Catalog of Complexity Classes. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 67-161.
[Joh92] Johnson, D. S. The NP-completeness Column: An Ongoing Guide (23rd Edition). Journal of Algorithms, 13, 502-524, 1992.
[JPY88] Johnson, D. S., Papadimitriou, C. H., and Yannakakis, M. How Easy is Local Search? Journal of Computer and System Sciences, 37, 79-100, 1988.
[Kan91] Kann, V. Maximum Bounded 3-dimensional Matching is MAX SNP-complete. Information Processing Letters, 37, 27-35, 1991.
[Kar72] Karp, R. M. Reducibility Among Combinatorial Problems. In R. E. Miller and J. W. Thatcher (eds.) Complexity of Computer Computations, Plenum Press, New York, 1972. 85-103.
[KL83] Karp, R. M. and Luby, M. Monte-Carlo Algorithms for Enumeration and Reliability Problems. In Proceedings of the Twenty-Fourth Annual IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Washington, D. C., 1983. 56-64.
[Kea90] Kearns, M. J. The Computational Complexity of Machine Learning. MIT Press, Cambridge, MA, 1990.
[KF69] Kluge, A. G. and Farris, J. S. Quantitative Phyletics and the Evolution of Anurans. Systematic Zoology, 18(1), 1-32, 1969.
[Ko91] Ko, K.-I. Complexity Theory of Real Functions, Birkhäuser, Boston, MA, 1991.
Köbler, J. Personal communication, July 23, 1992.
[KST89] Köbler, J., Schöning, U., and Torán, J. On Counting and Approximation. Acta Informatica, 26, 363-379, 1989.
[KT90] Kolaitis, P. G., and Thakur, M. N. Logical Definability of NP Optimization Problems. Technical report UCSC-CRL-90-48, University of California at Santa Cruz, Department of Computer and Information Sciences, 1990. To appear in Information and Computation.
[KT91] Kolaitis, P. G., and Thakur, M. N. Approximation Properties of NP Minimization Classes. In Proceedings of the Sixth Annual Conference on Structure in Complexity Theory, IEEE Computer Society Press, Washington, D. C., 1991. 353-366. To appear in Journal of Computer and System Sciences.
[KMB81] Kou, L., Markowsky, G., and Berman, L. A Fast Algorithm for Steiner Trees. Acta Informatica, 15, 141-145, 1981.
Krentel, M. W. The Complexity of Optimization Problems. Journal of Computer and System Sciences, 36(3), 490-509, 1988.
Krentel, M. W. Generalizations of OptP to the Polynomial Hierarchy. Theoretical Computer Science, 97(2), 183-198, 1992.
[Kre92b] Krentel, M. W. Personal communication, July 13, 1992.
[Kri86] Krivanek, M. On the Computational Complexity of Clustering. In E. Diday, Y. Escoufier, L. Lebart, J. Pages, Y. Schektman, and R. Tomassone (eds.) Data Analysis and Informatics IV: Proceedings of the Fourth International Symposium on Data Analysis and Informatics, Elsevier Science (North-Holland), Amsterdam, 1986. 89-96.
[Kri88] Krivanek, M. The Complexity of Ultrametric Partitions on Graphs. Information Processing Letters, 27, 265-270, 1988.
[KM86] Krivanek, M. and Moravek, J. NP-Hard Problems in Hierarchical-Tree Clustering. Acta Informatica, 23, 311-323, 1986.
[Lad89] Ladner, R. E. Polynomial Space Counting Problems. SIAM Journal on Computing, 18(6), 1087-1097, 1989.
[Lak87] Lake, J. A. A Rate-Independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony. Molecular Biology and Evolution, 4(2), 167-191, 1987.
[Lat82] Lathrop, G. M. Evolutionary Trees and Admixture: Phylogenetic Inference When Some Populations are Hybridized. Annals of Human Genetics, 46, 245-255, 1982.
Lee, A. R. BLUDGEON: A Blunt Instrument for the Analysis of Contamination in Textual Traditions. In Y. Choueka (ed.) Computers in Linguistic and Literary Computing: Literary and Linguistic Computing 1988, Proceedings of the Fifteenth Annual Conference, Champion-Slatkine, Paris, 1990. 261-292.
[LW92] Lengauer, T. and Wagner, K. W. The Correlation between the Complexities of the Nonhierarchical and Hierarchical Versions of Graph Problems. Journal of Computer and System Sciences,
Li, M. and Vitanyi, P. M. B. Inductive Reasoning and Kolmogorov Complexity. Journal of Computer and System Sciences, 44, 343-384, 1992.
[LV90a] Li, M. and Vitanyi, P. M. B. Kolmogorov Complexity and Its Applications. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 187-254.
[Lip92] Lipscomb, D. L. Parsimony, Homology, and the Analysis of Multistate Characters. Cladistics, 8(1), 45-65, 1992.
Luckow, M. and Pimentel, R. A. An Empirical Comparison of Numerical Wagner Computer Programs. Cladistics, 1(1), 47-66, 1985.
[McD90] McDade, L. Hybrids and Phylogenetic Systematics I. Patterns of Character Expression in Hybrids and their Implications for Cladistic Analysis. Evolution, 44(6), 1685-1700, 1990.
[McM77] McMorris, F. R. On the Compatibility of Binary Qualitative Taxonomic Characters. Bulletin of Mathematical Biology, 39, 133-138, 1977.
[Mad91] Maddison, D. R. The Discovery and Importance of Multiple Islands of Most-Parsimonious Trees. Systematic Zoology, 40(3), 315-328, 1991.
[MRS92] Maddison, D. R., Ruvolo, M. and Swofford, D. L. Geographic Origins of Human Mitochondrial DNA: Phylogenetic Evidence from Control Region Sequences. Systematic Biology, 41(1), 111-124, 1992.
[ME85] Meacham, C. A. and Estabrook, G. F. Compatibility Methods in Systematics. Annual Review of Ecology and Systematics, 16,
[MS72] Meyer, A. R. and Stockmeyer, L. J. The Equivalence Problem for Regular Expressions with Squaring Requires Exponential Space. In the 13th Annual IEEE Symposium on Switching and Automata Theory, IEEE Computer Society Press, Washington, D. C., 1972. 125-129.
[Mic82] Mickevich, M. F. Transformation Series Analysis. Systematic Zoology, 31(4), 461-478, 1982.
[MHJ89] Milosavljevic, A., Haussler, D., and Jurka, J. Informed Parsimonious Inference of Prototypical Genetic Sequences. In R. Rivest, D. Haussler, and M. K. Warmuth (eds.) COLT '89: Proceedings of the Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, San Mateo, CA, 1989. 102-117.
[MS83] Monien, B. and Speckenmeyer, E. Some Further Approximation Algorithms for the Vertex Cover Problem. In G. Ausiello and M. Protasi (eds.) CAAP '83. Lecture Notes in Computer Science no. 159, Springer-Verlag, Berlin, 1983. 341-349.
[MH90] Moritz, C. and Hillis, D. M. Molecular Systematics: Context and Controversies. In D. M. Hillis and C. Moritz (eds.) Molecular Systematics, Sinauer Associates, Sunderland, MA, 1990. 1-10.
[Mot92] Motwani, R. Lecture Notes on Approximation Algorithms: Part I. Manuscript, 1992.
Nei, M. Molecular Evolutionary Genetics. Columbia University Press, New York, 1987.
[Nel83] Nelson, G. Reticulation in Cladograms. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 105-111.
[NP81] Nelson, G. and Platnick, N. I. Systematics and Biogeography: Cladistics and Vicariance. Columbia University Press, New York, 1981.
[Nig75] Nigmatullin, R. G. Complexity of the Approximate Solution of Combinatorial Problems. Doklady Akademii Nauk SSSR, 224, 289-292, 1975 (in Russian). English translation (incorporating author's corrections) in Soviet Mathematics Doklady, 16, 1199-1203, 1975.
[OM90] Orponen, P., and Mannila, H. On Approximation-Preserving Reductions: Complete Problems and Robust Measures, Manuscript, 1990.
[PR90] Panconesi, A. and Ranjan, D. Quantifiers and Approximation (Extended Abstract). In Proceedings of the 22nd ACM Symposium on Theory of Computing, ACM Press, Washington, D. C., 1990. 446-456.
[PY86] Papadimitriou, C. H. and Yannakakis, M. A Note on Succinct Representations of Graphs. Information and Control, 71, 181-185, 1986.
[PY91] Papadimitriou, C. H. and Yannakakis, M. Optimization, Approximation, and Complexity Classes. Journal of Computer and System Sciences, 43, 425-440, 1991.
[PHS92] Penny, D., Hendy, M. D., and Steel, M. A. Progress with Methods for Constructing Evolutionary Trees. Trends in Ecology and Evolution, 7(3), 73-79, 1992.
[Phi84] Phipps, J. B. Problems of Hybridity in the Cladistics of Crataegus (Rosaceae). In W. F. Grant (ed.) Plant Biosystematics. Academic Press, Toronto, 1984. 417-438.
[Pla89] Platnick, N. I. An Empirical Comparison of Microcomputer Parsimony Programs, II. Cladistics, 5(2), 145-161, 1989.
[PW76] Prager, E. M. and Wilson, A. C. Congruency of Phylogenies Derived from Different Proteins: A Molecular Analysis of the Phylogenetic Position of Cracid Birds. Journal of Molecular Evolution, 9, 45-57, 1976.
[Ric89] Richards, D. Fast Heuristic Algorithms for Rectilinear Steiner Trees. Algorithmica, 4, 191-207, 1989.
[Riv90] Rivest, R. L. Cryptography. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 717-755.
[SC83] Sankoff, D. D. and Cedergren, R. J. Simultaneous Comparison of Three or More Sequences Related by a Tree. In D. Sankoff and J. B. Kruskal (eds.) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983. 253-263.
[SchR86] Schoch, R. M. Phylogeny Reconstruction in Paleontology. Van Nostrand Reinhold, New York, 1986.
[SchU86] Schöning, U. Complexity and Structure. Lecture Notes in Computer Science no. 211, Springer-Verlag, Berlin, 1986.
[SchU90] Schöning, U. The Power of Counting. In A. L. Selman (ed.) Complexity Theory Retrospective, Springer-Verlag, Berlin, 1990. 204-223.
[Sel91] Selman, A. L. A Taxonomy of Complexity Classes of Functions, Technical report, State University of New York at Buffalo, Department of Computer Science, 1991. To appear in Journal of Computer and System Sciences. An abbreviated version appeared in Bulletin of the European Association for Theoretical Computer Science, 45, 114-130, 1991.
Selman, A. L. Personal communication, July 6, 1992.
[SimH89] Simon, H. U. Continuous Reductions Among Combinatorial Optimization Problems. Acta Informatica, 26, 771-785, 1989.
[SimJ77] Simon, J. On the Difference between One and Many. In A. Salomaa and M. Steinby (eds.) Automata, Languages, and Programming - 4th Colloquium, Lecture Notes in Computer Science no. 52, Springer-Verlag, Berlin. 480-490.
[Sne75] Sneath, P. H. A. Cladistic Representation of Reticulate Evolution. Systematic Zoology, 24, 360-368, 1975.
[Sny92] Snyder, T. L. On the Exact Location of Steiner Points in General Dimension. SIAM Journal on Computing, 21(1), 163-180, 1992.
[StaC75] Stace, C. A. Hybridization. In C. A. Stace (ed.) Hybridization and the Flora of the British Isles, Academic Press, London, 1975. 1-90.
[StaT80] Standish, T. A. Data Structure Techniques. Addison-Wesley, Reading, MA, 1980.
[Sto77] Stockmeyer, L. J. The Polynomial Hierarchy. Theoretical Computer Science, 3, 1-22, 1977.
[Sto85] Stockmeyer, L. J. On Approximation Algorithms for #P. SIAM Journal on Computing, 14, 849-861, 1985.
[SSV92] Stoneking, M., Sherry, S. T., and Vigilant, L. Geographic Origin of Human Mitochondrial DNA Revisited. Systematic Biology, 41(3), 384-391, 1992.
[SO90] Swofford, D. L. and Olsen, G. J. Phylogeny Reconstruction. In D. M. Hillis and C. Moritz (eds.) Molecular Systematics, Sinauer Associates, Sunderland, MA, 1990. 411-501.
[Tho82] Thorpe, R. S. Reticulate Evolution and Cladism: Tests for the Direction of Evolution. Experientia, 38, 1242-1244, 1982.
[TW92] Toda, S. and Watanabe, O. Polynomial-time 1-Turing Reductions from #PH to #P. Theoretical Computer Science, 100(1), 205-221, 1992.
[Tor91] Torán, J. Complexity Classes Defined by Counting Quantifiers. Journal of the Association for Computing Machinery, 38(3), 753-774, 1991.
[Val76] Valiant, L. G. The Relative Complexity of Checking and Evaluating. Information Processing Letters, 5, 20-23, 1976.
Valiant, L. G. The Complexity of Enumeration and Reliability Problems. SIAM Journal on Computing, 8, 410-421, 1979.
Valiant, L. G. The Complexity of Computing the Permanent. Theoretical Computer Science, 8, 189-201, 1979.
[WagK86a] Wagner, K. W. The Complexity of Combinatorial Problems with Succinct Input Representations. Acta Informatica, 23, 325-356, 1986.
Wagner, K. W. Some Observations on the Connection between Counting and Recursion. Theoretical Computer Science, 47, 131-147, 1986.
[WagK87] Wagner, K. W. More Complicated Questions about Maxima and Minima, and Some Closures of NP. Theoretical Computer Science, 51, 53-80, 1987.
[WagK88] Wagner, K. W. Bounded Query Computation. In the Proceedings of the Third Annual Conference on Structure in Complexity Theory, IEEE Computer Society Press, Washington, D. C., 1988. 260-277.
[WagK89] Wagner, K. W. Number-of-Query Hierarchies. Technical Report no. 4, Institut für Informatik, Bayerische Julius-Maximilians-Universität, Würzburg, 1989.
[WagK90] Wagner, K. W. Bounded Query Classes. SIAM Journal on Computing, 19(5), 833-846, 1990.
[WW86] Wagner, K. and Wechsung, G. Computational Complexity Theory.
[WagW68] Wagner Jr., W. H. Hybridization, Taxonomy, and Evolution. In V. H. Heywood (ed.) Modern Methods in Plant Taxonomy. Botanical Society of the British Isles Conference Report no. 10. Academic Press, London, 1968. 113-138.
[WagW80] Wagner Jr., W. H. Origin and Philosophy of the Groundplan-divergence Method of Cladistics. Systematic Botany, 5(2), 173
[Wil81] Wiley, E. O. Phylogenetics: The Theory and Practice of Phylogenetic Systematics. John Wiley, New York, 1981.
[WSBF91] Wiley, E. O., Siegel-Causey, D., Brooks, D. R., and Funk, V. A. The Compleat Cladist: A Primer of Phylogenetic Procedures, Special Publication no. 19, University of Kansas Museum of Natural History, Lawrence, Kansas, 1991.
[Win87] Winter, P. Steiner Problems in Networks: A Survey. Networks, 17, 129-167, 1987.
[Wra77] Wrathall, C. Complete Sets and the Polynomial-time Hierarchy. Theoretical Computer Science, 3, 23-33, 1977.
Yannakakis, M. The Analysis of Local Search Problems and their Heuristics. In C. Choffrut and T. Lengauer (eds.) STACS '90: 7th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science no. 415, Springer-Verlag.
[Yao92] Yao, X. Finding Approximate Solutions to NP-hard Problems by Neural Networks is Hard. Information Processing Letters, 41(2), 93-98, 1992.
A Phylogenetic Systematics and the Inference of Reticulation

In this appendix, I will give a short review of various approaches to inferring
reticulation, followed by a justification of the reticulate phylogenetic parsimony
problem schemata defined in Section 3.3.1. For in-depth reviews of the topics in
this appendix, see [Fun85, Gra81, SchR86, StaC75].
Reticulate events as described in Section 2.1 are part of biological evolution.
Hybridization has occurred frequently in many groups of plants and less
frequently among animals, notably in birds and fishes [Gra81], and
introgression, a form of recombination in which characteristics are passed
via hybrids from one species to another, seems to occur with greater frequency
than previously thought, especially among the cytoplasmic and nuclear genes in
plants and animals (see [DRA92] and references). The evolutionary significance
of such reticulation has been debated for decades; for instance, hybridization has
been viewed as mere noise on the underlying substrate of dichotomous evolution
[WagW68], as an important force in particular groups at particular times
[Gra81, pp. 179-189], and as the dominant force in plant evolution [StaC75,
Lotsy (1916) quoted on p. 24]. Regardless of such debate, reticulation and its
inference are crucial to many investigators.
There are many traditional biological heuristics for recognizing individual
instances of hybridization and recombination, based on the intermediate nature
of the produced character-states and various attributes of the proposed
hybrid and its parent species, e.g. geographical distribution, parental
interfertility, and experimental re-creation of the hybrid [Gra81, StaC75]; some
of these heuristics have been coded as numerical measures (hybrid indices)
[Gra81, StaC75]. Recently, algorithmic methods have been proposed
for inferring hybridization under the compatibility [Sne75], maximum likelihood
[Fel82, Lat82], and phylogenetic parsimony [Fun85, Hei90, Lee88, Nel83, Phi84,
Tho82, WagW80] criteria. The focus in this appendix will be on those methods
based on the phylogenetic parsimony criterion.
All known parsimony-based methods infer hybridization using the character
conflict induced by hybrids. In a phylogenetic parsimony analysis, the theoretical
lower limit on cost is that each character-state transition event occurs only once
in a tree; the portion of a tree's cost above this theoretical minimum consists
of additional hypotheses of character-state transition (homoplasy) which are
required to explain character states that did not arise only once in that tree. The
phylogenetic parsimony criterion, in preferring trees of minimum cost, minimizes
homoplasy. When the possibility of error in character analysis has been ruled
out, homoplasy is a sign that evolutionary processes not belonging to the
single-transition, dichotomous-speciation model have occurred. Reticulation as defined
in this thesis comprises one such set of processes.
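
The notion of homoplasy as "cost above the theoretical minimum" can be made concrete with a small sketch. The code below is illustrative only and is not taken from the thesis: it assumes a fixed rooted binary tree given as nested pairs, unordered character states, and Fitch's counting procedure for the number of state changes; the names `fitch_cost` and `homoplasy` are hypothetical. For binary characters this count coincides with the Wagner length of the character on the tree.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_cost(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate states at the root, minimum number of changes)
    for a single character on a fixed tree, using Fitch's procedure."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_cost(tree[0], states)
    right_set, right_cost = fitch_cost(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one additional transition at this node.
    return left_set | right_set, left_cost + right_cost + 1


def homoplasy(tree: Tree, characters: List[Dict[str, int]]) -> int:
    """Sum, over all characters, of the cost above the theoretical minimum
    (each observed state arising exactly once: number of states minus 1)."""
    total = 0
    for states in characters:
        _, cost = fitch_cost(tree, states)
        minimum = len(set(states.values())) - 1
        total += cost - minimum
    return total


tree = (("A", "B"), ("C", "D"))
compatible = {"A": 0, "B": 0, "C": 1, "D": 1}   # fits the tree: one change
conflicting = {"A": 0, "B": 1, "C": 0, "D": 1}  # needs two changes on this tree
print(homoplasy(tree, [compatible, conflicting]))  # prints 1
```

The second character is the kind of conflict that a hybrid taxon can induce: on this tree it requires one transition more than its minimum, and that excess is what the parsimony-based methods discussed here try to explain.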
Following [Fun85], all parsimony-based methods for inferring hybridization
can be classified into three approaches, depending on how they deal with
homoplasy.

1. Include reticulation implicitly via the homoplasy in the most parsimonious
tree [NP81].

2. Identify and remove hybrid taxa before phylogenetic analysis, and introduce
reticulation after phylogenetic analysis to accommodate these taxa on the
basis of homoplasy in the most parsimonious tree [WagW80].

3. Include all taxa in the phylogenetic analysis, and introduce reticulation and
hybrid taxa as necessary either during [Phi84] or after [Fun85, Lee88, Nel83,
Tho82] analysis, on the basis of homoplasy.

Each of these approaches has intrinsic difficulties because reticulation can be
characterized by a wide variety of character-state patterns, both within the produced
taxa and within any non-reticulate tree including these taxa [Fun85, Hum83,
McD90, StaC75]. Moreover, these approaches are not satisfactory for defining
criterion-based problems because they are based on specific algorithms and
heuristics (see Section 1).
There are no general difficulties with inferring reticulation using the parsimony
criterion: reticulations remove homoplasy by unifying the occurrence of seemingly
incompatible character-states into a single event, and can thus be ranked (as are
trees) by the decrease in homoplasy that they induce. However, there are several
specific difficulties.

1. As appropriate reticulation can represent any number of character-state
transitions in one event, unbounded reticulation renders dichotomous
speciation irrelevant and the phylogenetic parsimony criterion meaningless
[NP81, pp. 217-218].

2. As homoplasy is used to justify the addition of reticulation, it is no longer
possible to use homoplasy as a sign of possible error in character analysis
and coding (the "self-illuminating" property of phylogenetic parsimony
analysis [Wil81]).

3. It is much more difficult to infer phylogenies by hand using Hennigian
argumentation [Wil81, WSBF91] or by algorithm when reticulation is allowed;
moreover, the produced phylogenies cannot be readily used as the basis for
hierarchical Linnean classifications of species.
The first two of these difficulties are actually guidelines for the formulation of
useful computational problems. By (1), a problem should only be able to infer
a limited amount of well-defined reticulation for a given instance, and this limit
should be under the control of the investigator. By (2), such a problem should
only be invoked after a non-reticulate analysis has been performed to detect
possible errors in character coding, and to determine if there is any homoplasy
that can be explained by reticulation. The difficulties in (3) are consequences
of searching for phylogenetic trees in a richer hypothesis-space, and must be
accepted if reticulate hypotheses are desirable.
The reticulate problem schemata defined in Section 3.3.1 satisfy the first
condition above, and a procedure patterned after that given in [Nel83], in which
reticulations are added one at a time to the most parsimonious tree such that
the homoplasy removed with each insertion is maximized, will satisfy the second.
Such a procedure using these schemata would differ from that in [Nel83] in that it
would be able to search over the whole space of available reticulate phylogenies,
not just those that can be reached by additions of reticulation to the most
parsimonious non-reticulate phylogeny, and may thus be able to find less obvious but
equally valid solutions. This procedure is not immune to the problems discussed
above of recognizing the patterns of homoplasy that imply reticulation, or the
possibility that the observed homoplasy may have other causes, e.g. multiple
speciation, ecological convergence, or the inclusion of ancestral taxa in the given taxa
[NP81, p. 265]. Moreover, this procedure is not so much a method for producing
phylogenetic trees as an aid for exploring the space of phylogenetic hypotheses.
However, this is consistent with the viewpoint that systematics does not so much
derive evolutionary history as obtain successively better approximations to it.
The beauty and power of these reticulate problem schemata is that they do not
depend on the precise structure of the permitted reticulation events. This allows
investigators to define reticulation events appropriate to their needs, and
renders the corresponding NP-completeness proofs for such problems trivial. Other
schemata may be defined by allowing weighted reticulations or polynomially
bounded sets of forbidden reticulations.
The hypergraph formalism given in this thesis should be adopted to describe
reticulate events in phylogenetic systematics. Hyperarcs provide unified
representations of complex evolutionary phenomena. Moreover, such a formalism will
make the recognition and transfer of relevant results from other fields easier.
The NP-completeness results given in this thesis are one example. Of perhaps
more practical use would be the application of work done in database design
[ADS86, ANI90, BFMY83, Fag83] to algorithms for constructing reticulate
phylogenetic trees.
B The Computational Complexity of Phylogenetic Parsimony Problems Incorporating Explicit Graphs

Consider the following decision problem:

UNWEIGHTED BINARY WAGNER PARSIMONY WITH GRAPH ($UBW_G$)
Instance: Positive integer $d$; graph $G = (V, E)$, where $V = \{0,1\}^d$ and $E = \{\{u,v\} : u, v \in V$ and $u$ and $v$ differ in exactly one position$\}$; a subset $S$ of $\{0,1\}^d$; and a positive integer $B$.
Question: Is there a phylogeny satisfying the Wagner phylogenetic parsimony criterion that includes $S$ and has length at most $B$?
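
To get a feel for how much the explicit graph adds to an instance, the following sketch builds $G$ directly from the definition above. It is an illustration rather than part of the thesis: vertices are encoded as bit strings, and the function names `hypercube` and `hamming` are hypothetical. The point is that $G$ has $2^d$ vertices and $d \cdot 2^{d-1}$ edges, so writing it out pads the instance exponentially in $d$.

```python
from itertools import product


def hamming(u: str, v: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))


def hypercube(d: int):
    """Build G = (V, E) with V = {0,1}^d and an edge between every pair of
    vertices that differ in exactly one position."""
    vertices = ["".join(bits) for bits in product("01", repeat=d)]
    edges = set()
    for v in vertices:
        for i in range(d):
            flipped = "1" if v[i] == "0" else "0"
            u = v[:i] + flipped + v[i + 1:]
            edges.add(frozenset((u, v)))
    return vertices, edges


for d in range(1, 6):
    V, E = hypercube(d)
    # |V| = 2^d and |E| = d * 2^(d-1): the explicit graph grows exponentially.
    print(d, len(V), len(E))
```

An instance without $G$ needs only about $d \cdot |S|$ bits for the taxa, so for small $S$ the graph dominates the input size.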
This problem differs from problem $UBW$ defined in Section 3.3.1 by including the
$d$-dimensional graph explicitly in its instance. Both of these problems are in NP;
however, $UBW$ has been shown NP-complete [DJS86, GF82], and the complexity
of $UBW_G$ is unknown. The complexity of $UBW_G$ is of interest not only because
it has been used in proofs of NP-completeness [Day83], but also because it would
be interesting to know by exactly how much the exponential padding of the input
instance with $G$ reduces the complexity of $UBW$.
To this end, consider the following restrictions on a phylogenetic parsimony
problem $\Pi$: let $\Pi^{0(poly)}$ be the subproblem of $\Pi$ restricted to instances such that
$|S| \le p(d)$ for some polynomial $p$, and $\Pi^{0(exp)}$ be the subproblem of $\Pi$ restricted
to instances such that $c2^d \le |S|$ for some constant $c$, $c > 0$. The former restriction
highlights, and the latter isolates, the complexity introduced by padding. If the
complexity of $UBW_G$ cannot be determined directly, these restricted subproblems
may still give lower bounds.^a
Consider the complexities of $UBW$ and $UBW_G$. As mentioned already, $UBW$
is NP-complete. If $UBW_G$ is NP-complete, then $UBW \leq_m^p UBW_G$; such a reduction
is difficult to visualize, because it implies that a problem on dimension $d$ can
be mapped onto an equivalent problem of dimension $O(\log d)$. Alternatively, the
padding introduced by $G$ might yield polynomial algorithms via algorithms that
solve the problem STEINER TREE IN GRAPHS (see Section 3.2.1). However,
all known STG algorithms, including those restricted to $d$-dimensional graphs,
are linear in $|G|$ and exponential in $|S|$ [Sny92, Win87].
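The exponential dependence on $|S|$ can be seen in a small sketch of the Dreyfus-Wagner dynamic program specialised to the hypercube. This is an illustration written for this passage, not code from [Sny92] or [Win87]; the function name `steiner_length_hypercube` and the encoding of vertices as integers are assumptions, and the shortest-path distance between two hypercube vertices is taken to be their Hamming distance. With $k = |S|$ and $n = 2^d$, this particular sketch runs in roughly $O(3^k n + 2^k n^2)$ time.

```python
def steiner_length_hypercube(d, S):
    """Length of a shortest tree in the d-cube that includes every vertex of S.

    Dreyfus-Wagner style DP: dp[mask][v] is the minimum length of a tree
    connecting the terminals selected by `mask` together with vertex v.
    """
    terminals = sorted(set(S))
    k, n = len(terminals), 1 << d
    if k <= 1:
        return 0

    def dist(u, v):
        return bin(u ^ v).count("1")     # Hamming distance = hypercube distance

    INF = float("inf")
    full = (1 << k) - 1
    dp = [[INF] * n for _ in range(full + 1)]
    for i, t in enumerate(terminals):
        for v in range(n):
            dp[1 << i][v] = dist(t, v)

    for mask in range(1, full + 1):
        if (mask & (mask - 1)) == 0:
            continue                     # single terminals are already initialised
        # Merge step: join two subtrees for complementary terminal subsets at v.
        merged = [INF] * n
        sub = (mask - 1) & mask
        while sub:
            other = mask ^ sub
            if sub < other:              # consider each unordered split once
                for v in range(n):
                    cost = dp[sub][v] + dp[other][v]
                    if cost < merged[v]:
                        merged[v] = cost
            sub = (sub - 1) & mask
        # Extend step: attach v to the merged tree by a shortest path from u.
        for v in range(n):
            dp[mask][v] = min(merged[u] + dist(u, v) for u in range(n))

    return min(dp[full])
```

An instance $(d, S, B)$ would then be answered by comparing the returned length with $B$. The loop over all $2^k$ terminal subsets and their splits is what makes the method impractical once $|S|$ grows.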
Consider now the complexities of $UBW^{O(exp)}$ and $UBW_G^{O(exp)}$. These problems
are computationally equivalent, i.e., $UBW_G^{O(exp)} \leq_m^p UBW^{O(exp)}$ (discard $G$),
and $UBW^{O(exp)} \leq_m^p UBW_G^{O(exp)}$ (add $G$, which can be constructed in time linear
in the size of an instance of $UBW^{O(exp)}$). Both of these problems reduce to
$UBW_G$; however, for reasons similar to those given above, it is not obvious that
they are either NP-complete or in P.
The complexity of the third pair of problems, $UBW^{O(poly)}$ and $UBW_G^{O(poly)}$,
is the most interesting. The reduction from VERTEX COVER given in [DJS86] [...]
³ Since this thesis was submitted to the referees, I have found out that the weighted version of $UBW_G$ (actually, the weighted version of $UBW_G^{O(exp)}$) has been shown to be NP-complete [<hl!;!)l, Section 6]. While this does not immediately affect the problems examined in this section, it may be a stimulus for further research.
Theorem 58 $UBW^{O(poly)} \leq_m^p UBW_G^{O(poly)}$ if and only if P = NP.
Proof: The implication from right to left is trivial. The implication from left
to right follows by this construction: A reduction from an instance of $UBW^{O(poly)}$
to $UBW_G^{O(poly)}$ must map a polynomial number of vertices in a graph of dimension
$d$ into a polynomial number of vertices in a graph of logarithmically lower
dimension [...] also an instance of $UBW^{O(poly)}$. Repeat this process a polynomial number of
times to produce an instance of dimension $O(1)$, which can be solved in constant
time. This yields a polynomial algorithm for $UBW^{O(poly)}$, which implies that P
= NP. ∎
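As a rough illustration of how quickly repeated logarithmic shrinking reaches constant dimension, the sketch below counts rounds under the idealised assumption that each application of the reduction maps dimension $d$ to exactly $\lceil \log_2 d \rceil$. The function name, the stopping threshold of 2, and the absence of constant factors are all assumptions made for this example; the proof itself only needs a polynomial bound on the number of rounds.

```python
import math


def rounds_to_constant_dimension(d, threshold=2):
    """Count applications of d -> ceil(log2(d)) until d <= threshold.

    Idealised model of the repeated reduction in the proof; real reductions
    would carry constant factors that are ignored here.
    """
    rounds = 0
    while d > threshold:
        d = math.ceil(math.log2(d))
        rounds += 1
    return rounds
```

Under this model the number of rounds grows extremely slowly with $d$, so the polynomial bound used in the proof is generous.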
Corollary 59 If P $\neq$ NP then $UBW_G^{O(poly)}$ is not NP-complete.
All optimal solutions to instances of $UBW_G^{O(poly)}$ are of size polynomial in $d$ (see
Section 3.2.1), and hence of size polylogarithmic in the instance of $UBW_G^{O(poly)}$.
Thus, $UBW_G^{O(poly)}$ is in $\beta_{polylog}$, the class of decision problems requiring only
polylogarithmic nondeterminism, which is probably strictly contained between
P ($= \beta_{\log n}$) and NP ($= \bigcup_{k \geq 1} \beta_{n^k}$) [DT90, p. ~~]. Moreover, by the Dreyfus-Wagner
STG algorithm ([Sny92, Section 2], [Win87, Section 1.~]), $UBW_G^{O(poly)}$ is
in $O(n^{O(\log n)})$.
There is even circumstantial evidence that $UBW_G^{O(poly)}$ is in P. An encoding
of a graph $G = (V, E)$ is succinct if it is of size polylogarithmic in $|V|$ [GW83]; an
example of such an encoding is a polylogarithmically sized circuit that computes
the adjacency matrix for $G$. Though $UBW^{O(poly)}$ does not explicitly incorporate
an encoding of $G$, its problem instances will always be of size polylogarithmic
in $G$; hence, $UBW^{O(poly)}$ can be considered as the succinctly encoded version of
$UBW_G^{O(poly)}$. In general (cf. [LW92]), succinct encodings precisely exponentiate
the time complexity of graph problems, e.g., the succinct version of the trivial
graph property existence of a triangle is NP-hard [GW83, Theorem 2.1], and
the succinct version of the NP-complete problem 3-COLORABILITY is NEXP-complete
[PY86, Corollary]. If a problem $\Pi$ is P-hard via a certain type of
reduction called a projection from the Circuit Value Problem, then the succinct
encoding version of $\Pi$ is EXP-hard [PY86, p. 184]; if, in turn, the succinct
encoding version of $\Pi$ is NP-complete, then P $\neq$ NP.
Corollary 60 If $UBW_G^{O(poly)}$ is P-hard via a projection from the Circuit Value
Problem, then P $\neq$ NP.
Many classical polynomial-time reductions can be easily made into projections
[PY86, p. 182]; this may also be true of the log-time reductions used to establish
P-hardness. As P $\neq$ NP probably cannot be proved in our standard system of
logic [GJ79, p. 186], it is unlikely that $UBW_G^{O(poly)}$ can be proved to be P-hard,
and likely that it is in P.
Figure 13: Reductions among implicit and explicit graph Unweighted Binary Wagner parsimony decision problems. Reductions $\Pi \leq_m^p \Pi'$ are denoted by arrows from $\Pi$ to $\Pi'$. The abbreviations NP-C and NPI stand for the classes NP-complete and NP-intermediate ($=$ NP $-$ (P $\cup$ NP-C)), respectively.
The known relations among problems examined in this section are summarized
in Figure 13. I conjecture that $UBW_G^{O(poly)}$ is in P and that $UBW^{O(exp)}$,
$UBW_G^{O(exp)}$, and $UBW_G$ are all strictly contained between P and NP-complete.
To my knowledge, $UBW_G^{O(poly)}$ and $UBW^{O(poly)}$ are the only problem-pair such
that the complexity of the succinct encoding version is known but the complexity
of the full graph version is unknown. This in itself makes them candidates for
further research.