the effect of model choice on phylogenetic inference using mitochondrial sequence data: lessons from...

13
Molecular Phylogenetics and Evolution 43 (2007) 583–595 www.elsevier.com/locate/ympev 1055-7903/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2006.11.017 The eVect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions Martin Jones a,¤ , Benjamin Gantenbein b , Victor Fet c , Mark Blaxter a a Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK b AO Research Institute, Clavadelerstrasse 8, Davos Platz CH-7270, Switzerland c Department of Biological Sciences, Marshall University, Huntington, WV 25755-2510, USA Received 25 April 2006; revised 14 November 2006; accepted 14 November 2006 Available online 29 November 2006 Abstract Chelicerates are a diverse group of arthropods, with around 65,000 described species occupying a wide range of habitats. Many phy- logenies describing the relationships between the various chelicerate orders have been proposed. While some relationships are widely accepted, others remain contentious. To increase the taxonomic sampling of species available for phylogenetic study based on mitochon- drial genomes we produced the nearly complete sequence of the mitochondrial genome of the scorpion Mesobuthus gibbosus. Mitochon- drial gene order in M. gibbosus largely mirrors that in Limulus polyphemus but tRNA secondary structures are truncated. A recent analysis argued that independent reversal of mitochondrial genome strand-bias in several groups of arthropods, including spiders and scorpions, could compromise phylogenetic reconstruction and proposed an evolutionary model that excludes mutational events caused by strand-bias (Neutral Transitions Excluded, NTE). An arthropod dataset of six mitochondrial genes, when analyzed under NTE, yields strong support for scorpions as sister taxon to the rest of Chelicerata. We investigated the robustness of this result by exploring the eVect of adding additional chelicerate genes and taxa and comparing the phylogenies obtained under diVerent models. We Wnd evidence that (1) placement of scorpions arising at the base of the Chelicerata is an artifact of model mis-speciWcation and scorpions are strongly sup- ported as basal arachnids and (2) an expanded chelicerate dataset Wnds support for several proposed interordinal relationships (ticks plus mites [Acari] and spiders plus whip spiders plus whip scorpions [Araneae + Pedipalpi]). Mitochondrial sequence data are subject to sys- tematic bias that is positively misleading for evolutionary inference and thus extreme methodological care must be taken when using them to infer phylogenies. © 2006 Elsevier Inc. All rights reserved. Keywords: Phylongeny; Scorpiones; Chelicerata; Mitochondrion; Strand-bias; NTE 1. Introduction 1.1. Chelicerates The chelicerates (subphylum Chelicerata) are a diverse group of arthropods including many organisms of medical (Ixodes scapularis, a Lyme disease vector), economic (Rhipi- cephalus [formerly Boophilus, see Barker and Murrell, 2004] microplus, a cattle tick) and evolutionary (Xiphosura, horse- shoe crabs) interest. Sea spiders (class: Pycnogonida) are sometimes considered chelicerates but several alternative hypotheses have not been ruled out (Dunlop and Arango, 2005). The phylogenetics of chelicerates has been studied at many levels, from the species (Dobson and Barker, 1999) to the phylum (Mallatt et al., 2004). Major problems are the relationships between the various chelicerate orders, for which many solutions have been proposed but on which a consensus is yet to be reached (Weygoldt, 1998). In particu- lar, the phylogenetic position of the Scorpiones, a key order in the arachnid phylogeny, is highly disputed (Weygoldt, 1998; Dunlop and Braddy, 2001; Dunlop and Webster, 1999; Giribet et al., 2002; Wheeler and Hayashi, 1998). Scorpions * Corresponding author. Fax: +44 131 650 7489. E-mail address: [email protected] (M. Jones).

Upload: martin-jones

Post on 02-Jul-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

Molecular Phylogenetics and Evolution 43 (2007) 583–595www.elsevier.com/locate/ympev

The eVect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

Martin Jones a,¤, Benjamin Gantenbein b, Victor Fet c, Mark Blaxter a

a Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UKb AO Research Institute, Clavadelerstrasse 8, Davos Platz CH-7270, Switzerland

c Department of Biological Sciences, Marshall University, Huntington, WV 25755-2510, USA

Received 25 April 2006; revised 14 November 2006; accepted 14 November 2006Available online 29 November 2006

Abstract

Chelicerates are a diverse group of arthropods, with around 65,000 described species occupying a wide range of habitats. Many phy-logenies describing the relationships between the various chelicerate orders have been proposed. While some relationships are widelyaccepted, others remain contentious. To increase the taxonomic sampling of species available for phylogenetic study based on mitochon-drial genomes we produced the nearly complete sequence of the mitochondrial genome of the scorpion Mesobuthus gibbosus. Mitochon-drial gene order in M. gibbosus largely mirrors that in Limulus polyphemus but tRNA secondary structures are truncated. A recentanalysis argued that independent reversal of mitochondrial genome strand-bias in several groups of arthropods, including spiders andscorpions, could compromise phylogenetic reconstruction and proposed an evolutionary model that excludes mutational events causedby strand-bias (Neutral Transitions Excluded, NTE). An arthropod dataset of six mitochondrial genes, when analyzed under NTE, yieldsstrong support for scorpions as sister taxon to the rest of Chelicerata. We investigated the robustness of this result by exploring the eVectof adding additional chelicerate genes and taxa and comparing the phylogenies obtained under diVerent models. We Wnd evidence that (1)placement of scorpions arising at the base of the Chelicerata is an artifact of model mis-speciWcation and scorpions are strongly sup-ported as basal arachnids and (2) an expanded chelicerate dataset Wnds support for several proposed interordinal relationships (ticks plusmites [Acari] and spiders plus whip spiders plus whip scorpions [Araneae + Pedipalpi]). Mitochondrial sequence data are subject to sys-tematic bias that is positively misleading for evolutionary inference and thus extreme methodological care must be taken when using themto infer phylogenies.© 2006 Elsevier Inc. All rights reserved.

Keywords: Phylongeny; Scorpiones; Chelicerata; Mitochondrion; Strand-bias; NTE

1. Introduction

1.1. Chelicerates

The chelicerates (subphylum Chelicerata) are a diversegroup of arthropods including many organisms of medical(Ixodes scapularis, a Lyme disease vector), economic (Rhipi-cephalus [formerly Boophilus, see Barker and Murrell, 2004]microplus, a cattle tick) and evolutionary (Xiphosura, horse-

* Corresponding author. Fax: +44 131 650 7489.E-mail address: [email protected] (M. Jones).

1055-7903/$ - see front matter © 2006 Elsevier Inc. All rights reserved.doi:10.1016/j.ympev.2006.11.017

shoe crabs) interest. Sea spiders (class: Pycnogonida) aresometimes considered chelicerates but several alternativehypotheses have not been ruled out (Dunlop and Arango,2005). The phylogenetics of chelicerates has been studied atmany levels, from the species (Dobson and Barker, 1999) tothe phylum (Mallatt et al., 2004). Major problems are therelationships between the various chelicerate orders, forwhich many solutions have been proposed but on which aconsensus is yet to be reached (Weygoldt, 1998). In particu-lar, the phylogenetic position of the Scorpiones, a key orderin the arachnid phylogeny, is highly disputed (Weygoldt,1998; Dunlop and Braddy, 2001; Dunlop and Webster, 1999;Giribet et al., 2002; Wheeler and Hayashi, 1998). Scorpions

Page 2: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

584 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

have been variously proposed to be basal (Hypothesis 1,Fig. 1a) or derived (Hypothesis 3, Fig. 1c) arachnids(reviewed in Wheeler and Hayashi, 1998). Although scorpi-ons are a key taxon for the understanding of chelicerate evo-lution, current sequence datasets are limited in extent. Inparticular, scorpion mitochondrial genomes have not beenfully exploited (Gantenbein and Largiadèr, 2003). Fourteencomplete mitochondrial genomes from Acari (ticks andmites), three from spiders, and one from a New World scor-pion (Centruroides limpidus: Buthidae) have been sequenced.Here, we describe the mitochondrial genome of the EastMediterranean scorpion Mesobuthus gibbosus (Buthidae)and use it to further explore chelicerate relationships.

1.2. Multigene datasets

Hypotheses regarding the relationships between cheli-cerate orders have been largely based on morphology(Weygoldt and Paulus, 1979; Giribet et al., 2002; Wheelerand Hayashi, 1998; Regier and Shultz, 2001). Nuclear ribo-somal RNA genes have been used for wider arthropod andchelicerate phylogeny (Mallatt and Giribet, 2006; Wheelerand Hayashi, 1998). Here, we examine the utility of mito-chondrial sequence data for addressing questions of cheli-cerate phylogeny. Recent data suggest that large datasets,

Fig. 1. Summary trees showing various phylogenetic hypotheses regardingthe placement of Scorpiones. Scorpiones are shown (a) as sister taxon toother arachnids (b) as sister taxon to other chelicerates (c) as derivedarachnids forming a sister taxon to Araneae.

comprising many genes, can resolve problematic phyloge-nies with a high degree of conWdence (Philippe et al., 2005a;Rokas et al., 2003). By combining the information frommultiple gene sequences, clades can be recovered that arenot recovered under analysis of any of the individual genes(‘hidden support’, described in de Queiroz and Gatesy,2006). DiVerent genes with diVerent evolutionary rates maygive strong phylogenetic signals at diVerent depths in a phy-logenetic tree (Giribet, 2002). Thus, by including multiplegenes in a phylogenetic study, one could obtain a tree thatany single gene would be unable to resolve, as advocatedearlier by the total evidence proponents (e.g., Kluge, 1989;Nixon and Carpenter, 1996). The use of multiple genes forphylogenetics comes with its own set of diYculties, the mostsigniWcant of which are computational complexity, and theneed for evolutionary models that describe the variationbetween genes (Pupko et al., 2002). As the number of genes(and hence the number of characters) included in a multiplesequence alignment grows, so does the time required toevaluate the likelihood or parsimony score of a correspond-ing phylogenetic tree and hence the time required to executetree search algorithms. The choice of evolutionary model,always a critical issue in phylogenetic reconstruction (Lem-mon and Moriarty, 2004), is particularly important wheremultiple genes are involved. If the genes evolve under diVer-ent evolutionary constraints, a single model of DNA evolu-tion may not accurately describe the history of allcharacters in the alignment, and separate models andparameters may have to be assigned to each gene. Addi-tionally, if some gene sequences are unavailable for sometaxa, the alignment may have an appreciable proportion ofmissing data which may adversely aVect the robustness ofthe tree (Wiens, 2003).

1.3. Strand-bias

In a single gene phylogenetic study involving few charac-ters, the accuracy of the tree is limited by the amount ofphylogenetic signal present. In contrast, in a multigenestudy, the accuracy of the tree is more likely to be limitedby systematic errors. Phenomena such as diVering substitu-tion rates between lineages (Brinkmann et al., 2005; Felsen-stein, 1978), diVering patterns of rates between lineages(Gadagkar and Kumar, 2005; Philippe et al., 2005b) andcompositional bias (Galtier and Gouy, 1995) have all beenfound to lead to tree reconstruction artifacts. Model choicehas been shown to be a key factor in overcoming diYcultiesassociated with analysis of biased sequences (Lemmon andMoriarty, 2004; Posada and Buckley, 2004; Sullivan andSwoVord, 2001). Additionally, diVering patterns of evolu-tion between genes can cause problems in phylogeneticreconstruction under models that fail to take inter-genediVerences into account (Nylander et al., 2004).

All sequences in non-recombining animal mitochondrialgenomes are tightly linked and, in some respects, behave asa single gene. It might therefore be supposed that mito-chondrial gene sets would be aVected by biases to a similar

Page 3: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 585

degree, and hence show fewer diVerences between genes.Recently, strand-bias in mitochondrial genomes has beenshown to have a misleading eVect on phylogenetic recon-struction (Hassanin, 2006; Hassanin et al., 2005). In typicalmetazoan mitochondrial genomes one strand shows a com-positional bias towards A and C nucleotides (positive ATand CG skew). This has been proposed to be caused by theasymmetry of the replication and transcription processes.In some arthropod species, comparisons of orthologousgenes reveal a reversed bias (towards T and G nucleotides),which is believed to be driven by inversion of the mitochon-drial control region (Reyes et al., 1998; Tanaka and Ozawa,1994). In phylogenetic studies, some taxa with reversestrand-bias have been shown to cluster together artefactu-ally when analyzed under standard evolutionary models(Hassanin, 2006). Hassanin et al. speciWed a novel recodingscheme (Neutral Transitions Excluded, NTE) which, byexcluding neutral or quasi-neutral transitions, eliminateschanges driven by both normal and reverse strand-bias.When analyzed under the NTE model, a dataset of sixmitochondrial protein-coding genes for 71 arthropod spe-cies yielded a phylogeny with Chelicerata monophyleticand Scorpiones as sister taxon to all other chelicerates(Hassanin, 2006, Hypothesis 2, Fig. 1). To investigate therobustness of this result, and the utility of the NTE modelfor chelicerate phylogeny using mitochondrial data, weused a bioinformatics pipeline to mine publicly availablesequence data and assemble a dataset of aligned cheliceratemitochondrial genes, including species for which only a fewmitochondrial gene sequences were available. Bayesianphylogenetic analysis was performed on subsets of thesesequences using a variety of evolutionary models.

2. Materials and methods

2.1. Mesobuthus mitochondrion

The sequenced mtDNA is of a specimen of M. gibbosuscollected by V.F. from Visitsa, Mt. Pilion, Eastern Thessaly,Greece. Fresh DNA was extracted using the Puregeneextraction kit (Gentra). About 200–400 ng total genomicDNA was used for 50 �l PCR. Amplicons were checked forsingle bands on agarose gels, and subsequently puriWed aspreviously described (Gantenbein and Largiadèr, 2003).The three overlapping fragments were sequenced with theprimer walking technique. PCR primers and internalsequencing primers are given in Supplementary Material.AmpliWcation was performed using the expand-long tem-plate PCR system from Roche and the following cyclingproWle: 94 °C for 2 min, 10 cycles (94 °C for 15 s, 55 °C for1 min, 68 °C for 4 min), and 20 cycles (94 °C for 15 s, 54 °Cfor 1 min, 68 °C for 4 min + 15 s every cycle), Wnal extensionat 68 °C for 15 min. PCR products were subcloned into theplasmid vector pCR2.1 using the TOPO Cloning Kit (Invit-rogen). At least four positive clones per transformationwere picked and sequenced with M13 and internal sequenc-ing primers using the primer walking technique.

Sequencing reactions were carried out using Big Dyeaccording to the manufacturer’s protocol and were ana-lyzed on an automated sequencer (either ABI377XL orABI3730). Only sequences with <5% ambiguities were con-sidered for analysis. A total of 96 sequences were aligned toproduce the Wnal contig (51 sequences on the forwardstrand and 45 on the reverse strand). Five percent of thebases were sequenced only once and the average sequencecoverage was »5-fold.

A consensus sequence was built using Seqman from theDnastar package (using a minimum sequence identity of80%) and imported into JellyWsh for annotation. The pro-tein-coding genes were identiWed by the presence of an openreading frame and by similarity of inferred amino acidsequences to those of other complete chelicerate mitochon-drial genomes available from public databases. Abbrevi-ated stop codons of “T”-type (which are completed bypost-transcriptional polyadenylation) were assumed if theymatched with the boundary of a downstream gene or werefound less that 20 bp before the adjacent gene.

The boundaries of the small ribosomal (12S rRNA) andthe large ribosomal (16S rRNA) RNA genes were deter-mined by prediction of the secondary RNA structure usingthe Vienna RNAserver (Hofacker, 2003) and subsequentcomparison with other arthropods (Wuyts et al., 2004).Additionally, a multiple sequence alignment of the scorpionmitochondrial genome with that of 13 other chelicerateswas performed to further conWrm the gene boundaries(Mallatt et al., 2004; Masta and Boore, 2004).

For the identiWcation of tRNAs the program tRNA-scan-SE 1.21 (Lowe and Eddy, 1997) was used, followingthe protocol of Masta and Boore (2004) and relaxingsearch parameters (Cove cut-oV to >0.1). The N-tRNAgene was manually annotated by Wnding instances of theappropriate codon and aligning the surrounding sequenceto N-tRNA from the scorpion Centruroides limpidus. tRNAstructures were drawn using the graphic view option imple-mented in tRNAscan-SE. For gene abbreviations we fol-lowed Masta and Boore (2004).

2.2. Bioinformatics

A bioinformatics pipeline (Jones and Blaxter, 2006;available from http://www.nematodes.org/bioinformatics/),utilising Perl, BioPerl (Stajich et al., 2002) and a number ofother tools, was used to carry out the processes of sequenceextraction, consensus building, alignment, taxon selectionand phylogenetic reconstruction. At each stage of the anal-ysis, output was stored in a relational database for retrievalin the next stage.

2.3. Sequence extraction

All GenBank records for chelicerate species wereobtained through NCBI ENTREZ (http://www.ncbi.nlm.nih.gov/entrez/, 23/6/2005) and the sequencesof interest (corresponding to the genes ATP6, ATP8,

Page 4: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

586 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

COX1, COX2, COX3, CYTB, ND1, ND2, ND3, ND4,ND4L, ND5, ND6, RNA_12S, RNA_16S, RNA_SSU, andRNA_LSU) extracted based on annotation. Sequences ofinterest were also obtained for the following outgrouparthropods: Drosophila melanogaster (Hexapoda, Diptera),Triatoma dimidiata (Hexapoda, Heteroptera), Triops cancr-iformis (Crustacea, Notostraca), Daphnia pulex (Crustacea,Cladocera), Scutigera coleoptrata (Myriapoda, Scutigero-morpha). EST sequences (those derived from expressedsequence tags, e.g., Kenyon et al., 2003) were processed sep-arately: a BLAST (Altschul et al., 1997) database wasassembled containing complete translated proteinsequences for the protein-coding genes of interest from arange of chelicerate species, and Wltered at 80% redundancyusing BLASTCLUS. To screen EST sequences for genes ofinterest, each EST was used as the query sequence in aBLASTX search of this database. If the EST sequence hada hit in the database with an E-value of less than 1e¡8, itwas assigned the same gene name as that of the hit. A simi-lar screening protocol was used to identify mitochondrialribosomal RNA (rRNA) genes, using a database of com-plete rRNA gene sequences from a range of chelicerate spe-cies in a BLASTN search.

2.4. Consensus building

For each of the genes of interest, a consensus sequencewas built for each species using a rule-based approach.Sequences derived from fully sequenced genomes were pre-ferred over sequences extracted on the basis of annotation,which were, in turn, preferred over EST sequencesextracted on the basis of BLAST similarity to knownsequences. If multiple sequences of a single type were avail-able (e.g., there were two annotated sequences present for agene in a particular species), phrap (Gordon et al., 1998)was used to generate a consensus sequence, with the longestconsensus sequence being preferred if multiple consensussequences were generated. Consensus sequences that werevery short (<30 bases), contained long mononucleotide/dinucleotide runs (>12 bases) or produced incompletetranslations were removed.

2.5. Alignment

For each gene of interest, a multiple sequence alignmentwas generated using POA (Lee et al., 2002). In preliminaryanalyses the local alignment strategy used in POA provedto be a good solution for aligning sequences with missingdata. For protein-coding genes, the alignment was carriedout in two stages. Firstly, the translated protein sequencesfor all genome-derived and annotated sequences werealigned, then the nucleotide sequences aligned with TRAN-ALIGN from the EMBOSS package (Olson, 2002) usingthe protein alignment as a guide. Secondly, any EST-derived sequences were aligned to this nucleotide alignmentusing the POA proWle alignment option. Aligned DNAsequences were stored in the database. Complete DNA

alignments for each individual gene were analyzed usingModeltest (Posada and Crandall, 1998). The model selectedby Akaike information criterion (AIC) was, in each case,either General Time Reversible (GTR) or Tamura–Nei(TrN), with gamma rate variation between sites (G) and aproportion of invariant sites (I) selected in all cases.

2.6. CG skew

For each gene, third codon position bases were extractedfrom sequences from species in the D1 dataset. CG skewwas calculated from 5� to 3� in each gene for those basesusing the following formula: where C and G represent theproportion of C and G nucleotides, respectively. Skew wascalculated both before and after NTE recoding. Under theNTE recoding scheme, all third codon position bases arerecoded as purine (R) or pyrimidine (Y). For NTE-recodedsequences, the proportion of C and G nucleotides was cal-culated as half the proportion of Y and R nucleotides,respectively.

2.7. Phylogenetic analysis

Multiple sequence alignments suitable for phylogeneticanalysis were generated by specifying a subset of genes anda subset of taxa and extracting the corresponding pre-aligned sequences from the database. Multiple sequencealignments are available from the corresponding author onrequest. The alignments were analyzed using MrBayes 3.1(Ronquist and Huelsenbeck, 2003). Several diVerentanalyses were carried out on the D1 and D2 datasets(Table 2). By default, a GTR + I + G model (nstD6; ratesDinvgamma) was applied to all alignments. For those modelswhere the NTE scheme was used, bases were recodedaccording to Hassanin, Leger and Deutsch (2005) and thirdcodon position bases were assigned to a separate partitionwith two substitution types (nstD 2). For those modelswhere the alignment was partitioned by gene base frequen-cies, substitution rates, alpha parameters and proportionsof invariant sites were unlinked across partitions, and a ratemultiplier was used to allow rate variation between parti-tions.

Analyses were carried out in MrBayes 3.1.2 using defaultMCMCMC parameters (two independent runs with onecold chain and three heated chains per run) for 1,000,000generations. Convergence of split frequencies and Xatteningof likelihood scores were inspected to ensure stationarityand convergence between runs and the Wrst 100,000 genera-tions (10%) discarded as burn-in. Trees sampled after theburn-in period were summarized to give the 50% majorityrule consensus tree. To calculate Bayes factor support forthe hypothesis corresponding to the optimal tree, analyseswere run using parameter estimates from the initial analysiswith the following taxa constrained to be monophyletic:Hypothesis 2—Ixodes hexagonus, Mastigoproctus gigan-teus, Limulus polyphemus, Argiope bruennichi, Phrynus sp.,Leptotrombidium pallidum, Heptathela hangzhouensis,

Page 5: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 587

Cario capensis; Hypothesis 3—Argiope bruennichi, Hep-tathela hangzhouensis, Mesobuthus gibbosus, EuscorpiusXavicaudis.

3. Results

3.1. Genome annotation and tRNA scanning

We have sequenced nearly all of the mitochondrialgenome of M. gibbosus (15,681 bp). The sequence has beendeposited in the EMBL sequence database under accessionAJ716204. The genome is AT-rich, having a GC content of31.6%. About 300 bp of the M. gibbosus mitochondrialgenome were not sequenced due to highly repetitive andAT-rich DNA motifs in the putative D-loop (controlregion). Most protein-coding genes in the M. gibbosus mito-chondrial genome are initiated with the start codon ATG.However, alternative initiation codons ATN are used forthe genes atp8, nad1, nad2, nad4, nad5 and nad6. For termi-nation, both stop codons (TAA, TAG) and the incompletestop codon “T” (coxII, coxIII, nad3, nad4, nad5 and nad6)are used. Initial analysis of the genome sequence usingtRNAscan (Lowe and Eddy, 1997) with default parametersidentiWed eleven tRNA genes (A, F, I, K, M, R, T, S1, P, Wand Y). Using the variant model for nematode tRNA genesand reducing stringency settings identiWed 10 additionaltRNA genes. The tRNA-N gene could only be identiWedmanually. Four of the tRNA genes conspicuously lack aT�C arm (a phenomenon previously described in spiders(Masta and Boore, 2004) and nematodes (Wolstenholmeet al., 1987) and Wve had extremely short aminoacyl recep-tor stems with atypical base-pairing (see SupplementaryMaterials for predicted structures of all M. gibbosus mito-chondrial tRNAs).

Mitochondrial genome gene order has been used as aphylogenetic marker (Boore and Brown, 1998, 2000). Thegene order of the M. gibbosus mitochondrial genome (seeSupplementary Material for a map of the genome) is identi-cal to that of the recently sequenced New World buthidscorpion, Centruroides limpidus (Dávila et al., 2005). Geneorder is more conserved between M. gibbosus and thehorseshoe crab Limulus polyphemus (three tRNA gene rear-rangements) than between L. polyphemus and the spiderHabronattus oregonensis (eight tRNA gene rearrange-ments).

3.2. Phylogenetic analysis

We examined a subset (D1) consisting of 12 protein-cod-ing genes for species from the chelicerate orders consideredin Hassanin (2006). We then examined a second subset (D2)containing sequences from additional chelicerate orders,which incorporated a signiWcant proportion of missingdata. In all analyses of D1 and D2, Pycnogonida was placedwithin the ticks and mites. Placement of Pycnogonida withAcari has been suggested previously (reviewed in Dunlopand Arango, 2005) but is unexpected, since morphological

evidence has been found to place pycnogonids as sistertaxon to all other chelicerates or to all other arthropods(e.g., Maxmen et al., 2005; see Giribet et al., 2005). Owing tothe unexpectedness of this result, and the relatively smallamount of sequence data available for Pycnogonida (»40%in D1, »30% in D2) we regard this placement as tentativeuntil conWrmed by future analyses and have thereforeignored Pycnogonida when describing relationships in thefollowing sections.

3.3. Dataset D1: mostly complete mitochondrial genomes

The D1 dataset consisted of 12 protein-coding genes for11 chelicerate species and Wve outgroup species (Table 1).To keep the analysis computationally tractable, we usedonly a subset of the taxa for which mitochondrial genomesare available, selecting taxa to obtain the widest taxonomiccoverage. In preliminary analyses, all orders were found tobe monophyletic (with the exception of Acari because Var-roa destructor behaves erratically under phylogenetic analy-sis due to its extremely high AT content [Navajas et al.,2002]) and inclusion of multiple representatives of eachorder did not alter the results. In order to investigate theeVect of model, recoding scheme and codon positionweighting on the phylogeny, six analyses were carriedwhich are given in Table 2, along with the summary treessupported under each model (Fig. 2). The analyses diVer in(1) the inclusion or exclusion of the third bases of codons(2) the use of the NTE recoding scheme and (3) permittingeach gene to have diVerent model parameters (partitioning).

DiVerent phylogenies were obtained from Bayesian anal-ysis of the D1 dataset under the diVerent analyses (Fig. 2;hypotheses in Fig. 1). The three analyses in which thirdcodon position bases were excluded yielded similar trees(Hypothesis 2) with support for scorpions as sister taxon toother chelicerates (rendering Arachnida paraphyletic). Theanalysis using the GTR + I + G model with partitioning(Fig. 2a) gave a tree with the remaining arachnids dividedinto two clades; one of spiders and related orders((Araneae + Uropygi) + Amblypygi) and one of mites andticks (Ixodida + Trombidiformes). Using the NTE recodingscheme (Fig. 2b), an identical tree was recovered with theexception of rearrangements within the two major arachnidclades. In the analysis using the NTE recoding scheme withpartitioning (Fig. 2c), the tree was identical to that obtainedunder the NTE analysis except for the relationships withinthe clade of mites and ticks, which were identical with thoseof the GTR partitioned tree, and the reduced support forscorpions as sister taxon to other chelicerates (75% com-pared to 95%).

When third codon position bases were included, eachanalysis recovered a diVerent tree. Under the GTR + 3panalysis (Fig. 2d) scorpions were no longer sister taxon toall other chelicerates. A clade of scorpions and the spiderArgiope bruennichi was recovered, with a(Uropygi + Heptathela hangzhousensis) clade as a sistergroup (summary tree c with the exception of paraphyletic

Page 6: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

588 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

g The hypothesis (see Section 1 and Fig.1) supported by each dataset under

the analysis.

Table 1List of taxa included in datasets D1 and D2

a The order to which each species belongs (NCBI taxonomy).b The Taxonomy ID (TXID) assigned to the species by the NCBI GenBank database.c The Ref. Seq. ID (where applicable) for the complete mitochondrial genome sequence.d The citation (where applicable) for the complete mitochondrial genome sequence.e Dataset Dl includes the following genes: ATP6, C0X1, C0X2, C0X3, CYTB, ND1, ND2, ND3, ND4, ND4L, ND5, ND6.f Dataset D2 includes the following genes: ATP6, ATP8, C0X1, C0X2, C0X3, CYTB, ND1, ND2, ND3, ND4, ND4L, ND5,ND6, RNA_12S,

RNA_16S.g No. chars.: the number of aligned characters in each species.h % present: the proportion of bases in the complete alignment present in each species.i Outgroup.

Taxon Ordera NCBI TXIDb Ref. Seq. IDc Citationd Dl (12,669 characters)e D2 (15,950 characters)f

No. chars.g % Presenth No. chars.g % Presenth

UnidentiWed Opilioacarid Opilioacarida 150113 — — — — 257 1.61%UnidentiWed Allothyrid Holothyrida 91335 — — — — 539 3.38%Steganacarus magnus Oribatida 52000 — — — — 387 2.43%Camisia horrida Oribatida 240610 — — — — 597 3.74%Sarcoptes scabei Astigmata 197185 — — — — 7394 46.36%Opilio parietinus Opiliones 121214 — — — — 478 3.00%Ixodes hexagonus Ixodida 34612 NC_002010 Black and Roehrdanz

(1998)11820 93.30% 14449 90.59%

Mastigoproctus giganteus Uropygi 58767 — — 3756 29.65% 4150 26.02%Limulus polyphemus Xiphosura 6850 NC_003057 Lavrov, Boore and

Brown (2000)12108 95.57% 14692 92.11%

Argiope bruennichi Araneae 94029 — — 4857 38.34% 4863 30.49%Phrynus sp. Amblypygi 309714 — — 5154 40.68% 5165 32.38%Endeis spinosa Pycnogonida 136194 — — 5019 39.62% 5030 31.54%Mesobuthus gibbosus Scorpiones 123226 NC_006515 This work 12090 95.43% 14331 89.85%Euscorpius Xavicaudis Scorpiones 100976 — — 4926 38.88% 6100 38.24%Leptotrombidium pallidum Trombidiformes 279272 NC_007177 Shao et al. (2005) 11487 90.67% 13776 86.37%Heptathela hangzhouensis Araneae 216259 NC_005924 Qiu et al. (2005) 11997 94.70% 14472 90.73%Carios capensis Ixodida 176285 NC_005291 — 11922 94.10% 14444 90.56%Scutigera coleoptratai Scutigeromorpha 29022 NC_005870 Negrisolo et al. (2004) 11979 94.55% 14496 90.89%Triops cancriformisi Notostraca 194544 NC_004465 Umetsu et al. (2002) 12063 95.21% 14613 91.62%Daphnia pulexi Diplostraca 6669 NC_000844 Crease (1999) 12135 95.78% 14727 92.33%Triatoma dimidiatai Hemiptera 72491 NC_002609 Dotson and Beard

(2001)12003 94.74% 14578 91.40%

Drosophila melanogasteri Diptera 7227 NC_001709 12123 95.69% 14723 92.31%

Table 2Details of models of sequence evolution used in phylogenetic analyses of the Dl and D2 datasets

a Name used to refer the analysis in the text. Codon positions 1 and 2 are included in all analyses. Third codon positions are only included in analysesthat contain +3. Analyses in which the dataset was partitioned by gene end in p.

b General Time Reversible (GTR) model applied to all bases (nstD 6 in MrBayes). All models include gamma rate variation and a proportion of invari-ant sites.

c Neutral Transitions Excluded (NTE) model; GTR applied to Wrst and second codon position bases (nstD 6) and two substitution type model appliedto third codon position bases (nst D 2). All models include gamma rate variation and a proportion of invariant sites.

d Bases are recoded according to the NTE scheme of Hassanin, 2005. All third codon position bases and a subset of Wrst and second codon positionbases are recoded as purine/pyrimidine (R/Y), eliminating phylogenetic signal caused by neutral and nearly-neutral transitions.

e The alignment is partitioned by gene. Each partition has independent base frequencies, transition rate matrix, gamma parameter and proportion ofinvariant sites. A rate multiplier is used to permit rate variation between partitions.

f First and second codon position bases are partitioned by gene and all third codon position bases form a separate partition. Each partition has indepen-dent base frequencies, transition rate matrix, gamma parameter and proportion of invariant sites. A rate multiplier is used to permit rate variation betweenpartitions.

Analysis namea Model of base change Codon positions Recoded Partitioned Dl treeg D2 treeg

GTRp GTR + G + F 1,2 n ye 4 4NTE NTE + G + Ic 1,2 yd n 4 —NTEp NTE + G + Ic 1,2 yd yf 4 —GTR + 3p GTR + G + Ib 1,2,3 n ye 3 —NTE + 3 NTE + G + Ic 1,2,3 yd n 4 —NTE + 3p NTE + G + Ic 1,2,3 yd yf 1/2 1/2

Page 7: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 589

Fig. 2. Phylogeny reconstructed from the D1 dataset using Bayesian analysis under the following models: (a) GTRp (b) NTE (c) NTEp (d) GTR + 3p (e)NTE + 3 (f) NTE + 3p (see Table 2 for details of models). The order to which each species belongs is given in parentheses. The scale bar shows the branchlength associated with 0.200 expected changes per site. Labels on branches indicate the level of support: (¤), posterior probabilityD 1.00; (+), posteriorprobability D 0.9–0.99. Posterior probabilities of >0.9 are considered signiWcant; lower posterior probabilities are not shown.

Page 8: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

590 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

Araneae). A clade of ticks and mites was recovered. A cladeconsisting of Xiphosura + Amblypygi was sister taxon to allother chelicerates. Under the NTE + 3 analysis (Fig. 2e) thetree was identical to that found under NTE (summary treea) although with negligible support for scorpions as sistertaxon to all other chelicerates. Under the NTE + 3p analysis(Fig. 2f) Arachnida was monophyletic, with Scorpions a sis-ter taxon to the other arachnids, and Xiphosura as sistertaxon to the other chelicerates (Hypothesis 1). Two majorclades of arachnids were recovered; one of spiders andrelated orders (Araneae + (Uropygi + Amblypygi)) and oneof mites and ticks (Ixodida + Trombidiformes). Analysesunder NTE + 3p when the tree was constrained to supportHypothesis 2 or Hypothesis 3 yielded harmonic mean log-likelihood estimates of ¡81,004.98 and ¡80,996.41, respec-tively, compared with an estimate of ¡80,722.24 for theoptimal tree.

To test whether the data supported use of partitionedmodel, we estimated harmonic mean log-likelihoods fortwo pairs of analyses that varied only in the model; NTEversus NTEp and NTE + 3 versus NTE + 3p. In each casethe partitioned model was preferred: NTE:¡53885¿NTEp: ¡53193; NTE + 3: ¡82056¿NTE + 3p:¡80721.

When CG skew was examined at third codon positions,the species in the D1 dataset fell into two groups (Fig. 4).For the majority of species the mitochondrial genes had acharacteristic pattern of positive and negative skew. Threespecies (M. gibbosus, A. bruennichi, E. Xavicaudis) had apattern that was the reversed relative to other species in thegenes for which data was available. After NTE recoding, nosuch pattern was discernible.

3.4. Dataset D2: including single mitochondrial genes for some taxa

The D2 dataset consisted of thirteen protein-codinggenes and two rRNA genes for 17 chelicerate species andWve outgroup species (Table 1). The number and percentageof characters present varied between taxa, ranging from 257(1.61%) characters present for an unidentiWed Opilioacarid(Opilioacaridae) to 14723 (92.31%) characters present forD. melanogaster (outgroup). Taxa with complete mitochon-drial genomes, such as D. melanogaster had »8% missingcharacters due to minor diVerences in sequence lengthbetween taxa, resulting in end gaps. We included specieswith a range of numbers of characters present to investigatethe eVects of missing data on the phylogenetic placement of‘neglected’ taxa.

Under the NTE+3p analysis (Fig. 3), Xiphosura wasrobustly placed arising as sister taxon to the remaining cheli-cerates, with Scorpiones a sister taxon to the remaining arach-nids (Hypothesis 1) excluding Pycnogonida (see above).Relationships within the arachnids were generally poorlyresolved, although some groups were strongly supported:(Ixodida+Holothyrida) and (Uropygi+Amblypygi). In theGTRp analysis Hypothesis 2 was recovered (not shown).

4. Discussion

4.1. Mesobuthus gibbosus mitochondrion

The M. gibbosus mitochondrial genome is »16 kb in sizeand encodes the usual metazoan complement of 13 proteingenes and two ribosomal RNA genes. The order of thesegenes is the same as observed in another scorpion, Centuro-ides limpidus (Dávila et al., 2005), and in the horseshoe crab,Limulus polyphemus (Lavrov et al., 2000). It has previouslybeen proposed that the gene organization observed inL. polyphemus is plesiomorphic for arthropods in general.The tRNA genes were diYcult to identify, due to reduc-tions in the T�C or DHU arms in 14 genes and reductionin the lengths of the aminoacyl acceptor stem regions.Reduced tRNAs (compared to the standard metazoanmodel) have been observed previously in other chelicerates,such as spiders (Masta and Boore, 2004; Qiu et al., 2005);millipedes (Lavrov et al., 2002), and nematodes (Ohtsukiet al., 1998; Watanabe et al., 1994). In Lithobius forWcatus, a

Fig. 3. Phylogeny reconstructed from the D2 dataset using Bayesian anal-ysis under the NTE + 3p model (see Table 2 for details of models). Theorder to which each species belongs is given in parentheses. The scale barshows the branch length associated with 0.200 expected changes per site.Labels on branches indicate the level of support: (¤), posteriorprobability D 1.00; (+), posterior probabilityD 0.9–0.99; and (�), poster-ior probabilityD 0.80–0.89. Posterior probabilities of >0.9 are consideredsigniWcant. Posterior probabilities of less than 0.80 are not shown.

Page 9: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 591

centipede, RNA editing has been described that stabilizesreduced tRNA secondary structure (Lavrov et al., 2000),but whether this also occurs in chelicerates is unknown atpresent. The arrangement of tRNAs on the mitochondrialgenome shows four diVerences compared with L. polyphe-mus (possibly the result of three rearrangements of thegenes for tRNAs I and D individually and [P,T] as a pair).

4.2. Evolutionary models

The reversals of strand-bias documented by Hassanin(2006) clearly show how systematic error can limit the use-fulness of multigene phylogenetic analysis. Multiple inde-pendent reversals of strand-bias in mitochondrial genomesand mitochondrial genes cause erroneous clustering of dis-tantly related arthropod taxa when alignments are sub-jected to phylogenetic analysis under a standard GeneralTime Reversible (GTR) model. In their work, Hassanin andcolleagues found evidence that strand-bias is mainly gener-ated by transitions (Hassanin, 2006; Hassanin et al., 2005).They proposed a recoding scheme, Neutral TransitionsExcluded (NTE) which removes the eVect of strand-bias byrecoding bases at neutral and nearly-neutral positions aspurines and pyrimidines (R/Y coding). Here we have inves-tigated a wider range of analyses, examining the eVects ofGTR versus NTE, the eVects of inclusion or exclusion ofthird codon position bases, and the eVects of a partitionedversus an unpartitioned model.

A notable result is the contribution of phylogenetic sig-nal from third codon position bases. Phylogenetic analysesof dataset D1 in the three analyses that include only Wrstand second codon position bases yield similar trees withscorpions as sister taxon to all other chelicerates (Hypothe-sis 2). Support for scorpions as sister taxon to all otherchelicerates decreases under the NTEp analysis whichincludes a more realistic partitioned model, suggesting thatthis placement may be artifactual. In contrast, the threeanalyses that included third codon position bases gave treeswith radically diVerent placement of Scorpiones, and corre-spondingly diVerent hypotheses regarding chelicerate evo-lution. In the GTR + 3p analysis, inclusion of third codonposition bases yields in a clade of scorpions and Argiopebruennichi, a spider (Hypothesis 3). Hassanin (2006)showed that A. bruennichi and Euscorpius Xavicaudis bothhave a reversed pattern of third position CG skew relativeto other arthropods. Analysis of the D1 dataset reveals thatA. buennichi and M. gibbosus both show a reversed CGskew pattern in third codon position bases relative to otherspecies in D1 (Fig. 4) which is removed by NTE recoding,suggesting that the (Scorpiones + A. bruennichi) clade is anartifact of mitochondrial strand-bias. This proposition issupported by the results of the NTE + 3 analysis, in whichthe portion of the third codon position signal due to mito-chondrial strand-bias is excluded and scorpions again aresupported as sister taxon to the other chelicerates (Hypoth-esis 2). A Wnal striking result is the change in the tree whenthe partitioned analysis is used (NTE + 3p). Here, Xipho-

sura is identiWed as sister taxon to other chelicerates andArachnida is monophyletic (ignoring Pycnogonida, see Sec-tion 3), with scorpions the sister taxon to other arachnids(Hypothesis 1).

To test the signiWcance of support for the favoredhypothesis, we can use the harmonic mean log likelihood ofeach hypothesis, calculated from the MrBayes runs, to esti-mate Bayes Factors as described in Kass and Raftery(1995). The Bayes Factors for [Hypothesis 1 vs. Hypothesis 2]and [Hypothesis 1 vs. Hypothesis 3] were 565.48 and 548.34,respectively, indicating very strong support for Hypothesis1 (Bayes Factor values of >20 are considered to indicatestrong support). In the same way, we calculated Bayes Fac-tor support for a partitioned over a non-partitioned modelfor the NTE and NTE + 3 analyses. The Bayes Factors were1383.9 and 2670.62, respectively, indicating extremelystrong support for the partitioned model in both pairs ofanalyses.

We explain these results by suggesting that two potentialsources of bias are present in the D1 dataset. Firstly, thirdcodon position bases are strongly aVected by mitochondrialstrand-bias, an eVect minimized by using the NTE recodingscheme. Secondly, evolutionary model parameters varybetween genes, leading to erroneous results if an underpa-rameterized, unpartitioned model is used. The commonlyobserved inversions of mitochondrial regions will leadgenes to acquire a strand bias that is the opposite of that ofthe genome as a whole. It is notable that robust support forthe phylogenetic position of scorpions was only recoveredfrom mitochondrial DNA sequences under the appropriatemodel. These Wndings emphasize the importance of modelchoice in phylogenetic analysis, and speciWcally the impor-tance, when using multiple genes, of removing any sourcesof systematic bias and adequately allowing for diVerencesin evolutionary models between genes.

4.3. D2 dataset

Analysis of the D2 dataset using GTR + 3p andNTE + 3p analyses showed the same pattern as we observedfor the D1 dataset (Table 2): the analyses gave Hypothesis2 and Hypothesis 1, respectively. The poor resolution of thephylogeny recovered under the most realistic analysis,NTE + 3p (Fig. 3) can probably be attributed to the largeproportion of missing data, coupled with a loss of phyloge-netic signal due to the NTE recoding scheme. While NTEremoves erroneous phylogenetic signal due to strand-bias,it will also inevitably remove genuine phylogenetic signal.

In summary, our results endorse a traditional picture ofthe relationships between chelicerate orders, excludingPycnogonida. It is important to note that some orders, inparticular members of the proposed clade Dromopoda(Opiliones, Scorpiones, Solifugae) are unrepresented due tolack of sequence data. We cannot rule out the possibilitythat the addition of these taxa to the tree would change thephylogenetic conclusions. With these caveats, our Wndingssupport an uncontroversial view of chelicerate evolution,

Page 10: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

592 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

Fig. 4. Graphs showing CG skew (see Section 2 for details) for three representative taxa from the D1 dataset. Each graph shows the skew at third codonposition bases for the following genes in order: ATP6, COX1, COX2, COX3, CYTB, ND1, ND2, ND3, ND4, ND4L, ND5, ND6. Graphs on the right rep-resent sequences before NTE recoding; graphs on the left, after NTE recoding. Missing data are indicated by red X’s. E. Xavicaudis showed the same pat-tern as M. gibbosus and A. bruennichi; all other taxa showed the same pattern as I. hexagonus..

Page 11: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 593

with Xiphosura a sister taxon to other chelicerates andArachnida monophyletic. Within the Arachnida, Scorpi-ones are sister taxon to the remaining arachnids which aredivided into two high-level clades: ticks plus mites (Acari)and spiders plus whip scorpions (Araneae + Pedipalpi).Under this scheme Scorpiones may occupy a key role in theunderstanding of arachnid diversity—again, in the absenceof all its putative sister groups according to most moderntreatments of chelicerate phylogeny (e.g., Shultz, 1990;Wheeler and Hayashi, 1998; Giribet et al., 2002). Morpho-logical, genomic and developmental characters that areshared between scorpions and other arachnids presumablyrepresent the ancestral arachnid state, allowing us to polar-ize such characters on the chelicerate phylogeny. However,the position of most of the controversial arachnids (e.g.,Palpigradi, Pseudoscorpiones, Solifugae, Opiliones andRicinulei) remain untested.

As has been shown in many previous analyses, largedatasets comprising multiple genes can be eVective atresolving phylogenies in situations where single genes areinsuYcient, provided care is taken to avoid systematic bias(Philippe et al., 2005a; Rokas et al., 2003). For multigenemitochondrial data, particular attention must be paid tonormal and reverse strand-bias and to parameter diVer-ences between genes. Taxa can also be robustly placedusing partial datasets. In D1, for example, A. bruennichi isrobustly placed despite having only 33% of characters pres-ent, and in D2, E. Xavicaudis is robustly placed using only38% of characters. It is likely that, in both of these cases, theconWdence with which the taxa are placed can be attributedto two factors; a large absolute number of characters (4857coding nucleotides in the case of A. bruennichi) and thepresence of a closely related species with a near-completealignment in the dataset. In general, species not fulWllingthese criteria are unlikely to be robustly placed by phyloge-netic analysis of partial datasets. The sole pycnogonid spe-cies, Endeis spinosa, had »40% of characters present in theD1 dataset, but had no closely related species in the align-ment and was consistently placed in a clade of mites. Moresequence data or a wider taxonomic sampling from the pyc-nogonids would be necessary to investigate this surprisingresult.

Given the large historical interest in the use of mitochon-drial genes as phylogenetic markers, particularly in thearthropods (Cameron et al., 2006; Cook et al., 2005; Nardiet al., 2003), and the tendency towards the use of large, mul-tiple gene datasets for phylogenetic reconstruction, theeVect of model choice is a pertinent issue for those inter-ested in molecular systematics. Strand-bias is one of severaltypes of compositional bias; indeed, it is likely that the com-position of most gene sequences is inXuenced by multiplebiasing factors. Our Wndings, though directly pertaining toonly a small set of taxa, suggest that mitochondrial genesequences may be actively misleading. Care should betaken, when using any multiple gene dataset, to avoidexplore the potential for systematic bias. In the case ofmitochondrial genes, strand-bias should be of particular

concern and the previous use of mitochondrial genomes inresolving deep phylogenies requires critical reevaluation.

Acknowledgments

This work was supported by a BBSRC PhD Studentshipto M. J. Phylogenetic analyses were carried out on a condorcompute cluster funded by NERC. B.G. was supportedby an SNF-IHP Grant (83-EU-065528), the Royal Society(R-36579) and the Basler Stiftung für experimentelle Zool-ogie (c/o Eidgenössisches Tropeninstitut). We would like tothank three reviewers for their helpful comments.

Appendix A. Supplementary data

Supplementary data associated with this article can befound, in the online version, at doi:10.1016/j.ympev.2006.11.017.

References

Altschul, S.F., Madden, T.L., SchaVer, A.A., Zhang, J., Zhang, Z., Miller,W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new gen-eration of protein database search programs. Nucleic Acids Res. 25,3389–3402.

Barker, S.C., Murrell, A., 2004. Systematics and evolution of ticks with alist of valid genus and species names. Parasitology 129 (Suppl),S15–S36.

Black 4th, W.C., Roehrdanz, R.L., 1998. Mitochondrial gene order is notconserved in arthropods: prostriate and metastriate tick mitochondrialgenomes. Mol. Biol. Evol. 15, 1772–1785.

Boore, J., Brown, W., 1998. Big trees from little genomes: mitochondrialgene order as a phylogenetic tool. Curr. Opin. Genet. Dev. 8,668–674.

Boore, J., Brown, W., 2000. Mitochondrial genomes of Galathealinum,Helobdella, and Platynereis: sequence and gene arrangement compari-sons indicate that Pogonophora is not a phylum and Annelida andArthropoda are not sister taxa. Mol. Biol. Evol. 17, 87–106.

Brinkmann, H., Giezen, M., Zhou, Y., Raucourt, G., Philippe, H., 2005. Anempirical assessment of long-branch attraction artefacts in deepeukaryotic phylogenomics. Syst. Biol. 54, 743–757.

Cameron, S., Barker, S., Whiting, M., 2006. Mitochondrial genomics andthe new insect order Mantophasmatodea. Mol. Phylogenet. Evol. 38,274–279.

Cook, C., Yue, Q., Akam, M., 2005. Mitochondrial genomes suggest thathexapods and crustaceans are mutually paraphyletic. Proc. Biol. Sci.272, 1295–1304.

Crease, T.J., 1999. The complete sequence of the mitochondrial genome ofDaphnia pulex (Cladocera: Crustacea). Gene 233, 89–99.

Dávila, S., Piñero, D., Bustos, P., Cevallos, M., Dávila, G., 2005. The mito-chondrial genome sequence of the scorpion Centruroides limpidus(Karsch 1879) (Chelicerata; Arachnida). Gene 360, 92–102.

de Queiroz, A., Gatesy, J., 2006. The supermatrix approach to systematics.Trends Ecol. Evol. 22, 34–41.

Dobson, S., Barker, S., 1999. Phylogeny of the hard ticks (Ixodidae)inferred from 18S rRNA indicates that the genus Aponomma is para-phyletic. Mol. Phylogenet. Evol. 11, 288–295.

Dotson, E.M., Beard, C.B., 2001. Sequence and organization of the mito-chondrial genome of the Chagas disease vector, Triatoma dimidiata.Insect Mol. Biol. 10, 205–215.

Dunlop, J.A., Arango, C.P., 2005. Pycnogonid aYnities: a review. J. Zoo.Syst. Evol. Res. 43, 8–21.

Dunlop, J., Braddy, S., 2001. Scorpions and their sister-group relation-ships. In: Fet, V., Selden, P.A. (Eds.), Scorpions 2001 In Memoriam

Page 12: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

594 M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595

Gary A Polis. The British Arachnological Society, Burnham Beeches,UK, pp. 1–24.

Dunlop, J., Webster, M., 1999. Fossil evidence, terrestrialization andarachnid phylogeny. J. Arachnol. 27, 86–93.

Felsenstein, J., 1978. Cases in which parsimony or compatibility methodswill be positively misleading. Syst. Zool. 27, 401–410.

Gadagkar, S., Kumar, S., 2005. Maximum likelihood outperforms maxi-mum parsimony even when evolutionary rates are heterotachous. Mol.Biol. Evol. 22, 2139–2141.

Galtier, N., Gouy, M., 1995. Inferring phylogenies from DNA sequences ofunequal base compositions. Proc. Natl. Acad. Sci. USA 92, 11317–11321.

Gantenbein, B., Largiadèr, C., 2003. The phylogeographic importance ofthe Strait of Gibraltar as a gene Xow barrier in terrestrial arthropods: acase study with the scorpion Buthus occitanus as model organism. Mol.Phylogenet. Evol. 28, 119–130.

Giribet, G., 2002. Current advances in the phylogenetic reconstruction ofmetazoan evolution. A new paradigm for the Cambrian explosion?Mol. Phylogenet. Evol. 24, 345–357.

Giribet, G., Edgecombe, G., Wheeler, W., Babbitt, C., 2002. Phylogeny andsystematic position of Opiliones: a combined analysis of cheliceraterelationships using morphological and molecular data. Cladistics 18,5–70.

Giribet, G., Richter, S., Edgecombe, G.D., Wheeler, W.C., 2005. The posi-tion of crustaceans within Arthropoda—evidence from nine molecularloci and morphology. Crust. Issues 16, 307–352.

Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool forsequence Wnishing. Genome Res. 8, 195–202.

Hassanin, A., 2006. Phylogeny of Arthropoda inferred from mitochondrialsequences: strategies for limiting the misleading eVects of multiplechanges in pattern and rates of substitution. Mol. Phylogenet. Evol. 38,100–116.

Hassanin, A., Leger, N., Deutsch, J., 2005. Evidence for multiple reversalsof asymmetric mutational constraints during the evolution of the mito-chondrial genome of metazoa, and consequences for phylogeneticinferences. Syst. Biol. 54, 277–298.

Hofacker, I., 2003. Vienna RNA secondary structure server. Nucleic AcidsRes. 31, 3429–3431.

Jones, M., Blaxter, M., 2006. TaxMan: a Taxonomic database Manager.BMC Bioinformatics 7, 536.

Kass, R., Raftery, A., 1995. Bayes Factors. J. Am. Stat. Assoc. 90, 773–795.Kenyon, F., Welsh, M., Parkinson, J., Whitton, C., Blaxter, M., Knox, D.,

2003. Expressed sequence tag survey of gene expression in the scabmite Psoroptes ovis—allergens, proteases and free-radical scavengers.Parasitology 126, 451–460.

Kluge, A.G., 1989. A concern for evidence and a phylogenetic hypothesisof relationships among Epicrates (Boidae, Serpentes). Syst. Zool. 38,7–25.

Lavrov, D., Boore, J., Brown, W., 2000. The complete mitochondrial DNAsequence of the horseshoe crab Limulus polyphemus. Mol. Biol. Evol.17, 813–824.

Lavrov, D., Boore, J., Brown, W., 2002. Complete mtDNA sequences oftwo millipedes suggest a new model for mitochondrial gene rearrange-ments: duplication and nonrandom loss. Mol. Biol. Evol. 19, 163–169.

Lee, C., Grasso, C., Sharlow, M.F., 2002. Multiple sequence alignmentusing partial order graphs. Bioinformatics 18, 452–464.

Lemmon, A.R., Moriarty, E.C., 2004. The importance of proper modelassumption in bayesian phylogenetics. Syst. Biol. 53, 265–277.

Lowe, T., Eddy, S., 1997. tRNAscan-SE: a program for improved detec-tion of transfer RNA genes in genomic sequence. Nucleic Acids Res.25, 955–964.

Mallatt, J., Giribet, G., 2006. Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa: 37 more arthropods and a kin-orhynch. Mol. Phylogenet. Evol. 40, 772–794.

Mallatt, J.M., Garey, J.R., Shultz, J.W., 2004. Ecdysozoan phylogeny andBayesian inference: Wrst use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin. Mol. Phyloge-net. Evol. 31, 178.

Masta, S., Boore, J., 2004. The complete mitochondrial genome sequenceof the spider Habronattus oregonensis reveals rearranged and extremelytruncated tRNAs. Mol. Biol. Evol. 21, 893–902.

Maxmen, A., Browne, W., Martindale, M., Giribet, G., 2005. Neuroanat-omy of sea spiders implies an appendicular origin of the protocerebralsegment. Nature 437, 1144–1148.

Nardi, F., Spinsanti, G., Boore, J., Carapelli, A., Dallai, R., Frati, F., 2003.Hexapod origins: monophyletic or paraphyletic? Science 299, 1887–1889.

Navajas, M., Le Conte, Y., Solignac, M., Cros-Arteil, S., Cornuet, J., 2002.The complete sequence of the mitochondrial genome of the honeybeeectoparasite mite Varroa destructor (Acari: Mesostigmata). Mol. Biol.Evol. 19, 2313–2317.

Negrisolo, E., Minelli, A., Valle, G., 2004. Extensive gene order rearrange-ment in the mitochondrial genome of the centipede Scutigera coleopt-rata. J. Mol. Evol. 58, 413–423.

Nixon, K.C., Carpenter, J.M., 1996. On simultaneous analysis. Cladistics12, 221–241.

Nylander, J.A.A., Ronquist, F., Huelsenbeck, J.P., Nieves-Aldrey, J.L.,2004. Bayesian phylogenetic analysis of combined data. Syst. Biol. 53,47–67.

Ohtsuki, T., Kawai, G., Watanabe, K., 1998. Stable isotope-edited NMRanalysis of Ascaris suum mitochondrial tRNAMet having a TV-replacement loop. J. Biochem. (Tokyo) 124, 28–34.

Olson, S.A., 2002. EMBOSS opens up sequence analysis. European Molec-ular Biology Open Software Suite. Brief Bioinform. 3, 87–91.

Philippe, H., Lartillot, N., Brinkmann, H., 2005a. Multigene analyses ofbilaterian animals corroborate the monophyly of Ecdysozoa, Lopho-trochozoa, and Protostomia. Mol. Biol. Evol. 22, 1246–1253.

Philippe, H., Zhou, Y., Brinkmann, H., Rodrigue, N., Delsuc, F., 2005b.Heterotachy and long-branch attraction in phylogenetics. BMC Evol.Biol. 5, 50.

Posada, D., Buckley, T.R., 2004. Model selection and model averagingin phylogenetics: advantages of Akaike information criterionand Bayesian approaches over likelihood ratio tests. Syst. Biol. 53,793–808.

Posada, D., Crandall, K., 1998. MODELTEST: testing the model of DNAsubstitution. Bioinformatics 14, 817–818.

Pupko, T., Huchon, D., Cao, Y., Okada, N., Hasegawa, M., 2002. Combin-ing multiple data sets in a likelihood analysis: which models are thebest? Mol. Biol. Evol. 19, 2294–2307.

Qiu, Y., Song, D., Zhou, K., Sun, H., 2005. The mitochondrial sequences ofHeptathela hangzhouensis and Ornithoctonus huwena reveal uniquegene arrangements and atypical tRNAs. J. Mol. Evol. 60, 57–71.

Regier, J.C., Shultz, J.W., 2001. Elongation factor-2: a useful gene forarthropod phylogenetics. Mol. Phylogenet. Evol. 20, 136–148.

Reyes, A., Gissi, C., Pesole, G., Saccone, C., 1998. Asymmetrical directionalmutation pressure in the mitochondrial genome of mammals. Mol.Biol. Evol. 15, 957–966.

Rokas, A., Williams, B.L., King, N., Carroll, S.B., 2003. Genome-scaleapproaches to resolving incongruence in molecular phylogenies.Nature 425, 798–804.

Ronquist, F., Huelsenbeck, J., 2003. MrBayes 3: Bayesian phylogeneticinference under mixed models. Bioinformatics 19, 1572–1574.

Shao, R., Mitani, H., Barker, S.C., Takahashi, M., Fukunaga, M., 2005.Novel mitochondrial gene content and gene arrangement indicate ille-gitimate inter-mtDNA recombination in the chigger mite, Leptotrom-bidium pallidum. J. Mol. Evol. 60, 764–773.

Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdi-gian, C., Fuellen, G., Gilbert, J.G.R., Korf, I., Lapp, H., Lehvaslaiho,H., Matsalla, C., Mungall, C.J., Osborne, B.I., Pocock, M.R., Schattner,P., Senger, M., Stein, L.D., Stupka, E., Wilkinson, M.D., Birney, E.,2002. The Bioperl toolkit: Perl modules for the life sciences. GenomeRes. 12, 1611–1618.

Sullivan, J., SwoVord, D.L., 2001. Should we use model-based methods forphylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated?Syst. Biol. 50, 723–729.

Page 13: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions

M. Jones et al. / Molecular Phylogenetics and Evolution 43 (2007) 583–595 595

Tanaka, M., Ozawa, T., 1994. Strand asymmetry in human mitochondrialDNA mutations. Genomics 22, 327–335.

Umetsu, K., Iwabuchi, N., Yuasa, I., Saitou, N., Clark, P.F., Boxshall, G.,Osawa, M., Igarashi, K., 2002. Complete mitochondrial DNA sequenceof a tadpole shrimp (Triops cancriformis) and analysis of museum sam-ples. Electrophoresis 23, 4080–4084.

Watanabe, Y., Tsurui, H., Ueda, T., Furushima, R., Takamiya, S., Kita, K.,Nishikawa, K., Watanabe, K., 1994. Primary and higher order struc-tures of nematode (Ascaris suum) mitochondrial tRNAs lacking eitherthe T or D stem. J. Biol. Chem. 269, 22902–22906.

Weygoldt, P., 1998. REVIEW Evolution and systematics of the Chelicer-ata. Exp. Appl. Acarol. 22, 63–79.

Weygoldt, P., Paulus, H., 1979. Untersuchungen zur Morphologie, Taxon-omie und Phylogenie der Chelicerata. Zeitschrift für Zoologische Sys-tematik und Evolutionsforschung. Z. Zool. Syst. Evol. 17, 177–200.

Wheeler, W.C., Hayashi, C.Y., 1998. The phylogeny of the extant chelicer-ate orders. Cladistics 14, 173–192.

Wiens, J.J., 2003. Missing data, incomplete taxa, and phylogenetic accu-racy. Syst. Biol. 52, 528–538.

Wolstenholme, D., Macfarlane, J., Okimoto, R., Clary, D., Wahleithner, J.,1987. Bizarre tRNAs inferred from DNA sequences of mitochondrialgenomes of nematode worms. Proc. Natl. Acad. Sci. USA 84, 1324–1328.

Wuyts, J., Perrière, G., Peer, Y., 2004. The European ribosomal RNA data-base. Nucleic Acids Res. 32, D101–D103.