towards a mitogenomic phylogeny of lepidoptera · towards a mitogenomic phylogeny of lepidoptera...

10
Towards a mitogenomic phylogeny of Lepidoptera Martijn J.T.N. Timmermans a,b,, David C. Lees c , Thomas J. Simonsen a a Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom b Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom c Department of Zoology, Cambridge University, Downing Street CB2 3EJ, United Kingdom article info Article history: Received 13 January 2014 Revised 11 May 2014 Accepted 26 May 2014 Available online 6 June 2014 Keywords: tRNA rearrangement Ditrysia Illumina LR-PCR Pooled mitochondrial genome assembly abstract The backbone phylogeny of Lepidoptera remains unresolved, despite strenuous recent morphological and molecular efforts. Molecular studies have focused on nuclear protein coding genes, sometimes adding a single mitochondrial gene. Recent advances in sequencing technology have, however, made acquisition of entire mitochondrial genomes both practical and economically viable. Prior phylogenetic studies utilised just eight of 43 currently recognised lepidopteran superfamilies. Here, we add 23 full and six partial mitochondrial genomes (comprising 22 superfamilies of which 16 are newly represented) to those publically available for a total of 24 superfamilies and ask whether such a sample can resolve deeper lepidopteran phylogeny. Using recoded datasets we obtain topologies that are highly congruent with prior nuclear and/or morphological studies. Our study shows support for an expanded Obtectomera including Gelechioidea, Thyridoidea, plume moths (Alucitoidea and Pterophoroidea; possibly along with Epermenioidea), Papilionoidea, Pyraloidea, Mimallonoidea and Macroheterocera. Regarding other controversially positioned higher taxa, Doidae is supported within the new concept of Drepanoidea and Mimallonidae sister to (or part of) Macroheterocera, while among Nymphalidae butterflies, Danainae and not Libytheinae are sister to the remainder of the family. At the deepest level, we suggest that a tRNA rearrangement occurred at a node between Adeloidea and Ditrysia + Palaephatidae + Tischeriidae. Ó 2014 Elsevier Inc. All rights reserved. 1. Introduction Reconstructing a stable phylogenetic framework for Lepidop- tera at the superfamily level is an ongoing process that, despite considerable effort over recent years, is still far from complete. Kristensen and Skalski (1999) overview summarised the most comprehensive morphology-based hypothesis of the last century for the phylogeny of the order. While the early part of Lepidoptera evolution was presented as a well-resolved ‘‘Hennigean comb’’, the phylogeny of the vast majority of the order (the suborder Ditrysia which comprise more than 95% of all known species) was almost entirely unresolved apart from a few higher-level clades. This overview made it abundantly clear that large-scale molecular analyses were needed to resolve the higher-level phylogeny of ditrysian Lepidoptera. With more than 157,000 described species (van Nieukerken et al., 2011), Lepidoptera (butterflies and moths) represent one of the evolutionarily most successful lineages of phytophagous insects (e.g. Grimaldi and Engel, 2005). Furthermore, the order includes a number of biological model organisms, many severe pest species, and comprises many of the best known and most popular invertebrates, emphasising that studies into lepidopteran phylogeny and evolution are of both scientific and public interest. It is therefore not surprising that the past few years have seen sev- eral attempts to resolve the higher-level phylogeny of Lepidoptera based on comprehensive molecular datasets. The ‘‘LepTree project’’ has already resulted in a number of benchmark publications focused both at the superfamily level (e.g. Regier et al., 2009, 2013; Cho, et al. 2011; Bazinet et al., 2013; Sohn et al., 2013) and for clusters of several larger superfamilies (Zwick et al., 2011; Regier et al., 2012a,b). The ‘‘Ditrysia project’’ has so far resulted in a superfamily-level molecular phylogeny (Mutanen et al., 2010) and a phylogeny of Papilionoidea (Heikkila et al., 2012), while associated studies have focused on the superfamilies Gelechioidea (Kaila et al., 2011) and Noctuoidea (Zahiri et al., 2011, 2012, 2013a,b), as well as the large family Gelechiidae (Karsholt et al., 2013). Both the LepTree project and the Ditrysia project have provided considerable new resolution to the higher phylogeny of Ditrysia, but many groupings have been mutually exclusive between the studies and others were not well supported. Many of these poorly supported groupings are found in the same parts of the Lepidoptera Tree of Life where morphology alone has http://dx.doi.org/10.1016/j.ympev.2014.05.031 1055-7903/Ó 2014 Elsevier Inc. All rights reserved. Corresponding author. E-mail addresses: [email protected] (M.J.T.N. Timmermans), [email protected] (D.C. Lees), [email protected] (T.J. Simonsen). Molecular Phylogenetics and Evolution 79 (2014) 169–178 Contents lists available at ScienceDirect Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev

Upload: buituong

Post on 29-Jul-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Molecular Phylogenetics and Evolution 79 (2014) 169–178

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier .com/ locate /ympev

Towards a mitogenomic phylogeny of Lepidoptera

http://dx.doi.org/10.1016/j.ympev.2014.05.0311055-7903/� 2014 Elsevier Inc. All rights reserved.

⇑ Corresponding author.E-mail addresses: [email protected] (M.J.T.N. Timmermans),

[email protected] (D.C. Lees), [email protected] (T.J. Simonsen).

Martijn J.T.N. Timmermans a,b,⇑, David C. Lees c, Thomas J. Simonsen a

a Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdomb Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdomc Department of Zoology, Cambridge University, Downing Street CB2 3EJ, United Kingdom

a r t i c l e i n f o a b s t r a c t

Article history:Received 13 January 2014Revised 11 May 2014Accepted 26 May 2014Available online 6 June 2014

Keywords:tRNA rearrangementDitrysiaIlluminaLR-PCRPooled mitochondrial genome assembly

The backbone phylogeny of Lepidoptera remains unresolved, despite strenuous recent morphological andmolecular efforts. Molecular studies have focused on nuclear protein coding genes, sometimes adding asingle mitochondrial gene. Recent advances in sequencing technology have, however, made acquisition ofentire mitochondrial genomes both practical and economically viable. Prior phylogenetic studies utilisedjust eight of 43 currently recognised lepidopteran superfamilies. Here, we add 23 full and six partialmitochondrial genomes (comprising 22 superfamilies of which 16 are newly represented) to thosepublically available for a total of 24 superfamilies and ask whether such a sample can resolve deeperlepidopteran phylogeny. Using recoded datasets we obtain topologies that are highly congruent withprior nuclear and/or morphological studies. Our study shows support for an expanded Obtectomeraincluding Gelechioidea, Thyridoidea, plume moths (Alucitoidea and Pterophoroidea; possibly along withEpermenioidea), Papilionoidea, Pyraloidea, Mimallonoidea and Macroheterocera. Regarding othercontroversially positioned higher taxa, Doidae is supported within the new concept of Drepanoideaand Mimallonidae sister to (or part of) Macroheterocera, while among Nymphalidae butterflies, Danainaeand not Libytheinae are sister to the remainder of the family. At the deepest level, we suggest that a tRNArearrangement occurred at a node between Adeloidea and Ditrysia + Palaephatidae + Tischeriidae.

� 2014 Elsevier Inc. All rights reserved.

1. Introduction

Reconstructing a stable phylogenetic framework for Lepidop-tera at the superfamily level is an ongoing process that, despiteconsiderable effort over recent years, is still far from complete.Kristensen and Skalski (1999) overview summarised the mostcomprehensive morphology-based hypothesis of the last centuryfor the phylogeny of the order. While the early part of Lepidopteraevolution was presented as a well-resolved ‘‘Hennigean comb’’, thephylogeny of the vast majority of the order (the suborder Ditrysiawhich comprise more than 95% of all known species) was almostentirely unresolved apart from a few higher-level clades. Thisoverview made it abundantly clear that large-scale molecularanalyses were needed to resolve the higher-level phylogeny ofditrysian Lepidoptera.

With more than 157,000 described species (van Nieukerkenet al., 2011), Lepidoptera (butterflies and moths) represent one ofthe evolutionarily most successful lineages of phytophagousinsects (e.g. Grimaldi and Engel, 2005). Furthermore, the order

includes a number of biological model organisms, many severepest species, and comprises many of the best known and mostpopular invertebrates, emphasising that studies into lepidopteranphylogeny and evolution are of both scientific and public interest.It is therefore not surprising that the past few years have seen sev-eral attempts to resolve the higher-level phylogeny of Lepidopterabased on comprehensive molecular datasets. The ‘‘LepTree project’’has already resulted in a number of benchmark publicationsfocused both at the superfamily level (e.g. Regier et al., 2009,2013; Cho, et al. 2011; Bazinet et al., 2013; Sohn et al., 2013) andfor clusters of several larger superfamilies (Zwick et al., 2011;Regier et al., 2012a,b). The ‘‘Ditrysia project’’ has so far resultedin a superfamily-level molecular phylogeny (Mutanen et al.,2010) and a phylogeny of Papilionoidea (Heikkila et al., 2012),while associated studies have focused on the superfamiliesGelechioidea (Kaila et al., 2011) and Noctuoidea (Zahiri et al.,2011, 2012, 2013a,b), as well as the large family Gelechiidae(Karsholt et al., 2013). Both the LepTree project and the Ditrysiaproject have provided considerable new resolution to the higherphylogeny of Ditrysia, but many groupings have been mutuallyexclusive between the studies and others were not well supported.Many of these poorly supported groupings are found in the sameparts of the Lepidoptera Tree of Life where morphology alone has

170 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

failed to yield conclusive evidence for evolutionary relationships. Itis therefore clear that despite considerable efforts, we still remainfar from a unified consensus of the phylogeny of the higherLepidoptera. For example, sister groups of some of the most prom-inent lepidopteran radiations (Papilionoidea and Gelechioidea)remain as yet unknown. To maximise our chances for arriving atsuch a ‘‘consensus hypothesis’’, we need to look at broadening bothtaxon and character sampling in future phylogenetic studies.

Within the realm of molecular characters, full mitochondrialgenomes can be a data-rich and relatively accessible source ofinformation, and promising results have been obtained for otherinsect families and orders (e.g. Timmermans et al., 2010; Haranet al., 2013; Simon and Hadrys, 2013). So far, the mt-phylogenomicstudies with the most comprehensive lepidopteran taxon samplingremain the Kim et al. (2011), Lu et al. (2013) analyses that focusedalmost entirely on the polyphyletic group ‘‘Macrolepidoptera’’(essentially Papilionoidea plus Macroheterocera sensu vanNieukerken et al., 2011).

Here we characterise full mitochondrial genomes using amethod that relies on the in silico separation of DNA sequencedata from pooled PCR fragments (Timmermans et al., 2010). Weacquired mitochondrial genomes of a diverse set of lepidopteransand present the most comprehensive mt-phylogenomic study ofLepidoptera to date (mainly focused on the subgroup Heterone-ura). We explore the evolutionary patterns of the different genesin the mt-genome across heteroneuran Lepidoptera and assessthe utility of mt-genomes in lepidopteran phylogenetics.

2. Materials and methods

2.1. PCR amplification and sequencing

Wet-lab methods are based on those described in Timmermanset al. (2010). Mitochondrial genomes were PCR amplified from

Table 1Details of Lepidoptera samples (29) subjected to mitochondrial genome sequencing.

Superfamily Family Genus Species V

1� Papilionoidea Papilionidae Papilio dardanus B2 Geometroidea Geometridae Chloroclysta truncata B3 Drepanoidea Drepanidae Drepana arcuata C4 Incertae sedis Doidae Doa BOLD:AAC3419 055 Noctuoidea Noctuidae Acronicta psi B6 Noctuoidea Noctuidae Noctua pronuba B7 Mimallonoidea Mimallonidae Lacosoma valva R8 Lasiocampoidea Lasiocampidae Apatelopteryx phenax B9 Thyridoidea Thyrididae Rhodoneura mellea TH10 Pyraloidea Crambidae Eudonia angustea B11 Tortricoidea Tortricidae Epiphyas postvittana B12� Zygaenoidea Zygaenidae Ischnusia culiculina IC13 Pterophoroidea Pterophoridae Emmelina monodactyla B14 Alucitoidea Alucitidae Alucita montana N15� Epermenioidea Epermeniidae Epermenia illigerella M16 Urodoidea Urodidae Urodus decens D17� Cossoidea Sesiidae Pennisetia hylaeiformis B18� Cossoidea Cossidae Zeuzera coffeae A19 Gelechioidea Elachistidae Ethmia eupositica M20 Gelechioidea Oecophoridae Endrosis sarcitrella B21 Gelechioidea Autostichidae Oecogonia novimundi B22 Gelechioidea Cosmopterigidae Perimede sp. JW23 Gracillarioidea Gracillariidae Phyllonorycter froelichiella B24 Gracillarioidea Gracillariidae Phyllonorycter platani B25 Gracillarioidea Gracillariidae Cameraria ohridella B26 Tineoidea Tineidae Tineola bisselliella B27 Tischerioidea Tischeriidae Astrotischeria sp. K28 Nepticuloidea Nepticulidae Stigmella roborella B29� Micropterigoidea Micropterigidae Micropterix calthella U

* Asterisked voucher numbers indicate that genomic DNA was provided by the ‘‘Leptreemitochondrial genome is incomplete. The Papilio dardanus data was combined with anrepresented by a BOLD cluster number.

representatives of 22 Lepidoptera superfamilies (Table 1). PCRreactions were performed with Takara LA taq and followed cyclingconditions described in Timmermans et al. (2010). Two primerpairs were used for long-range PCR: One pair (SytbF/reversecomplement of Jerry, JerryRC: Simon et al., 1994) targeting a 7 kbfragment running from COI to CytB across the AT-rich region,and one pair (Jerry: Simon et al., 1994/SytbR) targeting the oppo-site �9 kb half of the circular genome (Fig. 1). Not all PCRs weresuccessful, and for some species one fragment was obtained(Table 1; Fig. 2). For Pennisetia hylaeiformis the JerryRC-SytbR PCRfailed and a shorter fragment was targeted by replacing the JerryRCprimer with stevCO3F (GCWTGATAYTGACAYTTYGT) located inCOIII. In contrast to Timmermans et al. (2010), who used Roche’s454 platform, sequencing was performed using Illumina’s Solexatechnology. Fragments were pooled before TruSeq libraryconstruction (800 bp insert size) and paired-end sequenced (PE:2 � 250 bp; V2 Reagent Kit) on an Illumina Miseq aiming for �1%of the machines run capacity. In addition to these Lepidopterasamples, the pool also contained 22 partial beetle mitochondrialgenomes, which are not further discussed here. Three ‘baits’ weregenerated to assign anonymous contigs to the voucher specimens.These baits included a 400 bp fragment of CytB (primer pair:SytbF + SytbR) and two COI fragments (primer pairs LepCoxF1:ACNAATCATAARGATATTGGNAC + JerryRC and Jerry + sCOIR: CARA-TTTCDGARCATTG) (Fig. 1). ‘Bait’ sequences were bi-directionallysequenced using Sanger technology and edited in Geneious version5.6 (Biomatters, Auckland, New Zealand).

2.2. Sequence assembly and annotation

Raw sequence reads were pre-processed using the Trimmomat-ic (Lohse et al., 2012) and Prinseq-lite (Schmieder and Edwards,2011) perl scripts. Bases with a Phred quality score below 20 weretrimmed. Sequence reads with a high quality length below 150 bp

oucher number Sequence length (bp) Mean coverage Accession number

MNH746800 15,337 405 JX313686MNH1043250 15,828 428 KJ508061WM-96-0579* 15,300 254 KJ508053

-srnp-63413* 15,228 185 KJ508058MNH1043251 15,348 224 KJ508060MNH1043252 15,315 461 KJ508057WH-96-0919* 16,108 66 KJ508050MNH730001 15,648 41 KJ508055

1 15,615 129 KJ508038MNH1044343 15,386 383 KJ508052MNH1044344 15,451 495 KJ5080511 8134 124 KJ508043

MNH1044345 15,252 410 KJ508063B-06-0080* 15,272 639 KJ508059M08-2525* 7983 141 KJ508039A-03-3349* 15,279 315 KJ508062MNH1039831 14,199 262 KJ508049YK-04-0891-03* 7983 45 KJ508046JM-96-0277* 15,347 466 KJ508047

MNH1044346 15,317 245 KJ508037MNH1044347 15,408 485 KJ508036

B-08-0217* 15,131 228 KJ508041MNH1044348 15,538 466 KJ508048MNH1044349 15,791 50 KJ508044MNH1044350 15,512 368 KJ508042MNH1044351 15,661 556 KJ508045N-06-1154* 15,079 311 KJ508056MNH1044352 15,565 351 KJ508054NK-93-0057* 9150 70 KJ508040

’’ initiative. Numbers in the first column followed by a minus sign indicate that theexisting DNA sequence to produce a full mitochondrial genome. The Doa species is

Fig. 1. PCR setup applied. For Illumina sequencing, the circular mitochondrial genome was amplified in two overlapping pieces, both spanning from COI to CytB. To assignspecies names to the assembled mitochondrial contigs, three ‘bait’ sequences were generated using Sanger technology located in Cox1 50 , Cox1 30 , and CytB (grey boxes).

Fig. 2. Sequence coverage of each of the 29 mitochondrial fragments. Horizontal axis depicts fragment length (x 1 K). Vertical axis depicts read coverage (as given). Note thatcoverage is highly variable, differs between the two LR-PCR products and tends to peak where the two PCR products overlap.

M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 171

172 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

and reads containing ‘Ns’ were discarded. Sequences for whichonly one read of the pair passed this quality filter were also dis-carded. Remaining reads were assembled using Roche 454/Newbler software (settings: -rip -mi 95 -ml 100). Mitochondrialcontigs were identified by linking them to Sanger ‘bait’ sequencesusing stand-alone BLAST (Altschul et al., 1990). The P. dardanuscontig was combined with an unpublished, incomplete mitochon-drial genome that was generated as part of a different 454 basedstudy (Timmermans et al., unpublished) to form a single completemitochondrial genome. Mitochondrial tRNAs were annotated usingCOVE (Eddy and Durbin, 1994) as described in (Timmermans andVogler, 2012) and positions of protein coding and rRNA genes weredetermined using Geneious (version 5.6) by transferring annota-tions of a reference genome to newly obtained mitochondrialsequences. To obtain estimates of mean coverage Illumina readswere remapped to the mitochondrial contigs using Geneious(minimum overlap 150 bp, a maximum of 2% mismatches and amaximum gap size of 10). All annotated mitochondrial genomesequences were submitted to GenBank (accession numbers:JX313686; KJ508036–KJ508063).

2.3. Phylogenetic analyses

Annotated mitochondrial genomes were combined with 77publicly available Lepidoptera mitochondrial genomes that wereavailable at the cut-off date for our analysis, 07-05-2013 (Suppl.Table 1), and FeatureExtract (Wernersson, 2005) was used toextract the individual protein coding genes. Protein coding genematrices were parsed through ClustalW (Thompson et al., 1994)using the transAlign perl script (Bininda-Emonds, 2005) forcodon-based alignment. Uncertainly aligned regions were removedusing the GBlocks program (Castresana, 2000), using ‘‘minimumlength of a block: 10’’ and ‘‘allowed gap positions: with half’’. TherRNA genes were aligned using MAFFT (http://align.bmr.kyushu-u.ac.jp/mafft/) using the Q-INS-i method (Katoh et al., 2005; Katohet al., 2009) and regions of uncertain alignment were pinpointedusing Aliscore (Misof and Misof, 2009) and removed. Protein cod-ing and rRNA genes were concatenated into a single data matrix(named ‘‘nc123’’). Saturation plots, where F84 genetic distancewas plotted against transitions and transversions, were generatedfor this data matrix using DAMBE (Xia and Xie, 2001).

Two additional data matrices were generated from this concat-enated dataset by recoding nucleotide positions. The first recodingscheme involved recoding the first codon position of all protein-coding genes to either purines or pyrimidines (RY) and removingthe fast evolving third codon positions (Hassanin, 2006; Ponset al., 2010). The second scheme applied the Degen script (Regieret al., 2010; Zwick et al., 2012) to convert every codon to a ‘degen-erated’ one, changing all synonymous sites to IUPAC ambiguitynomenclature. The datasets were split in 41 (nc123 dataset, degendataset) and 28 partitions (RY coded dataset) respectively. Parti-tionFinder (Lanfear et al., 2012) was subsequently used to mergethese partitions in the most optimal scheme and select a modelfor each of the predicted partition sets using the Bayesian Informa-tion Criterion. This procedure followed a two-tiered approach.Within PartitionFinder RAxML (version 7.4.1) (Stamatakis et al.,2005) was first used to obtain partition scheme, after which PhyML(version 20111219) (Guindon and Gascuel, 2003) was applied toselect the best fitting model for each partition. Phylogenetic anal-yses were performed on the CIPRES portal (Miller et al., 2010)using MrBayes 3.2.2 (Ronquist et al., 2011) with two independentruns each with four chains. The analysis was run for 12 milliongenerations with sampling every 100 generations. The first 3 mil-lion generations (25%) were discarded as burn-in.

All analyses were performed with (106-taxa dataset) andwithout (105-taxa dataset) Micropterix calthella, a species which

represents the probable extant sister lineage (Micropterigoidea)to all other Lepidoptera and was shown to be highly divergent fromthe others (see Results section). To test the effect of dataincompleteness, 10 partial genome sequences were removed andall the analyses repeated using the 96 remaining taxa. Therefore,in total nine tree searches were performed (nc123, degen, andRY, all for � 106 taxa, 105 taxa, and 96 taxa).

2.4. Phylogenetically informative tRNA translocation

A putative phylogenetically informative tRNA translocation hasbeen reported for Lepidoptera (Cao et al., 2012). The inferredancestral insect mitochondrial genome contains three tRNAsbetween the control region and ND2 in the order tRNA-Ile,tRNA-Glu, tRNA-Met, but for all sequenced ditrysian genomes thisorder is tRNA-Met, tRNA-Ile, tRNA-Glu. The Hepialoidea have beenreported to show the ancestral order (Cao et al., 2012), and there-fore this translocation event must have occurred after this lineagediverged from the more derived Lepidoptera. To pinpoint thistranslocation event, gene order of the newly generated genomesequences was investigated. In addition, PCR was performed onmembers of Adeloidea (Nematopogon sp., Hertfordshire, UK) andPalaephatidae (Azaleodes fuscipes, Queensland, Australia) for whichto our knowledge no full mitochondrial genomes are available. PCRused a primer pair that only amplifies the order tRNA-Met, tRNA-Ile,tRNA-Glu, with a primer in tRNA-Met (AAGCTWTTRRGYTCATACC)and in tRNA-Ile (CTATCANAATANYCCTTT) and an annealing tem-perature of 40 �C.

3. Results

3.1. Data quality and assembly

A total of 333,246 read-pairs were generated for the sequencelibrary. After quality control 322,129 R1 reads and 317,979 R2reads were retained with a mean sequence length of 249 bp, whichrepresented 316,940 full sequence pairs. Following assembly andmanual concatenation of unassembled fragments, 23 full and 6partial lepidopteran mitochondrial genomes were obtained(Table 1; Fig. 2), which covered nearly all the added DNA template.The only genome for which we did not obtain the full sequence ofthe added PCR product was M. calthella, for which ca. half of theSytbF-JerryRC fragment was missing. The bait sequences unequiv-ocally assigned all contigs to species.

Reads were mapped back onto final contigs, which revealed themean coverage to be very high, but also very variable (mean cover-age: 296, stdev: 169; Fig. 2). In general, highest coverage wasobserved for the overlapping fragment in CytB that was presenton both LR-PCR products. In addition, coverage of the PCR fragmentthat includes the AT-rich region was generally higher than cover-age of the other half of the genome. The mean coverage was lowestfor Apatelopteryx phenax (Lasiocampidae), but still more than 40X.

3.2. Data matrix, model selection and partitioning schemes

The new sequences were combined with 77 publicly availablemitochondrial genomes to generate a data matrix of 13,311 basepairs, of which 4.3% was missing or represented gaps.

The saturation plot (Supplementary Fig. 1) on the concate-nated dataset revealed that one species (M. calthella) was highlydivergent compared to the remaining taxa. This species repre-sented the sister lineage of all other taxa included in this studyand the high divergence is therefore not surprising—in particularas none of the seven superfamilies traditionally placed betweenMicropterigoidea and Hepialoidea (e.g. Kristensen and Skalski,

M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 173

1999) were included in the analyses as material of sufficient qual-ity was not available at the time of the sequence run. However, asits divergence from the others might affect its correct placement(due to missing data and long-branch attraction), we decided toperform phylogenetic analyses with and without this species. Thisconfirmed the divergence and difficult placing of this basal lineage(Supplementary Fig. 2). We therefore excluded M. calthella fromfurther analyses and used the two representatives of Hepialoideaas outgroups, focussing our study on the large lepidopteransubgroup Heteroneura which contains > 99% of all Lepidoptera(Kristensen and Skalski, 1999; van Nieukerken et al., 2011) andhas appeared as monophyletic in all previous studies ofLepidoptera phylogeny.

PartitionFinder suggested five partitions to be optimal for thenc123 datasets. The saturation plot indicated substitutional satura-tion, specifically involving transitional changes at third codon posi-tions. Nucleotide recoding reduced the number of partitions tothree, indicating that both recoding schemes were effective inreducing saturation and/or compositional heterogeneity (Fig. 3).Removing the incomplete genomes had only a minor effect on par-tition schemes, only affecting the placement of the second codonpositions of ND4L (nc123 datasets) and ATP8 (degen datasets).For the 105-taxa datasets, the main difference between the degenand RY-coded partition schemes is that the two rRNA genes wereassigned a separate partition for the RY-coded dataset, whereasthey were merged with most of the first codon positions of thedegen alignment. Strikingly, for the nc123 datasets the ‘‘cyto-chrome’’ genes all differed in the assignment of their first codonposition (assigned to partition 1) compared to the NADH geneson the forward strand (assigned to partition 4), whereas their sec-ond and third codon positions were assigned to the same partitions(partition 2 and 3 respectively). The NADH genes on the forwardstrand differed from the NADH genes on the reverse strand in thattheir third codon position was assigned to a different partition(partition 3 and 5 respectively), but their first codon positionsshared a partition (partition 4) (Fig. 3).

3.3. Base frequencies and substitution rates in the mitochondrialgenome

Each of the five partitions of the non-recoded nc123 datasets(Fig. 3) differed in base frequencies and/or substitution rates asestimated by MrBayes (Supplementary Fig. 3 for nc123 105-taxadataset). As expected, the partition that contained the majority ofsecond codon positions showed the lowest rate of substitution,reflecting the absence of degeneracy of these sites and theassociated constraint on mutational change.

Fig. 3. Partition schemes selected by PartitionFinder for each of the 105 and 96 taxa dataNumbers in the boxes refer to the models that were selected and are: nc123: (1) GTR + I +GTR + I + G, (3) GTR + G; RY: (1) SYM + G, (2) GTR + I + G, (3) GTR + I + G (RNA had a uniq

In agreement with above-mentioned saturation plots the thirdcodon positions, which are highly degenerate, showed the highestsubstitution rates, mainly involving transitional changes. The thirdcodon positions of the forward strand and third codon positions onthe reverse strand were grouped in two separate partitions thatdiffered in their nucleotide base frequencies, with the ‘forward’partition being somewhat richer in C and the reverse partitionricher in G, reflecting the well-known CG bias between the twomitochondrial strands (e.g. Pons et al., 2010).

The remaining two partitions contained the first codon posi-tions of the cytochrome genes and the first codon positions ofthe ND genes (combined with two codon positions of ATP8 andthe two rRNA genes) and showed intermediate rates, again biasedtowards transitional changes. It is worth mentioning that ND1,ND2, ND3, ND4, ND4L and ND5 and ND6 are all sub-units of com-plex I of the mitochondrial respiratory chain. Although highly spec-ulative, their grouping might therefore reflect a constraint imposedby their close interaction in this key enzyme. The same might holdtrue for the partition that contains the first codon positions of COI,COII and COIII. These three genes are all subunits of complex IV ofthe respiratory chain.

3.4. Problems with existing Genbank data

We acknowledge that the existing public domain data surelycontain errors. This particularly applies to unpublished submis-sions, where we observed highly similar sequence stretchesbetween about 300–2000 bp in length that might indicate foreigncontaminations. This is found between Celastrina hersilia andParnassius bremeri, Papilio xuthus and Teinopalpus aureus, andTroides aeacus and Luehdorfia chinensis. We also note that thesequence labelled ‘‘Celastrina hersilia’’ represents Cyaniris semiargusaccording a query of the COI-5P (the DNA barcode fragment) onBOLD. Furthermore, although it does not contain long obviouslyshared segments, the Ochlodes venata sequence seems likely to bechimaeric, with mixed sequence from a contaminating family(perhaps Lycaenidae to which ‘‘C. hersilia’’ belongs); indeed onBOLD its COI-5P is 5.4% divergent from a genuine Ochlodes venata(JQ922041). These errors likely caused some topological artefactssuch as the wrong placement of the so-labelled C. hersilia andOchlodes venata that group with the wrong families in the 96 and105-exemplar nc123 analyses (Supplementary Fig. 3). It appearsthat the recoding schemes that we applied corrects these problemsto some extent. This might be due to the schemes primarilyreducing the signal of the most variable first and third codonpositions, thereby (e.g. in the case of Ochlodes venata) changingthe relative amount of suspect sequence compared to genuine data.

matrices. Each coloured box represents one of the three codon positions of a gene.G, (2) GTR + I + G, (3) GTR + G, (4) GTR + I + G, (5) GTR + G; Degen: (1) GTR + I + G, (2)ue partition in this case).

174 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

3.5. Lepidopteran relationships and the effect of recoding

All analyses resulted in highly comparable trees (Fig. 4, Supple-mentary Fig. 4), yet with some noteworthy differences betweenanalyses. Here we focus largely on a comparison of RY and degenrecoding, with reference to the nc123 analysis where necessary.

Tortricoidea are recovered as sister to the remaining Apoditry-sia in all analyses except the 105-taxa degen analysis, but withvariable support (Fig. 4, Supplementary Fig. 3). Urodoidea arerecovered as sister to the ‘‘Obtectomera sensu lato’’ in all the 96-taxa analyses (Fig. 4, Supplementary Fig. 3), while this superfamilyis placed in an unresolved basal apoditrysian polytomy in the 105-taxa analyses (Fig. 4, Supplementary Fig. 3). A sister relationshipbetween Alucitidoidea + Pterophoroidea is recovered in all analy-ses, albeit with variable support. A sister relationship betweenThyridoidea and Gelechioidea is recovered in all 96-taxa analyses

Fig. 4. Phylogenetic relationships among Lepidoptera. Left: based on the RY analysis of tthe 105 taxa RY recoded dataset as presented in Fig. 3. Major superfamilies are highlighsome very long branches have been truncated (indicated by cross-hatching) to fit the trletters of the family name (Table 1 and Supplementary Table 1) with the exception of Hrepresentation of the topologies that were obtained for the RY and degen recoded 105 t

and the RY-105 analysis. The expanded Obtectomera (includingthe last mentioned four superfamilies and, in the 105-taxa analy-ses, Epermenioidea) are recovered in all analyses except thenc123 and degen 105-taxa analyses, but again, the support valueswere variable (Fig. 4, Supplementary Fig. 3).

Within the Papilionoidea, the well-known relationship of Lyca-enidae (presumed sister to the unsampled Riodinidae) + Nymphal-idae is recovered in all analyses except for the non-recoded (nc123)96-taxa and 105-taxa analyses, where the hesperiid Ochlodesvenata is placed as sister to Lycaenidae (Fig. 4, SupplementaryFig. 3). This could be related to problems with the ‘‘Celastrina her-silia’’ sequence, see above. All analyses place Doidae as well sup-ported sister to our exemplar of Drepanoidea. Both degen-96 anddegen-105 analyses support a sister group relationship betweenMimallonidae and Lasiocampoidea. However, in contrast, the 96and 105 RY-coded datasets place Lacosoma as a well-supported

he 96-taxa data matrix. The preferred partition scheme was identical to the one forted by coloured branches. Numbers are Bayesian posterior probabilities. Note thatee in the figure space. Family codes precede the species names using the first twoe: Hesperiidae, Hp: Hepialidae, Ti: Tischeriidae and Tn: Tineidae. Right: Schematicaxa data matrices.

M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 175

sister to the remaining Macroheterocera (Fig. 4; SupplementaryFig. 3). Within the ‘core’ macroheterocerans, all analyses exceptdegen-105 taxa place Bombycidae and Sphingidae as sister taxawith high support. Overall, nodes within the superfamilies are wellsupported for each of the datasets, and we treat these results inmore detail in the discussion section.

3.6. The effects of missing data

When we compare the 96-taxa and 105-taxa analyses, it is clearthat the incomplete sequences affect support values in the part ofthe trees where they are placed in the 105-taxa analyses (mainly inthe lower apoditrysian grade), and that some of the taxa in ques-tion are placed spuriously. For example, the RY-105 analysis ratherimplausibly placed the incomplete Zygaenoidea sequence as sisterto other apoditrysians including Tortricidae. Otherwise, inclusionof incomplete mitochondrial genomes generally resulted in someresolution among the lower apoditrysian superfamilies. Forexample, the RY-105 analysis recovers the two Cossoidea familiesas sisters, albeit with low support (Fig. 4). Within Obtectomerasensu lato (Fig. 4), there is weak support for a sister relationshipbetween Epermenioidea and the plume moth superfamiliesAlucitidoidea + Pterophoroidea in both the recoded 105-taxaanalyses, but the nc123 analysis fail even to place the latter twosuperfamilies in Obtectomera.

3.7. Gene order

The above-mentioned derived tRNA translocation appears to beimportant for determining the relationship between Ditrysia,Tischeriidae and the remaining higher non-ditrysian superfamilies.When analysing newly generated full genome sequences wediscovered that the Tischeriidae share this translocation with theDitrysia, but the Nepticulidae do not. To determine whether thistranslocation could be a potential synapomorphy for Ditrysia andTischeriidae, we sequenced the relevant region in the mt-genomein an Australian Palaephatoidea and an Adeloidea (two superfam-ilies that where were not included here in the full mt-genomesequencing) using standard PCR. We found that the derived trans-location was present in Palaephatoidea but not in Adeloidea(Fig. 5), supporting the hypothesis that Palaephatidae (at least,Australian members currently ascribed to the family) andTischeriidae are closest relatives to the Ditrysia, which is also inagreement with recent phylogenetic hypotheses (Mutanen et al.,2010; Regier et al., 2013).

4. Discussion

Our study greatly augments the phylogenetic scope of availablelepidopteran mitochondrial genomes, adding 16 new superfamiliesto the eight previously available, and 23 new full mitochondrialgenomes to the 77 available at the time of our analysis.

Fig. 5. Phylogenetically informative tRNA re-arrangement hypothesized to besynapomorphic to Ditrysia, Tischeriidae and Australian Palaephatoidea (if not theunsequenced Andesianidae).

We base our discussion on the results from the analyses of therecoded RY and degen datasets (Fig. 4), as saturation effects mighthave affected the results from the analyses of the non-recodeddatasets, resulting in spurious groupings not reflecting evolution-ary relationships (Supplementary Fig. 3). Our analyses representsthe most comprehensive mitochondrial genome based phyloge-netic study of Lepidoptera and reveal a number of interesting,well-supported relationships in Lepidoptera phylogeny, most ofwhich are in agreement with nuclear DNA studies of Regier et al.(2009, 2013), Cho et al. (2011) and the COI + nuclear DNA studyof Mutanen et al. (2010). Ditrysia are monophyletic and Tineoideaare supported as the sister of the remaining Ditrysia (note thatRegier et al. (2013), Bazinet et al. (2013) found the former super-family to constitute a grade), with Gracillarioidea + Yponomeutoi-dea comprising the next lineage within Ditrysia as the sister toApoditrysia. None of the analyses resulted in a monophyleticYponomeutoidea, and greater sampling there is needed in future.Apoditrysia are recovered as monophyletic, but Tortricoidea areplaced unconventionally as the sister group of the remaining clade.However, this is an arrangement previously suggested by the non-synonymous ‘degen1’ 38-taxon trees in Bazinet et al. (2013; Fig. 4and 5). In the 46-taxon tree in the same study, Urodoidea andTortricoidea switch places respective to our 96-exemplar result.Other relationships among lower apoditrysians remain largelyunresolved, likely due to our modest level of superfamily samplingfrom this section of Lepidoptera (our study is missing eightputative lower apoditrysian superfamilies). Even Bazinet et al.’s741 nuclear genes study using 40 apoditrysian exemplars failedto resolve the lower Apoditrysia (their Fig. 1), but they toowere missing six of the 26 apoditrysian superfamilies. WithinTortricidae, our results confirm the well-known split betweenTortricinae and Olethreutinae (e.g. Regier et al., 2012a).

The support we found for a broadened Obtectomera (Obtecto-mera sensu lato) with Pterophoroidea, Thyridoidea and Gelechioi-dea all grouping together with the more conventional (‘‘core’’)obtectomerans (Papilionoidea, Pyraloidea, Mimallonioidea andMacroheterocera) is in line with the study of Bazinet et al.(2013). Significantly, Gelechioidea never grouped with other lowerApoditrysia (note that van Nieukerken et al., 2011 for lack of con-sensus among taxonomists simply arranged them at the base of theApoditrysia). We found some support in the RY-96 tree for a sisterrelationship of Gelechioidea to the traditionally obtectomeransuperfamily Thyridoidea, but admittedly our taxon sampling maybe too limited to draw any firm conclusion here. Also, in the MLtrees of Bazinet et al. (2013) Gelechioidea form a clade withPterophoroidea and Thyridoidea. A close relationship betweenPterophoroidea and Alucitoidea (the two superfamilies of plumemoths), which is supported in both 96-taxa trees, is in line withearlier suggestions based on morphology (e.g. Kristensen andSkalski, 1999). In both 105-taxa trees Pterophoroidea and Alucitoi-dea group with Epermenioidea (an incomplete sequence, thusabsent from 96-exemplar analyses), albeit weakly supported. Notealso that in the 96-taxa RY tree, the ‘‘plume moths’’ are supportedas sister to the ‘‘core’’ obtectomerans.

As other studies have also reported, Papilionoidea are not mac-rolepidoptera as formerly conceived e.g. by Kristensen and Skalski(1999), but rather Pyraloidea are the sister of the bulk of ‘classicalmacro moths’ (Macroheterocera + Mimallonidae). Papilionoideahave probably been the subject of more phylogenetic studies thanalmost any comparable group of hexapods (e.g. Simonsen et al.,2012), and are similarly well represented in our analyses. In thepresent study, we were unable to include Hedylidae, Calliduloideaand Hyblaeoidea, and thus to test close relationships with Papilio-noidea as suggested by Regier et al. (2009), Mutanen et al. (2010),and Heikkila et al. (2012). However, our results otherwise confirmthe overall results in these studies in as much as Hesperiidae are

176 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

more closely related to the clade comprising Pieridae, Lycaenidaeand Nymphalidae, than any of these are to Papilionidae. The lastthree studies had Thyrididae as closely related to Papilionoidea.Here, however, Thyrididae are placed in the basal obtectomeranpolytomy as in Bazinet et al. (2013), but that study did not samplePapilionoidea, Hedylidae, Calliduloidea or Hyblaeoidea. WithinPapilionidae, the subfamilies Parnassinae and Papilioninae arerecovered as monophyletic in accordance with previous molecularand morphological studies (e.g. Miller, 1987; Nazari et al., 2007;Simonsen et al., 2011), but contrary to Simonsen et al. (2011),Teinopalpini (represented only by Teinopalpus) and not Troidini(represented only by Troides) are the sister of Papilio; this mayhowever reflect problematic public gene data (see above). Withinthe family Nymphalidae, the most remarkable result is that milk-weed butterflies (Danainae) are placed as the sister to the remain-der of the family. This is in conflict with both Heikkila et al. (2012)where Danainae together with Libytheinae form the sister group ofthe rest of the family, and the two most comprehensive studiesfocused on Nymphalidae: Freitas and Brown (2004: morphology),and Wahlberg et al. (2009: molecules), which both have Libythei-nae as the sister of the remaining Nymphalidae. However, ourresults in this respect are identical to the mt-genomic results byLu et al. (2013), and as noted by Simonsen et al. (2012), establish-ing which taxon is the sister group of the remaining Nymphalidaeseems to be one of the major challenges in butterfly systematics.Within Nymphalidae, the sister group relationship between Helic-oniinae and Limenitidinae, and the close relationship betweenNymphalinae and Apaturinae respectively, are in agreement withprevious molecular studies (i.e. Brower, 2000; Wahlberg et al.,2003, 2009; Wahlberg and Wheat, 2008), but not with the morpho-logical study by Freitas and Brown (2004).

While our sampling of Pyraloidea comprises only five of the 10subfamilies in Crambidae and one of the five subfamilies of Pyral-idae (and none of the three unplaced subfamilies) that are recogni-sed by Regier et al. (2012b), our results do confirm the monophylyof Crambidae (albeit with only one Pyralidae subfamily to test itagainst), and the sister group relationship between Spilomelinaeand Pyraustinae found by Regier et al. (2012b). But in contrast tothat study, we find that Acentropinae, and not Crambinae, appearsto be the sister group of Scopariinae. However, given our relativelylimited taxon sampling, we do not necessarily consider this resultconclusive.

The last area where our taxon sampling is sufficient for us tomake significant comparisons is the Macroheterocera (sensu vanNieukerken et al., 2011). The most problematic higher taxa to placewithin Macroheterocera have generally been the small, enigmaticfamilies Mimallonidae and Doidae. Based on morphology, Mimal-lonidae have been placed in Bombycoidea (Kristensen andSkalski, 1999; Lemaire and Minet, 1999), but molecular analyseshave continuously failed to support this placement: while Regieret al. (2009, 2013) consistently placed Mimallonidae in a cladewith Drepanoidea, Doidae and Cimeliidae (not sampled here).Mutanen et al. (2010) recovered the family either as the sistergroup of Gelechioidea (their Fig. 2, albeit with negligible posteriorsupport), or as sister to Macroheterocera (their Fig. 1 and their Sup-plementary Fig. 1), with poor (60%) bootstrap support. We do notfind any support for including Mimallonidae in either Bombycoi-dea or in a clade including Drepanidae and Doidae. Our degenand nc123 analyses rather unconventionally recover Mimallonoi-dea as sister to Lasiocampoidea whereas the RY analyses more con-servatively recover Mimallonoidea as sister to the remainingMacroheterocera. All our analyses thus strongly support a closerelationship between Mimallonoidea and Macroheterocera. TheRY result is also in agreement with Mutanen et al. (2010; their Sup-plementary Fig. 1) and with the 741-gene study of Bazinet et al.(2013), who find Mimallonidae sister to the main five macrohete-

roceran superfamilies with strong support. A position within Mac-roheterocera would be in line with morphological views, while thetopology of the RY-coded dataset agrees with Fig. 2 of Bazinet et al.(2013) in which it is sister to it. The second contentious taxon, Doi-dae, has variously been placed in Noctuoidea based on morphology(Kitching and Rawlins, 1999), or as mentioned above in a cladewith Mimallonidae and Drepanoidea based on molecular data(Regier et al., 2009, 2013; Mutanen et al., 2010 did not includeDoidae), or within the new concept of Drepanoidea, as sister toDrepanidae (Bazinet et al., 2013). Here our results agree withRegier et al. (2009, 2013) and Bazinet et al. (2013) by placingDoidae as the sister of Drepanidae with strong support, for bothRY and degen analyses. The support in RY trees for a clade consist-ing of Bombycidae + Sphingidae rather than the more convention-ally illustrated Sphingidae + Saturniidae grouping among the "SBS"group of Zwick et al. (2011) also deserves highlighting. We notethat prior studies have not mostly shown a strongly supported res-olution within this clade, but see Breinholt and Kawakara (2013).

Finally and relevant to the deep phylogeny of the Lepidoptera,we found that the tRNA rearrangement previously reported for Dit-rysia is present in Tineoidea, Tischerioidea and Palaephatoidea butnot in the monotrysian Adeloidea (Fig. 5). This synapomorphytherefore supports the mitochondrial topologies presented hereand other recent morphological and molecular studies (Mutanenet al., 2010; Regier et al., 2013), which showed Tischeriidae (andPalaephatidae) to be more closely related to Ditrysia than toAdeloidea.

We emphasise the efficiency of our approach in obtaining thesemitochondrial genomes and phylogenetic results. We applied acheap, next generation sequence (Illumina Solexa) based approach.Even though we aimed for only �1% of a MiSeq (Reagent Kit v2)run capacity, sequence output was higher than needed with amean mitochondrial genome coverage of �300x. Extrapolatingthese results suggests that more than 3000 full mitochondrialgenomes can be obtained in a single sequence run. Here weused relatively expensive ‘Sanger bait sequences’ to associatemitochondrial genome contigs to specimens. To further reducecosts, one could rely on the comprehensive BOLD barcode databaseand replace custom Sanger baits with publicly available COIsequences.

5. Conclusion

Overall, our study reveals the promise of mitochondrial gen-omes for deep level phylogenetics of Lepidoptera, but also its lim-itations. As with previous studies, we fail to resolve conclusivelythe Apoditrysia. This is unsurprising due to limited taxon sampling,and it could well be that even adding a single exemplar represen-tation of each remaining superfamily would resolve their relation-ships. We find it encouraging that we can, independently ofnuclear genome information, recover the expanded concepts ofclades as deep as Macroheterocera and Obtectomera, and alsothe Apoditrysia, with representatives of a limited set of superfam-ilies. Moreover, the method we present here paves the way for aphylogenetically comprehensive data sampling of lepidopteranmitochondrial genomes including most of the�133 families. Whileanalyses of such a dataset on its own would surely provide consid-erable new insights into lepidopteran evolution, combining it withalready existing nuclear datasets could finally result in a strongphylogenetic signal from different loci and gene histories.

Acknowledgments

We are very grateful to Charles W. Mitter and Kim T. Mitter,Department of Entomology, University of Maryland, USA for

M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178 177

donation of crucial specimens for sequencing from the LepTree tis-sue collection, and Edward (Ted) Edwards, Australian NationalInsect Collection for loan of Australian Palaephatidae for the anal-yses. Library preparation and sequencing was performed at theDepartment of Biochemistry (University of Cambridge). This workis supported by the Biodiversity Initiative of the NHM London.DCL’s sampling was supported by NSF #8316-07 and ERC grantEMARES #250325. MJTNT is a NERC Postdoctoral Fellow (NE/I021578/1). The authors thank M. Djernæs, I.J. Kitching and A.P.Vogler from NHM London for discussions and comments. Thisstudy used the computational infrastructure of the BioinformaticsSupport Service (Imperial College London; maintained by JamesAbbott) and of the Department of Life Sciences (NHM London;maintained by Peter Foster).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, inthe online version, at http://dx.doi.org/10.1016/j.ympev.2014.05.031.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic localalignment search tool. J. Mol. Biol. 215, 403–410.

Bazinet, A.L., Cummings, M.P., Mitter, K.T., Mitter, C.W., 2013. Can RNA-Seq resolvethe rapid radiation of advanced moths and butterflies (Hexapoda Lepidoptera:Apoditrysia)? An exploratory study. PLoS ONE 8, e82615.

Bininda-Emonds, O.R.P., 2005. TransAlign: using amino acids to facilitate themultiple alignment of protein-coding DNA sequences. BMC Bioinformatics 6,156.

Breinholt, J.W., Kawahara, A.Y., 2013. Phylotranscriptomics: saturated third codonpositions radically influence the estimation of trees based on next-gen data.Genome Biol. Evol. 5, 2082–2092.

Brower, A.V.Z., 2000. Phylogenetic relationships among the Nymphalidae(Lepidoptera) inferred from partial sequences of the wingless gene. Proc. R.Soc. Biol. Sci. Ser. B 267, 1201–1211.

Cao, Y.-Q., Ma, C., Chen, J.-Y., Yang, D.-R., 2012. The complete mitochondrialgenomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis:the ancestral gene arrangement in Lepidoptera. BMC Genomics 13, 276.

Castresana, J., 2000. Selection of conserved blocks from multiple alignments fortheir use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552.

Cho, S., Zwick, A., Regier, J.C., Mitter, C., Cummings, M.P., Yao, J., Du, Z., Zhao, H.,Kawahara, A.Y., Weller, S., Davis, D.R., Baixeras, J., Brown, J.W., Parr, C., 2011.Can deliberately incomplete gene sample augmentation improve a phylogenyestimate for the advanced moths and butterflies (Hexapoda: Lepidoptera)? Syst.Biol. 60, 782–796.

Eddy, S.R., Durbin, R., 1994. RNA sequence analysis using covariance models.Nucleic Acids Res. 22, 2079–2088.

Freitas, A.V.L., Brown, K.S., 2004. Phylogeny of the nymphalidae (Lepidoptera). Syst.Biol. 53, 363–383.

Grimaldi, D., Engel, M.S., 2005. Evolution of the Insects. Cambridge University Press,New York.

Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimatelarge phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.

Haran, J., Timmermans, M.J.T.N., Vogler, A.P., 2013. Mitogenome sequences stabilizethe phylogenetics of weevils (Curculionoidea) and establish the monophyly oflarval ectophagy. Mol. Phylogenet. Evol. 67, 156–166.

Hassanin, A., 2006. Phylogeny of Arthropoda inferred from mitochondrial sequencesstrategies for limiting the misleading effects of multiple changes in pattern andrates of substitution. Mol. Phylogenet. Evol. 38, 100–116.

Heikkila, M., Kaila, L., Mutanen, M., Pena, C., Wahlberg, N., 2012. Cretaceous originand repeated tertiary diversification of the redefined butterflies. Proc. R. Soc.Biol. Sci. Ser. B 279, 1093–1099.

Kaila, L., Mutanen, M., Nyman, T., 2011. Phylogeny of the mega-diverse Gelechioidea(Lepidoptera): adaptations and determinants of success. Mol. Phylogenet. Evol.61, 801–809.

Karsholt, O., Mutanen, M., Lee, S., Kaila, L., 2013. A molecular analysis of theGelechiidae (Lepidoptera, Gelechioidea) with an interpretative grouping of itstaxa. Syst. Entomol. 38, 334–348.

Katoh, K., Kuma, K., Toh, H., Miyata, T., 2005. MAFFT version 5: improvement inaccuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518.

Katoh, K., Asimenos, G., Toh, H., 2009. Multiple alignment of DNA sequences withMAFFT. Methods Mol. Biol. 537, 39–64.

Kim, M.J., Kang, A.R., Jeong, H.C., Kim, K.-G., Kim, I., 2011. Reconstructingintraordinal relationships in Lepidoptera using mitochondrial genome datawith the description of two newly sequenced lycaenids, Spindasis takanonis andProtantigius superans (Lepidoptera: Lycaenidae). Mol. Phylogenet. Evol. 61, 436–445.

Kitching, I.J., Rawlins, J.E., 1999. The Noctuoidea. In: Kristensen, N.P. (Ed.),Handbook of Zoology, Volume 1: Evolution, Systematics, and Biogeography.Walter de Gruyter, Berlin, New York, pp. 355–401.

Kristensen, N.P., Skalski, A.W., 1999. Phylogeny and palaeontology. In: Kristensen,N.P. (Ed.), Handbook of Zoology, Volume 1: Evolution, Systematics, andBiogeography. Walter de Gruyter, Berlin, New York, pp. 7–25.

Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S., 2012. PartitionFinder:combinedselection of partitioning schemes and substitution models forphylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701.

Lemaire, C., Minet, J., 1999. The Bombycoidea and their relatives. In: Kristensen, N.P.(Ed.), Handbook of Zoology, Volume 1: Evolution, Systematics, andBiogeography. Walter de Gruyter, Berlin, New York, pp. 321–353.

Lohse, M., Bolger, A.M., Nagel, A., Fernie, A.R., Lunn, J.E., Stitt, M., Usadel, B., 2012.RobiNA: a user-friendly, integrated software solution for RNA-Seq- basedtranscriptomics. Nucleic Acids Res. 40, W622–W627.

Lu, H.-F., Su, T.-J., Luo, A.R., Zhu, C.-D., Wu, C.-S., 2013. Characterization of thecomplete mitochondrion genome of diurnal moth Amata emma (Butler)(Lepidoptera: Erebidae) and its phylogenetic implications. PLoS ONE 8,e72410.

Miller, J.S., 1987. Phylogenetic studies in the Papilioninae (Lepidoptera,Papilionidae). Bull. Am. Mus. Nat. Hist. 186, 365–512.

Miller, M.A., Pfeiffer, W., Schwartz, T., 2010. Creating the CIPRES Science Gatewayfor inference of large phylogenetic trees. In: Proceedings of the GatewayComputing Environments Workshop (GCE), 14 Nov. 2010, New Orleans, LA, pp.2011–2018.

Misof, B., Misof, K., 2009. A Monte Carlo approach successfully identifiesrandomness in multiple sequence alignments: a more objective means ofdata exclusion. Syst. Biol. 58, 21–34.

Mutanen, M., Wahlberg, N., Kaila, L., 2010. Comprehensive gene and taxon coverageelucidates radiation patterns in moths and butterflies. Proc. R. Soc. Biol. Sci. Ser.B 277, 2839–2848.

Nazari, V., Zakharov, E.V., Sperling, F.A.H., 2007. Phylogeny, historical biogeography,and taxonomic ranking of Parnassiinae (Lepidoptera, Papilionidae) based onmorphology and seven genes. Mol. Phylogenet. Evol. 42, 131–156.

Pons, J., Ribera, I., Bertranpetit, J., Balke, M., 2010. Nucleotide substitution rates forthe full set of mitochondrial protein-coding genes in Coleoptera. Mol.Phylogenet. Evol. 56, 796–807.

Regier, J.C., Zwick, A., Cummings, M.P., Kawahara, A.Y., Cho, S., Weller, S., Roe, A.,Baixeras, J., Brown, J.W., Parr, C., Davis, D.R., Epstein, M., Hallwachs, W.,Hausmann, A., Janzen, D.H., Kitching, I.J., Solis, M.A., Yen, S.-H., Bazinet, A.L.,Mitter, C., 2009. Toward reconstructing the evolution of advanced moths andbutterflies (Lepidoptera: Ditrysia): an initial molecular study. BMC Evol. Biol. 9,280.

Regier, J.C., Shultz, J.W., Zwick, A., Hussey, A., Ball, B., Wetzer, R., Martin, J.W.,Cunningham, C.W., 2010. Arthropod relationships revealed by phylogenomicanalysis of nuclear protein-coding sequences. Nature 463, 1079–1083.

Regier, J.C., Brown, J.W., Mitter, C., Baixeras, J., Cho, S., Cummings, M.P., Zwick, A.,2012a. A molecular phylogeny for the leaf-roller moths (Lepidoptera:Tortricidae) and its implications for classification and life history evolution.PLoS ONE 7, e35574.

Regier, J.C., Mitter, C., Solis, M.A., Hayden, J.E., Landry, B., Nuss, M., Simonsen, T.J.,Yen, S.-H., Zwick, A., Cummings, M.P., 2012b. A molecular phylogeny for thepyraloid moths (Lepidoptera: Pyraloidea) and its implications for higher-levelclassification. Syst. Entomol. 37, 635–656.

Regier, J.C., Mitter, C., Zwick, A., Bazinet, A.L., Cummings, M.P., Kawahara, A.Y., Sohn,J.-C., Zwickl, D.J., Cho, S., Davis, D.R., Baixeras, J., Brown, J., Parr, C., Weller, S.,Lees, D.C., Mitter, K.T., 2013. A large-scale, higher-level, molecular phylogeneticstudy of the insect order Lepidoptera (moths and butterflies). PLoS ONE 8,e58568.

Ronquist, F., Huelsenbeck, J.P., Teslenko, M., 2011. Draft MrBayes version 3.2Manual. Tutorials and Model Summaries. <http://mrbayes.sourceforge.net/download.php>.

Schmieder, R., Edwards, R., 2011. Quality control and preprocessing of metagenomicdatasets. Bioinformatics 27, 863–864.

Simon, C., Frati, F., Beckenback, A., Crespi, B., Liu, H., Flook, P., 1994. Evolution,weighting, and phylogenetic utility of mitochondrial gene sequences and acompilation of conserved polymerase chain reaction. Ann. Ent. Soc. Am. 87,651–701.

Simon, S., Hadrys, H., 2013. A comparative analysis of complete mitochondrialgenomes among Hexapoda. Mol. Phylogenet. Evol. 69, 393–403.

Simonsen, T.J., Zakharov, E.V., Djernaes, M., Cotton, A.M., Vane-Wright, R.I., Sperling,F.A.H., 2011. Phylogenetics and divergence times of Papilioninae (Lepidoptera)with special reference to the enigmatic genera Teinopalpus and Meandrusa.Cladistics 27, 113–137.

Simonsen, T.J., de Jong, R., Heikkila, M., Kaila, L., 2012. Butterfly morphology in amolecular age – does it still matter in butterfly systematics? Arthropod Struct.Dev. 41, 307–322.

Sohn, J.-C., Regier, J.C., Mitter, C., Davis, D., Landry, J.-F., Zwick, A., Cummings, M.P.,2013. A molecular phylogeny for Yponomeutoidea (Insecta, Lepidoptera,Ditrysia) and its implications for classification, biogeography and theevolution of host plant use. PLoS ONE 8, e5506.

Stamatakis, A., Ludwig, T., Meier, H., 2005. RAxML-III: a fast program for maximumlikelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving thesensitivity of progressive multiple sequence alignment through sequence

178 M.J.T.N. Timmermans et al. / Molecular Phylogenetics and Evolution 79 (2014) 169–178

weighting, position-specific gap penalties and weight matrix choice. NucleicAcids Res. 22, 4673–4680.

Timmermans, M.J.T.N., Vogler, A.P., 2012. Phylogenetically informativerearrangements in mitochondrial genomes of Coleoptera, and monophyly ofaquatic elateriform beetles (Dryopoidea). Mol. Phylogenet. Evol. 63, 299–304.

Timmermans, M.J.T.N., Dodsworth, S., Culverwell, C.L., Bocak, L., Ahrens, D.,Littlewood, D.T.J., Pons, J., Vogler, A.P., 2010. Why barcode? High-throughputmultiplex sequencing of mitochondrial genomes for molecular systematics.Nucleic Acids Res. 38, e197.

van Nieukerken, E.J., Kaila, L., Kitching, I.J., Kristensen, N.P., Lees, D.C., Minet, J.,Mitter, C., Mutanen, M., Regier, J.C., Simonsen, T.J., Wahlberg, N., Yen, S.-H.,Zahiri, R., Adamski, D., Baixeras, J., Bartsch, D., Bengtsson, B.A., Brown, J.W.,Bucheli, S.R., Davis, D.R., de Prins, J., de Prins, W., Epstein, M.E., Gentili-Poole, P.,Gielis, C., Haettenschwiler, P., Hausmann, A., Holloway, J.D., Kallies, A., Karsholt,O., Kawahara, A.Y., Koster, S., Kozlov, M.V., Lafontaine, J.D., Lamas, G., Landry, J.-F., Lee, S., Nuss, M., Park, K.-T., Penz, C., Rota, J., Schintlmeister, A., Schmidt, B.C.,Sohn, J.-C., Solis, M.A., Tarmann, G.M., Warren, A.D., Weller, S., Yakovlev, R.V.,Zolotuhin, V.V., Zwick, A., 2011. Order Lepidoptera Linnaeus, 1758. Zootaxa3148, 212–221.

Wahlberg, N., Wheat, C.W., 2008. Genomic outposts serve the phylogenomicpioneers: designing novel nuclear markers for genomic DNA extractions oflepidoptera. Syst. Biol. 57, 231–242.

Wahlberg, N., Weingartner, E., Nylin, S., 2003. Towards a better understanding ofthe higher systematics of Nymphalidae (Lepidoptera: Papilionoidea). Mol.Phylogenet. Evol. 28, 473–484.

Wahlberg, N., Leneveu, J., Kodandaramaiah, U., Pena, C., Nylin, S., Freitas, A.V.L.,Brower, A.V.Z., 2009. Nymphalid butterflies diversify following near demise atthe Cretaceous/Tertiary boundary. Proc. R. Soc. Biol. Sci. Ser. B 276, 4295–4302.

Wernersson, R., 2005. FeatureExtract – extraction of sequence annotation madeeasy. Nucleic Acids Res. 33, W567–W569.

Xia, X., Xie, Z., 2001. DAMBE: software package for data analysis in molecularbiology and evolution. J. Hered. 92, 371–373.

Zahiri, R., Kitching, I.J., Lafontaine, J.D., Mutanen, M., Kaila, L., Holloway, J.D.,Wahlberg, N., 2011. A new molecular phylogeny offers hope for a stable familylevel classification of the Noctuoidea (Lepidoptera). Zool. Scr. 40, 158–173.

Zahiri, R., Holloway, J.D., Kitching, I.J., Lafontaine, J.D., Mutanen, M., Wahlberg, N.,2012. Molecular phylogenetics of Erebidae (Lepidoptera, Noctuoidea). Syst.Entomol. 37, 102–124.

Zahiri, R., Lafontaine, D., Schmidt, C., Holloway, J.D., Kitching, I.J., Mutanen, M.,Wahlberg, N., 2013a. Relationships among the basal lineages of Noctuidae(Lepidoptera, Noctuoidea) based on eight gene regions. Zool. Scr. 42, 488–507.

Zahiri, R., Lafontaine, J.D., Holloway, J.D., Kitching, I.J., Schmidt, B.C., Kaila, L.,Wahlberg, N., 2013b. Major lineages of Nolidae (Lepidoptera, Noctuoidea)elucidated by molecular phylogenetics. Cladistics 29, 337–359.

Zwick, A., Regier, J.C., Mitter, C., Cummings, M.P., 2011. Increased gene samplingyields robust support for higher-level clades within Bombycoidea(Lepidoptera). Syst. Entomol. 36, 31–43.

Zwick, A., Regier, J.C., Zwickl, D.J., 2012. Resolving discrepancy between nucleotidesand amino acids in deep-level arthropod phylogenomics: differentiating serinecodons in 21-amino-acid models. PLoS ONE 7, e47450.