

ORIGINAL ARTICLE

High-throughput analysis of satellite DNA in the grasshopper Pyrgomorpha conica reveals abundance of homologous and heterologous higher-order repeats

Francisco J. Ruiz-Ruano 1 & Jesús Castillo-Martínez 1,3 & Josefa Cabrero 1 & Ricardo Gómez 2 & Juan Pedro M. Camacho 1 & María Dolores López-León 1

Received: 7 September 2017 / Revised: 13 February 2018 / Accepted: 6 March 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract

Satellite DNA (satDNA) constitutes an important fraction of repetitive DNA in eukaryotic genomes, but it remains poorly characterized in most species. The high-throughput analysis of satDNA in the grasshopper Pyrgomorpha conica revealed 87 satDNA variants grouped into 76 different families, representing 9.4% of the genome. Fluorescent in situ hybridization (FISH) analysis of the 38 most abundant satDNA families revealed four different patterns of chromosome distribution. A homology search among the 76 satDNA families showed the existence of 15 superfamilies, each including two or more families, with the most abundant superfamily representing more than 80% of all satDNA found in this species. This search also revealed the presence of two types of higher-order repeats (HORs): one showing internal homologous subrepeats, as in conventional HORs, and an additional type showing non-homologous internal subrepeats, the latter arising by the combination of a given satDNA family with a non-annotated sequence or with telomeric DNA. Interestingly, the heterologous subrepeats included in these HORs showed higher divergence within the HOR than outside it, suggesting that heterologous HORs show poor homogenization, in sharp contrast with conventional (homologous) HORs. Finally, heterologous HORs can show large differences in divergence between their constituent subrepeats, suggesting the possibility of regional homogenization.

Keywords FISH . Higher-order repeat (HOR) . High-throughput sequencing . Pyrgomorpha conica . Satellitome

Introduction

A substantial part of eukaryotic genomes is composed of different repeated sequences (Britten and Kohne 1968; López-Flores and Garrido-Ramos 2012; Garrido-Ramos 2017). Among them, satellite DNA (satDNA) is considered one of the most abundant repeated sequences and constitutes a major component of heterochromatin in numerous species of plants and animals (Charlesworth et al. 1994). In eukaryotes, these sequences can represent up to half of the genome content (Plohl et al. 2012).

SatDNA consists of a non-genic repeat unit (RU) of a given length that is tandemly repeated and organized in arrays of variable length and complexity. SatDNAs can be classified into microsatellites, minisatellites and satellites according to RU length (RUL), although there is no agreement on the length thresholds delimiting these three subclasses. A frequent convention is 1–6, 7–100 and > 100 bp for micro-, mini- and satellites, respectively. In addition, micro- and minisatellites are frequently considered to be rarely locally amplified, thus failing to show bands visible by FISH on chromosomes and being regarded as elements scattered across the genome. In contrast, satellites (sometimes called macrosatellites, see Kass and Batzer 2001) are usually considered to be locally amplified, thus showing visible bands on chromosomes. These conventions have recently been overturned by the high-throughput analysis of the satDNA catalog (i.e., the satellitome) in the grasshoppers Locusta migratoria (Ruiz-Ruano et al. 2016) and Eumigus monticola (Ruiz-Ruano et al. 2017), since satDNAs with either long or short RUs may or may not be visible by FISH. In fact, the existence of satDNAs showing short RUs, fitting the conventional definition of micro- and minisatellites but displaying conspicuous bands after in situ hybridization, had previously been reported in other organisms, such as Musca domestica (Blanchetot 1991), Drosophila melanogaster (Bonaccorsi and Lohe 1991) and grasshoppers (Ruiz-Ruano et al. 2015). Recent findings have also shown that satDNA is not restricted to heterochromatin (Ruiz-Ruano et al. 2016, 2017). Therefore, the most inclusive definition of satDNA would be one comprising only its universal properties, namely its tandem repeat structure and its non-genic character (to differentiate it from tandemly repeated gene families).

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00412-018-0666-9) contains supplementary material, which is available to authorized users.

* María Dolores López-León [email protected]

1 Departamento de Genética. Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain

2 Departamento de Ciencia y Tecnología Agroforestal, E.T.S. de Ingenieros Agrónomos, Universidad de Castilla La Mancha, 02071 Albacete, Spain

3 Present address: Facultad de Medicina, Universidad Católica de Valencia, C/Quevedo 2, 46001 Valencia, Spain

Chromosoma https://doi.org/10.1007/s00412-018-0666-9

Traditionally, satDNA has been identified and isolated through restriction endonuclease digestion of genomic DNA, monomer cloning and Sanger sequencing. However, this approach limits the satDNA families that can be characterized to those most represented in the genome, whereas many low-abundance satDNAs go unnoticed. The arrival and improvement of next generation sequencing (NGS) technologies (van Dijk et al. 2014) has provided an opportunity for cloning-free massive sequencing of genomes, which has yielded huge volumes of genomic data on different species, including the repeat fraction of genomes that previously appeared underrepresented in genome sequencing projects. In fact, satDNA is scarcely mentioned in works describing whole genome sequences. In humans, for instance, the longest assembly gaps correspond to satDNA-rich centromere regions and the short arms of acrocentric chromosomes (for review, see Miga 2015). However, studies on repetitive DNA, and particularly on satDNA, are of high interest for a complete understanding of genome structure and dynamics. Additionally, during the last three decades, evidence pointing to a functional role of these sequences has steadily increased, and satDNA has been found to be involved in the establishment of heterochromatic domains (Henikoff 1998; Hsieh and Fire 2000), centromere function (Plohl et al. 2014), chromosome pairing and segregation (Lica et al. 1986; John 1988), nuclear architecture (Hemleben et al. 2000) and gene expression (Feliciello et al. 2015). As a consequence, NGS and bioinformatic graph-based methods (Novák et al. 2013) are now preferentially used to identify and characterize the repetitive DNA fraction of genomes in a variety of animal and plant species (Macas et al. 2007, 2011; Novák et al. 2014; García et al. 2015).

In insects, satDNA has been identified by conventional methods in a limited number of species (for review, see Palomeque and Lorite 2008). However, the recent high-throughput analysis of satDNA content based on NGS read data in species such as the grasshoppers Schistocerca gregaria (Camacho et al. 2015a), L. migratoria (Ruiz-Ruano et al. 2016) and E. monticola (Ruiz-Ruano et al. 2017) has enabled the detection of satDNAs that had remained elusive to traditional methods. This has facilitated the global analysis of the whole satDNA catalog, i.e., the satellitome, which, in the case of L. migratoria, unveiled 62 satDNA families and provided new insights into the origin and evolution of these intriguing sequences.

SatDNA is located predominantly in centromeric and telomeric regions (Charlesworth et al. 1994), coinciding with heterochromatin, although it can also be found in interstitial euchromatic regions (Ruiz-Ruano et al. 2016). Only a few satDNAs are located on all chromosomes within a genome, whereas others are chromosome-specific (Plohl et al. 1998; Ruiz-Ruano et al. 2016, 2017). In several insect species, satDNAs specific to X or B chromosomes have also been found (for review, see Palomeque and Lorite 2008; López-Flores and Garrido-Ramos 2012; Garrido-Ramos 2017; Ruiz-Ruano et al. 2017).

The grasshopper Pyrgomorpha conica is a non-model species with very limited molecular information on its chromosomes. Classical cytogenetic studies have described its karyotype as composed of nine pairs of autosomes that can be classified into large (L1-L3), medium (M4-M8) and small (S9) sized chromosomes, plus one X chromosome in males or two in females. All chromosomes are apparently telocentric, and the S9 autosome behaves as the megameric bivalent during meiosis, as it is differentially condensed throughout first prophase (Antonio et al. 1993). The scarce knowledge concerning the molecular nature of the chromosomes of this species is restricted to the presence of large heterochromatin blocks on pericentromeric regions of all chromosomes (Santos et al. 1983), the presence of sites sensitive to restriction endonucleases on them (López-Fernández et al. 1988), the physical mapping of ribosomal DNA (rDNA) on the M5 and M6 chromosomes (Suja et al. 1993) and the identification of interstitial telomeric DNA regions (ITRs) on many chromosomes (López-Fernández et al. 2004, 2006).

SatDNA units are sometimes simple, i.e., composed of a single sequence with no internal substructuring. However, there are also complex satDNAs with units showing internal duplication of a subunit, which may be inverted or not, and the occasional insertion of foreign DNA (Charlesworth et al. 1994; Meštrović et al. 2015). SatDNA repeats sometimes combine into higher-order repeats (HORs) composed of different variants of the same satDNA family, with the HOR unit being better homogenized than its constituent subrepeats (Willard and Waye 1987; Warburton and Willard 1990; Plohl et al. 2012). In the beetle Tribolium madens, however, satellite II is actually a complex HOR because it includes an insertion of an unrelated sequence element (Mravinac and Plohl 2007). Here we perform the high-throughput analysis of the satellitome in the grasshopper P. conica, which has revealed the presence of two types of HORs, i.e., the conventional type, where the subrepeats are homologous, and also complex HORs including unrelated DNA sequences. We suggest the terms "homologous" for HORs built by satDNAs showing homology, and "heterologous" for those including a mixture of non-homologous satDNAs.

Materials and methods

Materials, chromosome preparations and genomic DNA isolation

Adult males and females of the grasshopper species Pyrgomorpha conica were collected from the Hoya de Gonzalo population in Albacete province (Spain). Testes were dissected out from anesthetized animals, fixed in freshly prepared 3:1 ethanol-acetic acid and stored at 4 °C for cytogenetic analysis. For genomic DNA (gDNA) isolation, whole bodies of adult males and females were frozen by immersion in liquid nitrogen and stored at − 80 °C until use. Chromosome preparations were made by squashing two testis tubules in a drop of 50% acetic acid, following the methods described in Camacho et al. (2015b). We incubated each spermatocyte preparation in a pepsin solution (50 μg/ml in 0.01 N HCl) for 2 min, in order to facilitate probe accessibility for fluorescent in situ hybridization (FISH). After two washes in distilled water, slides were dehydrated in a series of 70%, 90% and 100% ethanol for 3, 3 and 5 min, respectively, and stored overnight at 60 °C, as described in Camacho et al. (2015b). We extracted gDNA from P. conica male and female specimens for genome sequencing on the Illumina HiSeq 2000 platform and for PCR experiments, using the GenElute Mammalian Genomic DNA Miniprep kit (Sigma) following the manufacturer's recommendations. Quantity and quality of the gDNA obtained were evaluated with a Tecan Infinite 200 NanoQuant and by electrophoresis in a 1% agarose gel.

Genomic sequencing and bioinformatic analysis

We sequenced gDNA from a P. conica female, obtaining about 5.4 Gb of 2 × 101 nt paired-end reads, which were uploaded to the SRA database (accession number SRR3953136). We then applied the satMiner protocol (Ruiz-Ruano et al. 2016) to perform a high-throughput sequence characterization of its satellitome by several rounds of clustering with RepeatExplorer (Novák et al. 2013) and filtering with DeconSeq (Schmieder and Edwards 2011). We searched for homology between consensus sequences using the RepeatMasker software (Smit et al. 2013), and named and classified them into superfamilies, families or variants according to Ruiz-Ruano et al. (2016). We estimated abundance and divergence with RepeatMasker (Smit et al. 2013) on a random selection of 2 × 5 million reads, and named the different satDNA families in order of decreasing abundance. Then we created repeat landscapes with the Perl script calcDivergenceFromAlign.pl from the RepeatMasker suite. We compared monomers of different satDNA superfamilies, or of families with several variants, using the Geneious v4.8 software (Drummond et al. 2009). Dot-plots of the consensus monomers were also manually inspected to search for inner subrepeats. Minimum spanning trees (MST) were built with Arlequin v3.5 (Excoffier and Lischer 2010) and p-distance matrices were calculated with MEGA v5 (Tamura et al. 2011). Assembled sequences were deposited in GenBank with accession numbers KX891234-KX891320.
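The naming rule mentioned above (families numbered in order of decreasing abundance) can be illustrated with a minimal Python sketch. It assumes that a name is built from the species prefix, a two-digit abundance rank and the repeat unit length, as in PcoSat01-176; the function name and the input records are hypothetical and are not part of the satMiner protocol.

```python
def name_families(families, prefix="PcoSat"):
    """Name satDNA families by decreasing abundance (hypothetical helper).

    Each family is a dict with 'rul' (repeat unit length, nt) and
    'abundance' (% of the genome). The name pattern prefix + rank + '-' + RUL
    is an assumption inferred from names such as PcoSat01-176.
    """
    ranked = sorted(families, key=lambda fam: fam["abundance"], reverse=True)
    return [
        f"{prefix}{rank:02d}-{fam['rul']}"
        for rank, fam in enumerate(ranked, start=1)
    ]


# Three families taken from Table 1, given here in arbitrary order.
toy_families = [
    {"rul": 156, "abundance": 1.954},
    {"rul": 199, "abundance": 0.164},
    {"rul": 176, "abundance": 5.669},
]
names = name_families(toy_families)
print(names)
```

The sketch ignores suffixes such as the variant letter or "-tel", which would need extra fields in the input records.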

By definition, a homologous HOR consists of the repetition of two or more subrepeats. In the simplest case, two subrepeats (named A and B) would show, for instance, an A1-B1-A2-B2-A3-B3-... repetition pattern. The reliability of homologous HORs was tested by comparing sequence differences between contiguous (e.g., A1 vs B1, A2 vs B2) and alternate (e.g., A1 vs A2, and B1 vs B2) subrepeats, following Willard and Waye (1987). For this purpose, we performed two different kinds of analyses on 2 × 5 million read-pairs randomly selected with the seqtk software (https://github.com/lh3/seqtk). In both cases, we performed sequence alignment with RepeatMasker (Smit et al. 2013) to search for read-pairs showing similarity with a reference sequence consisting of a dimer of the satDNA family forming the homologous HOR. Then we used fastq-join (https://github.com/ExpressionAnalysis/ea-utils) to assemble the two reads of each pair, which yielded read-pair sequences of up to 190 nt. Given that the homologous HORs included subrepeats of about 46 nt, these 190 nt allowed us to compare sequence differences between contiguous and alternate subrepeats. We then aligned the assembled read-pairs with a reference sequence consisting of three subrepeats, using RepeatMasker. For each satDNA family constituting a homologous HOR, we used two different references, for the A1-B1-A2 and B1-A2-B2 combinations of three subrepeats. From the output of this software, and using a custom Python script (https://github.com/fjruizruano/ngs-protocols/blob/master/rm_getseq.py), we extracted the sequence of all full-length aligned read-pairs. Then we aligned the resulting sequences with Geneious and, in order to avoid confusion between the two homologous HORs found (PcoSat30-92 and PcoSat62-80), which show similarity to each other (i.e., both belong to SF11), we selected the read-pairs showing the highest similarity with each reference. We then trimmed the alignment obtained for each reference into three separate alignments, one per subrepeat, and used them to calculate mean p-distances between subrepeats with the MEGA v5 software (Tamura et al. 2011), estimating the standard error by means of 1000 bootstrap replicates (Fig. S1a).
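The contiguous versus alternate comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subrepeats are already aligned to equal length, gaps are handled by pairwise deletion, and the toy sequences are invented. The analysis in the paper was done with MEGA v5, not with this code. In a genuine HOR, alternate subrepeats (A1 vs A2) are expected to differ less than contiguous ones (A1 vs B1).

```python
from statistics import mean


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(pairs):
    """Mean p-distance over an iterable of (sequence, sequence) pairs."""
    return mean(p_distance(a, b) for a, b in pairs)


# Invented toy reads, each already split into the subrepeats A1, B1 and A2.
toy_reads = [
    {"A1": "ACGTACGTAC", "B1": "ACGTTCGAAC", "A2": "ACGTACGTAC"},
    {"A1": "ACGTACGTAC", "B1": "ACCTTCGAAC", "A2": "ACGTACGTAT"},
]

contiguous = mean_p_distance(
    [(r["A1"], r["B1"]) for r in toy_reads]
    + [(r["B1"], r["A2"]) for r in toy_reads]
)
alternate = mean_p_distance([(r["A1"], r["A2"]) for r in toy_reads])

print(f"contiguous: {contiguous:.3f}  alternate: {alternate:.3f}")
```

The sketch does not estimate standard errors; a bootstrap over alignment columns would be needed to mirror the 1000-replicate procedure described above.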


To investigate whether the homologous HORs had a structural origin based on tandem repeats shorter than a subrepeat, we used the same read-pairs previously assembled (see above) and the Tandem Repeats Finder software (Benson 1999), with the options "2 7 7 80 10 200 200". The first three parameters are used to calculate the score, i.e., the weights for matches, mismatches and indels. The fourth and fifth parameters are the match and indel probabilities, respectively, which are usually fixed at 80 and 10. The minimum alignment score was 200 and the maximum period size was 200 nt. We then performed an additional filtering to search for those read-pairs showing adjacent subrepeats with identity higher than 95%. This step was important in order to work with less degenerate versions of the HOR, in the expectation that this would facilitate finding the basic subrepeat units. Additionally, we relaxed the former parameters to search for tandem repeats with even shorter monomers ("2 7 7 80 10 50 20"). In this case, we also reduced the minimum identity between adjacent repeats to 80%. The resulting consensus monomers were grouped by size range (about 10, 20, 30, etc.) and separately aligned with Geneious to get a consensus sequence per range (Fig. S1a).
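To make the two option strings easier to compare, the following Python sketch maps each positional value to the parameter names used in the Tandem Repeats Finder documentation (Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod). The helper name is hypothetical, and the code only parses the strings; it does not run the program.

```python
# Positional parameters of Tandem Repeats Finder, in command-line order.
TRF_FIELDS = ("match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod")


def parse_trf_options(option_string):
    """Turn a string such as '2 7 7 80 10 200 200' into a named dict."""
    values = [int(value) for value in option_string.split()]
    if len(values) != len(TRF_FIELDS):
        raise ValueError(f"expected {len(TRF_FIELDS)} values, got {len(values)}")
    return dict(zip(TRF_FIELDS, values))


strict = parse_trf_options("2 7 7 80 10 200 200")
relaxed = parse_trf_options("2 7 7 80 10 50 20")

# Report only the parameters that differ between the two searches.
changed = {
    name: (strict[name], relaxed[name])
    for name in TRF_FIELDS
    if strict[name] != relaxed[name]
}
print(changed)
```

Only the minimum score and the maximum period size differ between the two runs, which is what restricts the relaxed search to short monomers.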

As a preliminary step in the analysis of heterologous HORs, we first estimated insert size in our Illumina library by assembling the P. conica mitogenome with MITObim (Hahn et al. 2013), using the Mekongiella kingdoni mitogenome as reference (accession number HQ833842.1). We then mapped reads against the P. conica mitogenome using Bowtie2, considering only read-pairs where both members were mapped (--no-mixed option). We then scored the distance between both members of each read-pair, plotted the values and calculated insert size, which was 265 bp on average (SD = 87) for a total of 29,966 mapped read-pairs.
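One way to obtain such an insert-size estimate from a paired-end mapping is to read the template length (TLEN, ninth column) of each SAM record. The Python sketch below assumes this is an acceptable proxy for "the distance between both members of each read-pair"; the paper does not state how the distance was scored, and the SAM lines are invented toy data.

```python
from statistics import mean, stdev


def template_lengths(sam_lines):
    """Collect one template length per read-pair with both mates mapped.

    Only records with a positive TLEN are kept, so each pair is counted once
    (its leftmost mate). Header lines and pairs with an unmapped mate are
    skipped.
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x4 or flag & 0x8:
            continue  # this read or its mate is unmapped
        if tlen > 0:
            lengths.append(tlen)
    return lengths


# Invented records: two properly mapped pairs and one read with unmapped mate.
toy_sam = [
    "@SQ\tSN:mito\tLN:15000",
    "r1\t99\tmito\t100\t42\t101M\t=\t250\t251\t*\t*",
    "r1\t147\tmito\t250\t42\t101M\t=\t100\t-251\t*\t*",
    "r2\t99\tmito\t500\t42\t101M\t=\t680\t281\t*\t*",
    "r2\t147\tmito\t680\t42\t101M\t=\t500\t-281\t*\t*",
    "r3\t73\tmito\t900\t42\t101M\t=\t900\t0\t*\t*",
]

lengths = template_lengths(toy_sam)
print(f"pairs: {len(lengths)}  mean: {mean(lengths):.1f}  SD: {stdev(lengths):.1f}")
```

A mitogenome is a convenient reference for this purpose because it can be assembled completely and is largely free of the repeats that make read-pair distances ambiguous.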

To analyze the structure of heterologous HORs (i.e., those including non-homologous subrepeats), we used RepeatMasker to select read-pairs where both members showed homology with a 50 nt target including 25 nt of each heterologous sequence within the HOR. This strategy allowed us to analyze the junctions between the two subrepeats included in these heterologous HORs. We performed this analysis for the junctions at both the 5′ and 3′ ends of each HOR, and mapped the selected read-pairs against a reference sequence consisting of at least two HOR units. We carried out these mappings with Bowtie2 (Langmead and Salzberg 2012), applying the "--no-mixed" option. Finally, we visualized the resulting mappings with IGV (Thorvaldsdóttir et al. 2013) (Fig. S1b).
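The construction of a 50 nt junction target can be sketched as follows. The function name and the two toy sequences are hypothetical; the sketch only shows how 25 nt from each side of a junction are joined, for both junction types found in an array of alternating subrepeats.

```python
def junction_target(upstream, downstream, flank=25):
    """Join the last `flank` nt of one subrepeat to the first `flank` nt of the next."""
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("both subrepeats must be at least `flank` nt long")
    return upstream[-flank:] + downstream[:flank]


# Invented stand-ins for the two heterologous subrepeats X and Y of a HOR.
subrepeat_x = "ACGT" * 10   # hypothetical satDNA-derived subrepeat (40 nt)
subrepeat_y = "TTAGG" * 8   # hypothetical telomeric-like subrepeat (40 nt)

# In an array ...X-Y-X-Y... there are two junction types: X->Y and Y->X.
target_xy = junction_target(subrepeat_x, subrepeat_y)
target_yx = junction_target(subrepeat_y, subrepeat_x)
print(target_xy)
print(target_yx)
```

Requiring homology to a target that spans the junction selects reads that actually cross from one subrepeat into the other, rather than reads lying entirely within either of them.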

To analyze possible associations of telomeric DNA with other satDNA families, we first selected read-pairs showing homology with the telomeric sequence, using BLAT (Kent 2002), and then performed a RepeatExplorer clustering using RepBase and the P. conica satDNA collection as custom databases to annotate the resulting clusters. In addition, we classified the resulting read-pairs into pure (i.e., containing only telomeric repeats) and mixed (i.e., containing telomeric and other satDNAs) by aligning them with RepeatMasker, and then estimated sequence abundance and divergence in each group separately.

SatDNA amplification by PCR

Monomer sequences for each of the 38 most abundant satDNAs were independently aligned to obtain the consensus sequence and identify the most conserved regions, in order to design PCR primers for each satDNA family using the Primer3 software (Untergasser et al. 2012) (Table S1). We performed PCR amplification of these satDNAs to assess their reliability, and generated specific probes for their chromosome mapping by fluorescent in situ hybridization (FISH). The PCR reaction mixture contained 0.2 mM dNTPs, 0.4 μM of each primer (0.8 μM for sat 67), 1X Taq polymerase buffer, 2.5 mM MgCl2, 10 ng of DNA sample and one unit of Horse-Power Taq polymerase (Canvax) in a 25 μl total volume. The PCR program consisted of an initial denaturation step at 95 °C for 5 min and 30 cycles of 94 °C for 20 s, 60–64.4 °C annealing temperature for 30 s, and 72 °C for 15 s, followed by a final extension step at 72 °C for 7 min. For satDNAs showing unit length shorter than 50 nt, we reduced annealing time to 10 s. The program was run in an Eppendorf Mastercycler ep Gradient. All PCR experiments included a control reaction (without DNA) to exclude the possibility of DNA contamination. PCR products were visualized by electrophoresis in a 2% agarose gel to check for the ladder pattern expected for tandem repeats. The monomeric band was excised from the agarose gel with a razor blade and the DNA was extracted by squeezing the gel slice in a parafilm square; the recovered DNA was reamplified using 0.5 μl of this DNA solution under the same PCR conditions. We cleaned all PCR products with the GenElute PCR Clean-up kit (Sigma), and Sanger sequencing was performed by Macrogen Inc. to confirm the specificity of the amplified satDNA sequences.

Fluorescent in situ hybridization

Physical mapping of satDNA was performed by fluorescent in situ hybridization (FISH) following the protocol described in Camacho et al. (2015b). Three types of DNA probe were used: (a) monomers of the 38 satDNAs identified in this work; (b) a 1113 bp fragment of 18S ribosomal DNA (rDNA) obtained by PCR amplification using the 18SE and 1100R primers (Littlewood and Olson 2001) and the PCR conditions described in Ruiz-Estévez et al. (2014); and (c) the fluorescein-labeled synthetic oligonucleotides (TAACC)×7 and (GGTTA)×7 as telomeric DNA probe (Meyne et al. 1995).

Probes were labeled using 2.5 units of DNA polymerase I/DNase I (Invitrogen) in a 50 μl nick translation reaction with tetramethylrhodamine-5-dUTP (satDNAs) or fluorescein-12-dUTP (rDNA) from Roche, using standard protocols. Double FISH was carried out by combining two probes labeled with different fluorochromes. Hence, we used the rDNA probe in double FISH with those satDNAs located on medium-sized chromosomes, as a marker to facilitate identification within the group of five medium-sized chromosomes (M4-M8) of P. conica, since only M6 and M8 carried rDNA, proximally located (close to the centromere region), in the population studied here. The chromosomes carrying rDNA were identified, on the basis of size, by measuring chromosome area with the pyFIA software (Ruiz-Ruano et al. 2011) in 10 DAPI-stained metaphase I cells which had also been subjected to FISH with the rDNA probe.

Statistical analyses

Statistical analyses were performed by the non-parametricSpearman rank correlation, Mann-Whitney and Kruskal-Wallis tests, using Statistica software (StatSoft, Inc. 2005).Contingency chi-square tests were performed with the RXCsoftware (George Carmody, University of Ottawa, Canada) bya Monte Carlo approach to calculate statistical significance,with 20 batches of 2500 replicates. p-distances were comparedby Student t tests.
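For readers without access to Statistica, the Spearman rank correlation and its associated t statistic can be reproduced with the standard library alone. The sketch below is an independent illustration, not the authors' procedure: it ranks the data (averaging ties), computes the Pearson correlation of the ranks, and uses the usual approximation t = rS · sqrt((n − 2) / (1 − rS²)). The example values are the first eight rows of Table 1 (repeat unit length and divergence).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average rank of the tied block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation coefficient (rS)."""
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(r_s, n):
    """t approximation for rS with n - 2 degrees of freedom (requires |rS| < 1)."""
    return r_s * sqrt((n - 2) / (1 - r_s * r_s))


# First eight rows of Table 1: repeat unit length (nt) and divergence (%).
rul = [176, 156, 199, 209, 77, 168, 194, 5]
divergence = [3.45, 2.51, 6.47, 7.98, 12.85, 6.99, 20.79, 2.7]

r_s = spearman(rul, divergence)
print(f"rS = {r_s:.3f}, t = {t_statistic(r_s, len(rul)):.3f}")
```

With n = 76 families, this formula gives t ≈ 0.0034 for rS = 0.0004, which is consistent with the values reported for the correlation between A + T content and monomer length.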

Results

Next generation sequencing and satDNA identification

By applying the satMiner protocol (Ruiz-Ruano et al. 2016) to the P. conica Illumina sequences, and after 8 rounds of RepeatExplorer clustering and DeconSeq filtering, we found 87 satDNA variants grouped into 76 families. They were named in order of decreasing abundance, following Ruiz-Ruano et al. (2016), and the main features of the repeat unit, i.e., repeat unit length (RUL), A + T content, abundance and divergence, are summarized in Table 1 and Fig. 1. As a whole, the 76 satDNA families represent 9.4% of the P. conica genome.

RUL ranged between 5 and 320 nt (median = 149 nt), with four main gaps defining five groups of satDNAs: (1) PcoSat08-5-tel with only 5 nt, (2) four satDNAs with 35–38 nt RUL, (3) 15 satDNAs with 62–93 nt RUL, (4) 66 satDNAs with 105–286 nt RUL, and (5) PcoSat25A-320 with 320 nt RUL. The third gap resembled the main gap found in L. migratoria by Ruiz-Ruano et al. (2016), although it is much shorter than the 37 nt gap found in the latter species (between 90 and 127 nt). To explore satellitome characteristics, and for comparison with L. migratoria, we divided the satDNA families into two groups based on monomer length: 19 short (< 100 nt) and 57 long (> 99 nt).

A + T content of the 76 satDNA consensus monomers ranged between 40 and 77.6%, with a median value of 60.1%. We did not find a significant correlation between monomer A + T content and length (Spearman rank correlation analysis: rS = 0.0004, t = 0.0035, P = 0.997).

SatDNA abundance ranged from 0.001 to 5.67% of the genome, with more than three orders of magnitude between the most and the least abundant satDNAs (median value = 0.01%). Remarkably, the two most abundant satDNAs (PcoSat01-176 and PcoSat02-156) represent more than 80% of all satDNA found in this species, their abundances being 5.67 and 2% of the genome, respectively. The next satDNA in abundance (PcoSat03-199) showed an abundance (0.164%) more than one order of magnitude lower than that of PcoSat02-156 (Table 1). In fact, only seven satDNAs showed abundance higher than 0.1%, the figure for the telomeric repeat (PcoSat08-5-tel), and only 11 satDNAs surpassed the 0.05% level. Therefore, the satellitome in P. conica consists of two extremely abundant satDNAs, nine moderately abundant ones, and 65 that are rare or very rare. Abundance showed no significant correlation with monomer length (rS = 0.16, t = 1.37, P = 0.17) or A + T content (rS = 0.15, t = 1.33, P = 0.19).
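The two abundance claims in this paragraph can be checked with simple arithmetic on the Table 1 values. The short Python sketch below uses the tabulated abundances (5.669%, 1.954% and 0.164% of the genome) and the 9.4% total given in the text; the variable names are illustrative only.

```python
# Abundance (% of the genome) of the three most abundant families, from Table 1.
abundance = {"PcoSat01": 5.669, "PcoSat02": 1.954, "PcoSat03": 0.164}
total_satdna = 9.4  # % of the genome occupied by all 76 families (from the text)

# Share of the whole satellitome contributed by the two most abundant families.
top_two_share = (abundance["PcoSat01"] + abundance["PcoSat02"]) / total_satdna

# Fold difference between the second and the third most abundant families.
fold_drop = abundance["PcoSat02"] / abundance["PcoSat03"]

print(f"top two families: {top_two_share:.1%} of all satDNA")
print(f"PcoSat02 / PcoSat03: {fold_drop:.1f}-fold")
```

Both results agree with the text: the two families account for slightly more than 80% of all satDNA, and the drop to the third family exceeds one order of magnitude.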

Divergence of the 76 satDNA families ranged between 2.25 and 21.34%, with a median value of 7.7%. Remarkably, three satDNAs (PcoSat50-73, PcoSat02-156 and PcoSat39-165, with 2.25%, 2.51% and 2.61% divergence, respectively) showed lower divergence than the telomeric repeat (2.7%). Divergence showed no correlation with monomer length (rS = 0.15, t = 1.34, P = 0.18), A + T content (rS = 0.16, t = 1.41, P = 0.16) or abundance (rS = 0.15, t = 1.30, P = 0.20). However, a comparison between the 19 short and 57 long satDNAs showed that, whereas they did not differ in A + T content (Mann-Whitney test: U = 537, P = 0.96) or abundance (U = 478, P = 0.45), short satDNAs (< 100 bp) showed significantly higher divergence (median = 11.03) than long ones (median = 6.89) (U = 320, P = 0.008, Bonferroni corrected Pb = 0.024), in agreement with previous findings in L. migratoria (Ruiz-Ruano et al. 2016).
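The corrected value can be reproduced with the standard Bonferroni rule, in which each P value is multiplied by the number of tests and capped at 1. The sketch below assumes that the correction was applied over the three Mann-Whitney comparisons reported in this paragraph (A + T content, abundance and divergence); the number of tests is inferred from the reported values rather than stated in the text.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P value: p * number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# P values of the three short vs long comparisons reported in the text.
p_values = {"A + T content": 0.96, "abundance": 0.45, "divergence": 0.008}

corrected = {
    trait: bonferroni(p, len(p_values)) for trait, p in p_values.items()
}
for trait, p_corrected in corrected.items():
    print(f"{trait}: Pb = {p_corrected:.3f}")
```

Under this assumption only the divergence comparison remains below 0.05 after correction, with Pb = 0.024 as reported.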

Physical mapping of the 38 most abundant satDNA families

PCR amplification, probe generation and FISH experimentswere performed for the 38 most abundant satDNA families,which cover half of the full P. conica satellitome catalog,representing 9.2% of the genome and 97.6% of all satDNAcontent. We found the three chromosome distribution patternsdescribed by Ruiz-Ruano et al. (2016) in L. migratoria, and anadditional pattern characterized by the presence of small dotsspread across many chromosome regions, which will benamed dotted (D) pattern (see below). For a better distinctionof the four patterns shown here, we suggest new names for

Chromosoma

Page 6: High-throughput analysis of satellite DNA in the grasshopper … · 2018. 3. 19. · ORIGINAL ARTICLE High-throughput analysis of satellite DNA in the grasshopper Pyrgomorpha conica

Table 1 Main features of the 76 satDNA families found in the grasshopper Pyrgomorpha conica

SF  SatDNA family    RUL  A + T (%)  Abun (%)  Div (%)  Var
1   PcoSat01A-176    176  63.07      5.669     3.45     2
1   PcoSat02A-156    156  60.26      1.954     2.51     1
3   PcoSat03A-199    199  52.76      0.164     6.47     1
-   PcoSat04A-209    209  56.94      0.132     7.98     1
4   PcoSat05A-77     77   63.64      0.131     12.85    1
-   PcoSat06A-168    168  59.52      0.114     6.99     1
2   PcoSat07A-194    194  62.89      0.111     20.79    1
-   PcoSat08A-5-tel  5    60.00      0.099     2.7      1
2   PcoSat09A-138    138  58.70      0.089     16.06    1
-   PcoSat10A-109    109  63.30      0.078     19.27    1
2   PcoSat11A-196    196  61.73      0.058     13.68    1
-   PcoSat12A-106    106  60.38      0.038     7.13     1
-   PcoSat13A-62     62   54.84      0.034     9.19     1
4   PcoSat14A-75     75   65.33      0.033     7.64     1
-   PcoSat15A-251    251  57.77      0.033     6.59     1
6   PcoSat16A-149    149  59.73      0.033     21.34    1
7   PcoSat17A-146    146  62.33      0.031     4.68     1
8   PcoSat18A-139    139  45.32      0.028     12.9     2
7   PcoSat19A-151    151  62.91      0.027     5.49     1
6   PcoSat20A-150    150  58.67      0.026     17.01    1
9   PcoSat21A-113    113  63.72      0.025     3.67     1
-   PcoSat22A-267    267  67.04      0.023     7.21     1
-   PcoSat23A-286    286  61.89      0.023     8.83     1
-   PcoSat24A-161    161  64.60      0.021     5.34     1
-   PcoSat25A-320    320  62.19      0.020     20.47    1
5   PcoSat26A-236    236  61.44      0.020     6.89     1
8   PcoSat27A-143    143  50.35      0.019     17.96    1
3   PcoSat28A-197    197  49.24      0.017     7.25     1
-   PcoSat29A-154    154  68.83      0.016     8.03     1
11  PcoSat30A-92     92   51.09      0.015     18       1
5   PcoSat31A-198    198  64.14      0.015     7.26     1
-   PcoSat32A-84     84   60.71      0.014     15.73    1
-   PcoSat33A-93     93   60.22      0.013     12.04    1
-   PcoSat34A-92     92   55.43      0.013     10.71    1
10  PcoSat35A-270    270  62.22      0.012     6.67     3
-   PcoSat36A-159    159  62.89      0.011     9.32     1
12  PcoSat37A-218    218  48.17      0.010     11.78    2
14  PcoSat38A-35     35   40.00      0.010     17.6     2
5   PcoSat39A-165    165  57.58      0.010     2.61     1
9   PcoSat40A-152    152  59.87      0.010     6.46     1
-   PcoSat41A-145    145  71.72      0.009     8.73     1
-   PcoSat42A-115    115  72.17      0.009     4.9      1
10  PcoSat43A-275    275  61.09      0.009     5.07     3
5   PcoSat44A-73     73   67.12      0.009     7.95     1
-   PcoSat45A-188    188  61.70      0.009     17.36    2
-   PcoSat46A-92     92   40.22      0.008     17.88    1
-   PcoSat47A-149    149  61.74      0.008     11.06    1
-   PcoSat48A-70     70   72.86      0.008     7.07     1
-   PcoSat49A-169    169  45.56      0.008     5.04     1


them based on the FISH pattern: the NS (no signal) pattern, with absence of FISH signal (Fig. 2a–c), equivalent to the previous non-clustered pattern; the B (banded) pattern, with satDNAs forming large blocks on one or more chromosome regions (Fig. 2d–f), equivalent to the previous clustered pattern; the DB (dotted-banded) pattern, with many small loci on some chromosomes (with location difficult to pinpoint) and a few large blocks restricted to one or a few chromosomes (Fig. 2g–i), equivalent to the previous mixed pattern; and the D (dotted) pattern, consisting of many small loci visibly scattered across most of the chromosome length except the pericentromeric heterochromatic blocks (Fig. 2j–l).

Out of the 38 satDNAs analyzed by FISH, 6, 24, 4 and 4 showed the NS, B, DB and D patterns, respectively. Three types of chromosome location were differentiated for B and DB satDNAs, namely pericentromeric (p), interstitial (i) and distal (d) (see some examples in Figs. 2 and S2). More than half of the 106 bands observed for the 37 satDNAs analyzed by FISH (excluding the telomeric repeat) on the haploid set of chromosomes in P. conica were pericentromeric (56), whereas

only 19 and 31 were interstitial or distal, respectively (Table 2). Eight satDNA families (PcoSat01-176, PcoSat02-156, PcoSat13-62, PcoSat27-143, PcoSat28-197, PcoSat31-198, PcoSat32-84 and PcoSat33-93) showed bands only on pericentromeric regions (e.g., Fig. 2d–f), but only PcoSat01-176 and PcoSat02-156 were located on all chromosomes (Fig. 2d–f), and they were, by a wide margin, the two most abundant satDNAs in P. conica. In contrast, three satDNAs (PcoSat20-150, PcoSat21-113 and PcoSat25-320) showed FISH bands restricted to interstitial locations, whereas five showed exclusively distal bands (PcoSat22-267, PcoSat24-161, PcoSat34-92, PcoSat35-270 and PcoSat37-218) (Table 2, Figs. 3a and S2a,b). However, 8 satDNAs showed bands on two different chromosome regions and 3 others were found on all three locations.

In addition to the telomeric repeat (PcoSat08-5-tel), each chromosome of the haploid set carried, on average, 10.6 bands of 10 different satDNAs. The L2, L3 and M4 chromosomes carried the highest number of different satDNAs (13 each), M5, S9 and X carried 10, and the four remaining chromosomes

Table 1 (continued)

SF  SatDNA family    RUL  A + T (%)  Abun (%)  Div (%)  Var
5   PcoSat50A-73     73   65.75      0.007     2.25     1
-   PcoSat51A-186    186  52.15      0.007     3.92     1
-   PcoSat52A-158    158  49.37      0.007     6.1      1
-   PcoSat53A-163    163  64.42      0.007     11.4     1
13  PcoSat54A-215    215  57.21      0.007     6.88     1
-   PcoSat55A-127    127  59.06      0.006     5.24     1
13  PcoSat56A-215    215  60.00      0.006     7.76     1
-   PcoSat57A-243    243  43.62      0.006     11.32    1
-   PcoSat58A-76     76   77.63      0.006     10.04    1
9   PcoSat59A-122    122  63.93      0.006     6.73     1
-   PcoSat60A-122    122  49.18      0.005     5.02     2
-   PcoSat61A-89     89   50.56      0.005     16.89    1
11  PcoSat62A-80     80   43.75      0.004     16.7     1
12  PcoSat63A-226    226  52.21      0.004     9.56     1
-   PcoSat64A-181    181  65.19      0.004     3.78     1
-   PcoSat65A-150    150  52.67      0.004     4.25     1
-   PcoSat66A-109    109  59.63      0.004     9.15     1
15  PcoSat67A-134    134  55.22      0.004     3.94     2
-   PcoSat68A-134    134  51.49      0.004     5.3      1
-   PcoSat69A-105    105  50.48      0.003     4.51     1
-   PcoSat70A-149    149  60.40      0.003     3.18     1
5   PcoSat71A-150    150  60.00      0.003     3.57     1
-   PcoSat72A-38     38   60.53      0.003     11.03    1
-   PcoSat73A-80     80   65.00      0.003     10.63    1
14  PcoSat74A-35     35   48.57      0.003     16.04    1
15  PcoSat75A-126    126  58.73      0.002     8.94     1
10  PcoSat76A-253    253  63.24      0.001     4.3      1

SF superfamily (- = no superfamily listed), RUL repeat unit length, Abun genomic abundance, Div divergence, Var number of variants


carried 6–9 satDNAs each (Fig. 3a). In contrast to L. migratoria (Ruiz-Ruano et al. 2016) and Eumigus monticola (Ruiz-Ruano et al. 2017), the megameric chromosome (S9) in P. conica does not carry the highest number of different satDNAs, but it is still the one carrying the highest number of interstitial satDNAs (6), consistent with the two other species, the remaining chromosomes carrying only three or fewer. This highlights the possible involvement of satDNAs located across the whole chromosome length in the meiotic heterochromatinization of the megameric bivalent (see Ruiz-Ruano et al. 2016, 2017).
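The per-chromosome figures quoted in this paragraph can be recomputed from the "Total" rows of Table 2. The sketch below simply transcribes those rows (columns L1 to X, in the order given in the table header) and derives the averages; the variable names are illustrative only.

```python
# Values transcribed from the "Total" rows of Table 2 (columns L1 ... X).
chromosomes = ["L1", "L2", "L3", "M4", "M5", "M6", "M7", "M8", "S9", "X"]
pericentromeric = [4, 8, 10, 9, 6, 3, 7, 5, 2, 2]
interstitial = [2, 3, 0, 2, 2, 1, 0, 0, 6, 3]
distal = [3, 5, 4, 3, 2, 2, 2, 2, 3, 5]
different_satdnas = [9, 13, 13, 13, 10, 6, 9, 7, 10, 10]

# Bands per chromosome are the sum of the three location classes.
bands = [p + i + d for p, i, d in zip(pericentromeric, interstitial, distal)]
total_bands = sum(bands)

mean_bands = total_bands / len(chromosomes)
mean_satdnas = sum(different_satdnas) / len(chromosomes)
pericentromeric_share = sum(pericentromeric) / total_bands

# Chromosome carrying the largest number of interstitial bands.
most_interstitial = chromosomes[interstitial.index(max(interstitial))]

print(total_bands, mean_bands, mean_satdnas)
print(f"pericentromeric share: {pericentromeric_share:.1%}")
print("most interstitial bands:", most_interstitial)
```

With these values, the per-location totals add up to the 106 bands cited in the text, and S9 is the only chromosome with more than three interstitial bands.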

P. conica showed only seven chromosome-specific satDNAs among the 37 physically mapped ones (Table 2 and Fig. 3a; see some examples in Fig. S2b,c). These satDNAs allow the identification of five chromosome pairs (L3, M4, M5, M6 and X) in P. conica, as the M6 chromosome carries three specific satDNAs at interstitial (PcoSat21-113), pericentromeric (PcoSat28-197) and distal (PcoSat37-218) locations. In addition, with the aid of other satDNAs showing locations restricted to 2 or 3 chromosome pairs, the five remaining chromosome pairs can be identified. Two of them (L2 and M8) can be identified by the presence of PcoSat33-93, as these chromosomes are the only two pairs carrying this satDNA. Likewise, L1 is the only L chromosome carrying PcoSat20-150, M7 is the only M chromosome carrying a pericentromeric band for PcoSat26-236, and S9 is the only chromosome carrying interstitial bands for PcoSat12-106 and PcoSat23-286.

The number of chromosomes showing FISH bands for the 27 satDNAs showing the B and DB patterns (excluding the

telomeric repeat) was significantly negatively correlated with satDNA divergence (rS = −0.45, t = 2.52, P = 0.018). Likewise, a comparison of satDNA divergence between the four FISH pattern groups indicated significant differences (Kruskal-Wallis test: H = 9.23, df = 3, P = 0.026), with B-pattern satDNAs showing lower divergence values and D ones being the most divergent. These results are consistent with previous findings in Drosophila, where Kuhn et al. (2012) showed that satDNA is distributed into large arrays in the heterochromatin of chromosomes 2, 3 and X, and short arrays in the euchromatin of the same chromosomes. These authors also concluded that homogenization reaches a higher degree in long arrays placed in heterochromatin than in the short ones interspersed in euchromatin. It is clear that an association exists between satDNA amplification and homogenization.
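The t value accompanying the Spearman coefficient above is consistent with the usual large-sample approximation t = rS·sqrt((n − 2)/(1 − rS²)). The sketch below assumes that this is the approximation used (the text does not say so) and takes n = 27 from the number of B and DB satDNAs mentioned in the same sentence.

```python
import math


def spearman_t(r_s: float, n: int) -> float:
    """t statistic for a Spearman coefficient r_s computed on n observations.

    Uses t = r_s * sqrt((n - 2) / (1 - r_s**2)), with n - 2 degrees of freedom.
    """
    return r_s * math.sqrt((n - 2) / (1.0 - r_s ** 2))


# rS and n as reported for chromosome number versus divergence.
t_value = spearman_t(-0.45, 27)
print(f"|t| = {abs(t_value):.2f} with {27 - 2} degrees of freedom")
```

The sign of t follows the sign of rS; the text reports its absolute value.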

In satDNAs showing bands on specific locations of two or more chromosome pairs (B and DB patterns), we calculated the equilocality index by pairwise comparison between all bands in the genome. Considering three chromosome locations, i.e., pericentromeric (p), interstitial (i) and distal (d), we scored the equilocal (E) and non-equilocal (NE) pairwise comparisons, and calculated the equilocality index (EI) as EI = E/(E + NE). On average, EI was 0.64 for the 38 satDNAs physically mapped in P. conica (Table 2).
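The scoring of E and NE can be sketched as follows. One assumption is made that the text does not state explicitly: pairs of bands lying on the same chromosome are not compared. This is inferred from Table 2 (for instance, PcoSat03-199, with five pericentromeric bands and one distal band, is scored E = 10 and NE = 4). The chromosome labels used below are placeholders, since only the location pattern matters for the index.

```python
from itertools import combinations


def equilocality(bands_by_chromosome):
    """Return (E, NE, EI) for one satDNA.

    bands_by_chromosome maps a chromosome label to a string of band
    locations, e.g. {"chrA": "pd", "chrB": "p"}.
    Assumption: bands on the same chromosome are not compared.
    """
    bands = [(chrom, loc)
             for chrom, locs in bands_by_chromosome.items()
             for loc in locs]
    equilocal = non_equilocal = 0
    for (chrom1, loc1), (chrom2, loc2) in combinations(bands, 2):
        if chrom1 == chrom2:
            continue  # skip within-chromosome pairs (assumed, see above)
        if loc1 == loc2:
            equilocal += 1
        else:
            non_equilocal += 1
    total = equilocal + non_equilocal
    index = equilocal / total if total else None
    return equilocal, non_equilocal, index


# Location patterns taken from Table 2; chromosome labels are placeholders.
sat01 = {f"chr{k}": "p" for k in range(10)}                   # PcoSat01-176
sat03 = {"chrA": "pd", "chrB": "p", "chrC": "p",
         "chrD": "p", "chrE": "p"}                            # PcoSat03-199
sat18 = {"chrA": "p", "chrB": "p", "chrC": "id"}              # PcoSat18-139

for name, pattern in [("Sat01", sat01), ("Sat03", sat03), ("Sat18", sat18)]:
    print(name, equilocality(pattern))
```

A satDNA with a single band yields no pairwise comparison, which is why the index is left undefined (None) in that case.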

We found variation in band size for some satDNA families. It could be manifested between individuals (e.g., PcoSat04-209, Fig. S2d,e), by presence/absence differences between the two members of the same chromosome pair (e.g., PcoSat04-209, Fig. S2e), or between non-homologous

Fig. 1 Repeat landscapes showing the abundance and divergence profiles for all satDNAs identified in P. conica (left), and after excluding the two most abundant satDNAs, belonging to PcoSF1 (right). Note that PcoSF1 represents 81% of all satDNA content in this species


chromosomes within the same genome, with large or small loci on different chromosome pairs (e.g., PcoSat03-199, Fig. S2f).

Homology search reveals 15 SatDNA superfamilies

Homology search between the 76 satDNA families indicated the existence of 15 superfamilies (SFs), i.e., sets of families showing homology, 11 of them including only 2 families, 3 including 3 families and one (SF05) comprising 6 families (Table 1). Among the two-family SFs, we found two where the two families showed the same length, so that they differed only by nucleotide substitutions. Seven other SFs included two families differing by less than 10 bp, so that they included substitutions and short indels, and two SFs included two families differing by more than 10 bp, including longer indels in

addition to nucleotide substitutions (see alignments in Figs. S3, S4a and S5a). Out of the three SFs composed of three satDNA families, SF10 differed by only 22 bp between the two most different families, with short indels between them (Fig. S3i). Likewise, SF09 members differed by up to 39 bp, in this case due to differences in the number of repeats for TG and GGGGA microsatellites (Fig. S3h). In addition, SF02 includes three satDNA families differing by up to 58 bp due to a long indel (Fig. S3b). Finally, SF05 included six different families ranging from 73 to 236 bp in length (Fig. S4a). The maximum divergence found between two satDNA families belonging to the same superfamily was 39% (between PcoSat07-194 and PcoSat09-138).
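The length differences quoted for the multi-family superfamilies follow directly from the repeat unit lengths (RUL) listed in Table 1. A minimal sketch, with the superfamily membership transcribed from the SF column of that table:

```python
# Repeat unit lengths (bp) of the families in each superfamily, from Table 1.
repeat_unit_lengths = {
    "SF02": [194, 138, 196],               # PcoSat07, PcoSat09, PcoSat11
    "SF09": [113, 152, 122],               # PcoSat21, PcoSat40, PcoSat59
    "SF10": [270, 275, 253],               # PcoSat35, PcoSat43, PcoSat76
    "SF05": [236, 198, 165, 73, 73, 150],  # PcoSat26, 31, 39, 44, 50, 71
}

# Difference between the longest and the shortest family in each superfamily.
length_span = {sf: max(lengths) - min(lengths)
               for sf, lengths in repeat_unit_lengths.items()}

for sf, span in length_span.items():
    print(f"{sf}: shortest {min(repeat_unit_lengths[sf])} bp, "
          f"longest {max(repeat_unit_lengths[sf])} bp, span {span} bp")
```

The much larger span in SF05 reflects the fact that its longer members carry an additional region absent from the 73 bp families, as described in the SF05 section.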

One-way ANOVA with SF as grouping factor, and monomer length, A + T content, abundance and divergence as

Fig. 2 Patterns of chromosome distribution found in the 38 most abundant satDNA families in the grasshopper Pyrgomorpha conica: no signal (NS pattern) (a–c), banded (B pattern) (d–f), dotted-banded (DB pattern) (g–i) and dotted (D pattern) (j–l). Bar = 5 μm


Table 2 Chromosome location of the 38 most abundant satDNA families in Pyrgomorpha conica

FISH Chromosome pair and location

SF Sat DNA family pattern L1 L2 L3 M4 M5 M6 M7 M8 S9 X p i d bands chrom E NE EI

SF1 PcoSat01-176 B p p p p p p p p p p 10 0 0 10 10 45 0 1

SF1 PcoSat02-156 B p p p p p p p p p p 10 0 0 10 10 45 0 1

SF3 PcoSat03-199 B pd p p p p 5 0 1 6 5 10 4 0.71

PcoSat04-209 B p pid p pd p i 5 2 2 9 6 11 17 0.39

SF4 PcoSat05-77 NS 0

PcoSat06-168 B p i 1 1 0 2 2 0 1 0

SF2 PcoSat07-194 D

PcoSat08-5-tel B t t t t t t t t t t

SF2 PcoSat09-138 B pi p 2 1 0 3 2 1 1 0.5

PcoSat10-109 D

SF2 PcoSat11-196 D

PcoSat12-106 B d d d i i d 0 2 4 6 6 8 8 0.5

PcoSat13-62 B p p p p 4 0 0 4 4 6 0 1

SF4 PcoSat14-75 NS

PcoSat15-251 NS

SF6 PcoSat16-149 B i p 1 1 0 2 2 0 1 0

SF7 PcoSat17-146 B p p i 2 1 0 3 3 1 2 0.33

SF8 PcoSat18-139 DB p p id 2 1 1 4 3 1 4 0.2

SF7 PcoSat19-151 B p p p p i 4 1 0 5 5 6 4 0.6

SF6 PcoSat20-150 DB i i i i 0 4 0 4 4 6 0 1

SF9 PcoSat21-113 B i 0 1 0 1 1

PcoSat22-267 B d 0 0 1 1 1

PcoSat23-286 DB p pd i 2 1 1 4 3 1 4 0.2

PcoSat24-161 B d d d d d d d 0 0 7 7 7 21 0 1

PcoSat25-320 B i 0 1 0 1 1

SF5 PcoSat26-236 B i p i 1 2 0 3 3 1 2 0.33

SF8 PcoSat27-143 DB p 1 0 0 1 1

SF3 PcoSat28-197 B p 1 0 0 1 1

PcoSat29-154 NS

SF11 PcoSat30-92 NS

SF5 PcoSat31-198 B p p 2 0 0 2 2 1 0 1

PcoSat32-84 B p 1 0 0 1 1

PcoSat33-93 B p p 2 0 0 2 2 1 0 1

PcoSat34-92 B d d d d d d d d d d 0 0 10 10 10 45 0 1

SF10 PcoSat35-270 B d d d 0 0 3 3 3 3 0 1

PcoSat36-159 DB

SF12 PcoSat37-218 B d 1 1 1

SF14 PcoSat38-35 NS

Total p 4 8 10 9 6 3 7 5 2 2 56 64%

Total i 2 3 0 2 2 1 0 0 6 3 19

Total d 3 5 4 3 2 2 2 2 3 5 31

Total bands 9 16 14 14 10 6 9 7 11 10 106

Total satDNAs 9 13 13 13 10 6 9 7 10 10

SF = superfamily, B = banded, NS = no signal, D = dotted, DB = dotted-banded, L1-L3 = long autosomes, M4-M8 = medium-sized autosomes, S9 = short autosome, X = X chromosome, p = pericentromeric, i = interstitial, d = distal, t = telomeric, chrom = chromosome, E = equilocal, NE = non-equilocal, EI = equilocality index


dependent variables, showed highly significant differences between the 15 SFs (P < 0.0001 in all four cases). From these ANOVAs, we calculated the intraclass correlation as a measure of the homogeneity within SFs relative to the variation observed between them. This analysis revealed very high homogeneity among the families belonging to the same SF, as 71% of the variation in monomer length, 82% of that in A + T content, 68% of that in abundance and 84% of that in divergence were found between SFs. This shows that homologous satDNA families belonging to the same SF tend to conserve their main features.
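As an illustration of how an intraclass correlation can be obtained from a one-way ANOVA, the sketch below applies the standard variance-component estimator for unbalanced designs to the divergence values of the 37 families assigned to superfamilies in Table 1. The choice of estimator is an assumption, since the text does not specify which one was used. Under that assumption the result comes out at roughly 0.84, in line with the 84% reported for divergence.

```python
def intraclass_correlation(groups):
    """Intraclass correlation from a one-way ANOVA with unequal group sizes.

    Assumed estimator: s2_between = (MS_between - MS_within) / n0, with
    n0 = (N - sum(n_i**2) / N) / (k - 1), and
    ICC = s2_between / (s2_between + MS_within).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_between = (ms_between - ms_within) / n0
    return var_between / (var_between + ms_within)


# Divergence (%) per superfamily, transcribed from Table 1.
divergence_by_sf = {
    "SF01": [3.45, 2.51],
    "SF02": [20.79, 16.06, 13.68],
    "SF03": [6.47, 7.25],
    "SF04": [12.85, 7.64],
    "SF05": [6.89, 7.26, 2.61, 7.95, 2.25, 3.57],
    "SF06": [21.34, 17.01],
    "SF07": [4.68, 5.49],
    "SF08": [12.9, 17.96],
    "SF09": [3.67, 6.46, 6.73],
    "SF10": [6.67, 5.07, 4.3],
    "SF11": [18.0, 16.7],
    "SF12": [11.78, 9.56],
    "SF13": [6.88, 7.76],
    "SF14": [17.6, 16.04],
    "SF15": [3.94, 8.94],
}

icc_divergence = intraclass_correlation(list(divergence_by_sf.values()))
print(f"ICC for divergence: {icc_divergence:.3f}")
```

The same function could be applied to monomer length, A + T content or abundance by swapping in the corresponding column of Table 1.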

SF11 superfamily includes homologous HORs

One of the superfamilies found, i.e., SF11, showed internal structure with homologous subrepeats. Dot-plot inspection suggested that the two members of SF11 (PcoSat30-92 and PcoSat62-80) are composed of two 46 bp homologous subrepeats (hereafter named A and B), the latter satDNA containing a deletion in the B subrepeats (Fig. S5a,b), thus suggesting that these two satDNA families are actually higher-order repeats (HORs). To ascertain whether they fit the conventional HOR definition, we tested whether identity was higher between alternate than contiguous subrepeats (see Willard and Waye 1987). For this purpose, we selected Illumina reads showing homology with Sat30 and Sat62, using RepeatMasker, and then joined the read-pairs with fastq-join. This yielded 1772 read-pairs of about 190 bp. Using reference sequences composed of 1.5 RUs (i.e., three

subrepeats in the A1-B1-A2 and B1-A2-B2 sequences), we searched for read-pairs showing homology with the full length of any of these references. This yielded 41 read-pairs for Sat30-A1-B1-A2, 43 for Sat30-B1-A2-B2, 50 for Sat62-A1-B1-A2 and 42 for Sat62-B1-A2-B2. We then calculated p-distances between alternate (A1 vs A2 or B1 vs B2) and contiguous (A1 vs B1, B1 vs A2 and A2 vs B2) subrepeats, using MEGA software, and compared them by Student t tests. As Fig. S5c shows, in all cases the p-distances were higher between contiguous subrepeats, as expected for HORs, although the differences were only marginally significant for Sat30 but highly significant for Sat62, perhaps because the former is older and has accumulated higher divergence, given that both show poor homogenization (18 and 16.7% divergence, respectively).
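The alternate-versus-contiguous comparison rests on the p-distance, i.e., the proportion of aligned sites at which two sequences differ. The sketch below is not the MEGA implementation; it assumes pre-aligned sequences of equal length and skips any column containing a gap (pairwise deletion), and the three short sequences are invented placeholders rather than real subrepeats.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap ("-") in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned subrepeats from one read-pair (not real data).
subrepeat_a1 = "ACGTACGTAC"
subrepeat_b1 = "ACGAACTTAC"
subrepeat_a2 = "ACGTACGTAT"

alternate = p_distance(subrepeat_a1, subrepeat_a2)    # A1 vs A2
contiguous = p_distance(subrepeat_a1, subrepeat_b1)   # A1 vs B1
print(f"alternate = {alternate:.2f}, contiguous = {contiguous:.2f}")
```

In a conventional HOR the alternate distance is expected to be the smaller of the two, which is the pattern this toy example is built to show.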

Bearing in mind that the dot-plots of SF11 (Fig. S5b) suggested the possibility of substructuring below the 46 bp subrepeats mentioned above, we sought additional insights into the origin of these HORs by trying to determine the most basic subrepeat unit possible. For this purpose, we used the 1772 read-pairs mentioned above to search for tandem repeat structure within them, using Tandem Repeats Finder (TRF) (Benson 1999), with restrictive conditions set by the "2 7 7 80 10 200 200" parameters. We then selected those read-pairs showing adjacent tandem repeats with at least 95% identity, and then obtained their consensus sequence, i.e., one per read-pair meeting the former criterion. This yielded 210 read-pairs showing a 23, 34, 46, 56, 68, 80, 92 and 102 nt

Fig. 3 a Ideogram representing the FISH location for the banded satDNAs. Numbers indicate satDNA family number in order of decreasing abundance. Chromosome-specific satDNAs are underlined. X = X chromosome, L1-L3 = long autosomes, M4-M8 = medium-sized autosomes, S9 = short autosome. Note the higher number of interstitial satDNAs on the S9 autosome, which is facultatively heterochromatinized during meiosis. b Double FISH showing the presence of large bands of telomeric (Tel) DNA overlapped with the large pericentromeric bands formed by PcoSat01-176 (Sat01). Bar = 5 μm


arithmetic progression with period between 10 and 12 nt (Fig. S6a). To investigate the existence of even shorter tandem repeat structure, we reanalyzed the 1772 read-pairs with TRF and the "2 7 7 80 10 50 20" parameters, with maximum score equal to 50 and maximum repeat unit length equal to 20. We then selected read-pairs showing 80% average length between adjacent repeat units and, remarkably, 96% of the 236 read-pairs found included tandem repeats of 11 nt (Fig. S6b). This suggests that the ancestral repeat unit which yielded the two satDNA families constituting SF11 was 11 nt long. We then obtained a consensus sequence for each tandem repeat length (i.e., 11, 22, 33, etc.) and built an alignment of up to 102 bp (e.g., nine units of 11 bp) which clearly showed the internal substructure of these HORs (Fig. S6c).
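The periodicity of the tandem repeat lengths reported above can be checked with a few lines. The sketch takes the eight lengths as given in the text and computes the step between consecutive lengths and the nearest whole number of 11 nt units for each length.

```python
# Tandem repeat lengths (nt) found in the 210 read-pairs, as listed in the text.
repeat_lengths = [23, 34, 46, 56, 68, 80, 92, 102]

# Step between consecutive lengths: the "period" of the progression.
steps = [longer - shorter
         for shorter, longer in zip(repeat_lengths, repeat_lengths[1:])]
mean_step = sum(steps) / len(steps)

# Nearest whole number of 11 nt units for each observed length.
units_of_11 = [round(length / 11) for length in repeat_lengths]

print("steps:", steps)
print(f"mean step: {mean_step:.2f} nt")
print("approximate number of 11 nt units:", units_of_11)
```

Every step falls between 10 and 12 nt and the lengths correspond to 2 through 9 units, which fits an 11 nt basic unit with small length variation accumulated between copies.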

A new type of heterologous HORs in SF05 superfamily

SF05 superfamily included six satDNA families showing partial or full homology, with repeat unit lengths ranging from 73 bp (PcoSat44-73 and PcoSat50-73) to 236 bp (PcoSat26-236), with three intermediate lengths: 150 bp (PcoSat71-150), 165 bp (PcoSat39-165) and 198 bp (PcoSat31-198). Sequence alignment of all six satDNAs showed complex relationships between them (Fig. S4a). The most apparent feature is that all share a 73 bp region equivalent to the two short satDNAs (Sat44 and Sat50), whereas the four longer satDNAs show additional sequences showing a certain degree of homology between them. Since the latter region showed much lower identities between families than the 73 bp region, we may refer to them as "variable" and "conserved" regions, respectively.

Calculation of p-distances between the six satDNAs for the conserved region revealed much higher identity between two pairs of satDNAs (Sat26-Sat44 and Sat39-Sat50), which showed p-distances about one order of magnitude lower than those found involving the two remaining satDNAs (i.e., Sat31 and Sat71) (Fig. S4b). In fact, an MST built for the conserved region (Fig. S4c) showed that PcoSat26-236 and PcoSat44-73 showed the highest identity (p-distance = 0.03), followed by that between PcoSat39-165 and PcoSat50-73 (p-distance = 0.04), whereas the two remaining satDNAs (PcoSat31-198 and PcoSat71-150) showed many nucleotide differences with the former four (p-distances from 0.21 to 0.30, see Fig. S4b).

Calculation of p-distances for the variable region between the four satDNAs carrying it revealed much higher values than in the case of the conserved region, the lowest distance (0.20) being found between PcoSat26-236 and PcoSat71-150, whereas all remaining distances ranged from 0.45 to 0.61 (Fig. S4b). Remarkably, in these four satDNAs, the variance in p-distances was 27 times higher for the variable region than for the conserved one.

The existence of the conserved region of SF05 satDNAs as independent satDNAs (i.e., Sat44 and Sat50) indicates that the

four remaining satDNAs are actually heterologous HORs, as they resulted from a combination of a 73 bp pre-existing satDNA and another genomic sequence showing no homology with the satDNA.

The tandem repeat structure of these HORs was analyzed for the PcoSat26-236 and PcoSat39-165 satDNA families. For this purpose, we selected Illumina read-pairs showing homology with 50 nt sequence junctions at both ends of each satDNA repeat unit, and mapped them against a satDNA dimer. Previous analysis using the P. conica mitogenome indicated that the vast majority of insert sizes in our Illumina library were lower than 500 bp (Fig. S1b), which indicates that most read-pairs covered lengths lower than a satDNA dimer. Subsequent mapping of read-pairs against the dimer showed the physical contiguity of these units and thus the tandem repeat structure of these two complex satDNAs (Fig. S4d,e).

To test the degree of homogenization of these two heterologous HORs, we compared divergence in their component subrepeats with that found when these subrepeats are independent outside the HOR. For this purpose, we selected all Illumina reads showing homology with SF05 using RepeatMasker, and paired these reads using fastq-join to obtain longer sequences. We then classified the resulting read-pairs into two types: (i) those showing homology with both conserved and variable subrepeats (mixed reads, representative of the HOR), and (ii) those showing homology with only one subrepeat (pure reads, representative of the independent subrepeats). No pure reads were found for the variable region, indicating that this region is not found outside the HOR. Therefore, only the conserved region could be compared inside (mixed read-pairs) and outside (pure reads) the HOR. Additional RepeatMasker analyses, for each type of read-pairs, showed that the 2917 mixed and 638 pure read-pairs showing identity higher than 80% with PcoSat44A-73 showed 3.97 and 2.27% divergence with respect to the consensus sequence, respectively, whereas the 2931 mixed and 523 pure read-pairs displaying identity higher than 80% with PcoSat50A-73 showed 2.11 and 1.39% divergence, respectively. This indicated that the two short satDNAs show higher abundance and divergence when they are part of the heterologous HOR than as independent satDNAs. Taken together, the former results indicate that heterologous HORs (e.g., PcoSat26-236 and PcoSat39-165) show poorer homogenization than their 73 bp subrepeat when independent.
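The inside-versus-outside comparison can be summarized numerically from the read-pair counts and divergences given above. The sketch below only rearranges those reported numbers; the dictionary layout and the two derived quantities (share of read-pairs found within the HOR, and ratio of divergences) are illustrative choices, not part of the original analysis.

```python
# (number of read-pairs, divergence in %) for mixed (inside the HOR) and
# pure (outside the HOR) read-pairs, as reported in the text.
conserved_region_reads = {
    "PcoSat44A-73": {"mixed": (2917, 3.97), "pure": (638, 2.27)},
    "PcoSat50A-73": {"mixed": (2931, 2.11), "pure": (523, 1.39)},
}

summary = {}
for family, classes in conserved_region_reads.items():
    mixed_n, mixed_div = classes["mixed"]
    pure_n, pure_div = classes["pure"]
    summary[family] = {
        # Fraction of the family's read-pairs that lie within the HOR.
        "share_in_hor": mixed_n / (mixed_n + pure_n),
        # How much more divergent the subrepeat is inside the HOR.
        "divergence_ratio": mixed_div / pure_div,
    }

for family, values in summary.items():
    print(f"{family}: {values['share_in_hor']:.1%} of read-pairs in HOR, "
          f"divergence ratio {values['divergence_ratio']:.2f}")
```

For both 73 bp families, more than four fifths of the read-pairs fall within the HOR and divergence there is about 1.5 to 1.75 times that of the independent copies.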

Telomeric DNA forms a heterologous HOR with PcoSat01A-176

A third example of a HOR composed of heterologous subrepeats involved the telomeric repeat. Telomeric DNA (PcoSat08-5-tel) in P. conica is not restricted to the distal ends of chromosomes, as it was also found at internal locations, interspersed between the pericentromeric blocks of


PcoSat01-176 and PcoSat02-156 on one or both members of six autosome pairs and the X chromosome (Figs. 3b and S7). To analyze a possible structural association between the telomeric repeat and these two pericentromeric satDNAs, we used RepeatMasker to search the 5 million randomly selected Illumina read-pairs used for abundance analysis for read-pairs in which at least one read showed homology with the telomeric repeat. In total, 7826 read-pairs met this condition, representing 0.1565% of all reads. As expected, this figure is higher than the abundance of PcoSat08-5-tel (0.0989%), because the latter was expressed per nucleotide, and reads containing the telomeric repeat in only part of their length counted only partially in the abundance estimation.
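The percentage of telomere-bearing read-pairs follows directly from the counts given above, and comparing it with the per-nucleotide abundance gives a rough idea of how much of those read-pairs is actually telomeric. The second quantity is only a back-of-envelope reading: it assumes that both figures refer to the same sample of read-pairs and that abundance scales with the masked fraction of each read-pair, neither of which is stated in the text.

```python
read_pairs_sampled = 5_000_000
telomere_read_pairs = 7826

# Share of read-pairs with at least one read matching the telomeric repeat.
share_percent = 100 * telomere_read_pairs / read_pairs_sampled

# Per-nucleotide abundance of PcoSat08-5-tel, in percent of the genome.
per_nucleotide_abundance = 0.0989

# Rough average fraction of a telomere-bearing read-pair annotated as
# telomeric, under the simplifying assumptions described above.
approx_telomeric_fraction = per_nucleotide_abundance / share_percent

print(f"read-pair share: {share_percent:.4f}%")
print(f"approximate telomeric fraction per read-pair: "
      f"{approx_telomeric_fraction:.2f}")
```

A fraction well below 1 is what would be expected if many of these read-pairs also contain non-telomeric sequence.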

With these 7826 read-pairs carrying telomeric repeats, we performed a RepeatExplorer clustering using the 87 satDNA variants previously found with satMiner (Table S2) as a custom database, in order to identify possible satDNAs in the resulting clusters. To detect the possible presence of other repeated elements, such as TEs or rDNA, we also annotated the clusters with RepBase. After a first clustering, we performed a cluster merging which yielded five clusters, with a major cluster including 53.40% of the reads employed. Annotation of this cluster with RepBase gave no results, but that with the custom database revealed that 37.2% of annotations corresponded to PcoSat01A-176, 35.6% to PcoSat08-5-tel and 24.2% to PcoSat02-156. In total, 97% of the reads included in this cluster were annotated as these three satDNA families. This suggests structural association of the telomeric repeat with the satDNAs belonging to SF1, which are the most abundant pericentromeric satDNAs. The four remaining clusters represented 1.1% of the reads used and they failed to be annotated for any element other than PcoSat08-5-tel.

The most abundant RepeatExplorer cluster contained a sequence composed of 404 bp (Fig. 4a, b). Against this sequence, we aligned those of PcoSat01A-176, PcoSat01B-168, PcoSat02-156 and PcoSat08-5-tel, and this revealed the presence of 184 nt in the middle showing homology with SF1, flanked by telomeric repeats, summing up to 404 nt. We then aligned this contig with the PcoSat01A-176, PcoSat01B-168 and PcoSat02-156 sequences, using Geneious, and estimated 95.1%, 87.5% and 85.6% identities, respectively, with the SF1 part of the contig. This indicates that PcoSat01A-176 is the satDNA variant most likely to be structurally associated with telomeric repeats (Fig. 4b).

We found three kinds of evidence indicating that this 404 bp unit is tandemly repeated. First, its RepeatExplorer graph was ring shaped (Fig. 4a), as usual for tandem repeats with unit length surpassing read length (Ruiz-Ruano et al. 2016). Second, PCR experiments with the two possible primer combinations for both satDNAs (i.e., PcoSat01-176_F/Tel_R and PcoSat01-176_R/Tel_F) (see Fig. 4c and Table S1), using a PCR program for short satDNAs, yielded amplification with a ladder pattern in both reactions (not shown). The first primer

combination worked better, yielding the two expected bands of about 200 bp and 600 bp. Third, mapping of read-pairs against the dimer, to analyze 5′ and 3′ HOR junctions, by the same method described in the former section, showed the physical contiguity of these units and thus the tandem repeat structure of the complex satDNA composed of PcoSat01A-176 and telomeric DNA (Fig. 4c).
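The ladder pattern expected from PCR on a tandem array can be sketched as follows. The sketch assumes that each primer anneals once per repeat unit, so that successive bands differ by exactly one unit length (404 bp here), and it takes the approximate size of the first band (about 200 bp) from the text; the function name is illustrative.

```python
def ladder_sizes(first_band: int, unit_length: int, n_bands: int) -> list:
    """Expected amplicon sizes when primers anneal once per tandem unit.

    The shortest product spans primers within one unit; each longer
    product adds one full repeat unit.
    """
    return [first_band + k * unit_length for k in range(n_bands)]


# Approximate first band (~200 bp) and HOR unit length (404 bp) from the text.
expected_bands = ladder_sizes(first_band=200, unit_length=404, n_bands=3)
print(expected_bands)
```

Under these assumptions the second band is expected near 600 bp, matching the two bands described for the better-performing primer combination.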

It was remarkable to find that telomeric repeats in P. conica show, as a whole, higher divergence (2.7%) than those previously reported in L. migratoria (1.75%) and Eumigus monticola (1.44%) (Ruiz-Ruano et al. 2016, 2017). For this reason, we analyzed the divergence of PcoSat08-5-tel and PcoSat01A-176 in the 7826 read-pairs showing homology with the telomeric motif and compared it with that observed in read-pairs bearing only PcoSat08-5-tel or PcoSat01A-176, using RepeatMasker. Divergence of PcoSat01A-176 was 3.34% when associated with telomeric DNA but 2.66% when it was alone. Likewise, the telomeric repeats included in the 4792 read-pairs showing partial homology with PcoSat01A-176 showed 4.05% divergence, whereas the remaining 3034 read-pairs (containing only telomeric repeats) showed 1.38% divergence, a figure very similar to those previously found in L. migratoria and E. monticola (see above) and logically expected for telomerase-dependent repeats. This suggests that P. conica carries two types of telomeric repeats clearly differing in divergence, the low-divergence repeats presumably being the canonical telomere repeats, and the high-divergence ones being those taking part in the heterologous HOR with PcoSat01A-176. Remarkably, the HOR-associated telomeric repeats show divergences similar to those of their partner satDNA, thus behaving more like a satDNA than like true telomeric DNA. The read numbers found for the two former types seem to suggest that about 60% of the telomeric repeats in the P. conica genome take part in the heterologous HOR, and 40% are canonical telomere repeats.
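The approximate 60:40 split mentioned above follows from the two read-pair counts. The sketch below treats the proportion of read-pairs as a proxy for the proportion of telomeric repeats, which is the same simplification the text makes when it says that the read numbers "seem to suggest" these figures.

```python
# Read-pairs carrying telomeric repeats, split by PcoSat01A-176 co-occurrence.
hor_associated_pairs = 4792   # telomeric repeats plus PcoSat01A-176
telomere_only_pairs = 3034    # telomeric repeats only
total_pairs = hor_associated_pairs + telomere_only_pairs

hor_fraction = hor_associated_pairs / total_pairs
canonical_fraction = telomere_only_pairs / total_pairs

print(f"total read-pairs: {total_pairs}")
print(f"HOR-associated: {hor_fraction:.1%}, canonical: {canonical_fraction:.1%}")
```

The two classes add up to the 7826 telomere-bearing read-pairs analyzed, with roughly 61% of them associated with PcoSat01A-176.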

Discussion

We characterize here, for the first time, the satellitome of the grasshopper species Pyrgomorpha conica and provide the first physical mapping of the 38 most abundant satDNA families in the genome of this species. This has revealed new levels of complexity in satDNA formation. Although most satDNA families found showed simple units, lacking internal subrepeats, the P. conica satellitome also contains two types of complex units showing internal substructuring. First, the DNA sequence of PcoSat30-92 and PcoSat62-80 showed internal substructuring by tandem repetition of homologous subrepeats, which can be detected at several lengths, as multiples of 11 bp. These satDNAs are thus equivalent to conventional HORs, where alternate subrepeats show higher identity than contiguous ones, thus suggesting that these HORs are


actively subject to concerted evolution (Willard and Waye 1987; Warburton and Willard 1990; Plohl et al. 2012). The second type of HOR, however, included complex satDNA units composed of a satDNA family combined with other subrepeats showing no homology with it. This was the case for four satDNA families belonging to SF05 (PcoSat26-236, PcoSat31-198, PcoSat39-165 and PcoSat71-150), which contain a conserved subrepeat showing homology with two independent satDNAs (PcoSat44-73 and PcoSat50-73) combined with a heterologous variable region present only in the four satDNAs with longer RUs. On this basis, the four longer satDNAs can be considered heterologous HORs. Another

example of a heterologous HOR resulted from a combination of telomeric repeats (PcoSat08-5-tel) with the most abundant satDNA (PcoSat01A-176).

The finding of heterologous HORs provides new and interesting insights into the evolution of satDNA. In the case of the SF05 heterologous HORs, it was interesting to note that the two short satDNAs which gave rise to the HORs (PcoSat44A-73 and PcoSat50A-73) showed higher divergence when they are combined in a heterologous HOR (3.97% and 2.11%, respectively) than when they are independent satDNAs (2.27% and 1.39%, respectively). Likewise, PcoSat08-5-tel and PcoSat01A-176 sequences showed higher

Fig. 4 a RepeatExplorer graph resulting from a selection of Illumina reads showing homology with the telomeric sequence. A contig assembled with these reads contained the Telomeric-PcoSat01A-176 heterologous HOR. b Alignment showing the resulting heterologous HOR from the RepeatExplorer assembly and the three PcoSF01 components. The PcoSat01A-176 variant showed the highest identity with the non-telomeric region. c Mapping of read-pairs showing homology with the heterologous HOR (see Fig. S1 for methods). Note that the read-pairs

mapped on junctions between the two different subrepeats of the Telomeric-PcoSat01A-176 HOR. The reference sequence, including about two HOR dimers, is shown at the bottom. Anchoring sites for two primer pairs are indicated on this sequence. For both primer pairs, we got a ~200 bp amplicon when amplifications occurred on the same HOR monomer (dark red lines) and a ~500 bp amplicon when amplifications occurred in contiguous HOR monomers (red lines)


divergence when they belonged to a heterologous HOR (4.05% and 3.34%, respectively) than when they were independent (1.38% and 2.66%, respectively). Telomeric repeats are actively homogenized by telomerase activity, which implies the addition of the canonical telomeric oligomer (Greider and Blackburn 1985). The roughly similar divergence observed in the two heterologous components of the HOR when they share the complex units (3.34–4.05%) suggests that they behave like a true satDNA, for which reason its telomeric repeats show higher divergence than the canonical ones, presumably because they are not affected by telomerase action. In addition, the lower divergence observed for PcoSat01A-176 units free of telomeric repeats suggests that homogenization of this satDNA is better when it is not included in the heterologous HOR. This suggests that homogenization works more poorly for satDNA sequences involved in a heterologous HOR.

Homologous HORs for human alpha-satellite constitute units for concerted evolution (Willard and Waye 1987) and appear to play a role in centromere function (Schueler et al. 2001; Fukagawa and Earnshaw 2014; Sujiwattanarat et al. 2015; Giunta and Funabiki 2017). In P. conica, although the homologous HORs found for SF11 fit the classical HOR definition, it is highly unlikely that they are involved in centromeric function because of the high divergence observed for PcoSat30-92 (18%) and PcoSat62-80 (16.7%) and, most importantly, because at least PcoSat30-92 was analyzed by FISH and failed to show any signal (see Table 2).

The heterologous HORs found in P. conica belonging to SF05 showed divergences ranging from 2.25% in PcoSat50-73 to 7.95% in PcoSat44-73, thus being lower than those of the homologous HORs above, even though the SF05 HORs include a variable region. The SF05 heterologous HORs are unlikely to be involved in centromere function, as the two satDNAs analyzed by FISH (PcoSat26-236 and PcoSat31-198) showed bands on pericentromeric regions of only one or two chromosomes, respectively (see Table 2). However, the HOR formed by telomeric repeats and the most abundant satDNA (PcoSat01-176) might play a role in centromeric function, as this satDNA, like alpha-satellite in humans, is located on pericentromeric regions of all chromosomes. Nevertheless, the fact that this satDNA shows lower divergence when it is not associated with telomeric repeats than when it takes part in the heterologous HOR suggests that the latter does not constitute a true unit of concerted evolution. On this basis, although additional research is necessary to ascertain whether heterologous HORs have a functional role, the available information points to their simply being a different manifestation of satDNA as genomic junk.

The existence of conserved and variable subrepeats within SF05 HORs suggests the possibility that complex satDNA repeats might be homogenized only in part of their sequence motif. Sequence comparison between the two short satDNAs belonging to SF05 suggests that their divergence (p-distance = 0.23) has recently surpassed the threshold for being considered different families (80% identity). The p-distance for the conserved 73 bp region between each of the short satDNAs and the four long ones ranged from 0.23 to 0.27, with a single exception for each short satDNA, as it was one order of magnitude lower between Sat44 and Sat26 (0.03), and also between Sat50 and Sat39 (0.04) (see Fig. S4b). Given that the association between two non-homologous sequences into a heterologous HOR is a very unlikely event, it is reasonable to postulate that it occurred only once, presumably when the two short satDNAs still belonged to the same family. In this case, the extremely low differentiation found between the conserved subrepeat sequences in the Sat44-Sat26 and Sat50-Sat39 cases demands an additional explanation. We suggest that this extreme identity between the independent satDNA and the corresponding HOR subrepeat might be a side effect of PcoSat44A-73 and PcoSat50A-73 homogenization, if the latter took place through rolling-circle amplification of the 73 bp long oligonucleotides copied from the former satDNAs, and their reinsertion on the homologous sequences of PcoSat26-236 and PcoSat39-165. SatDNA amplification by rolling-circle replication and reinsertion at homologous sites was first suggested by Walsh (1987), later demonstrated by Rossi et al. (1990), and even visualized by Cohen et al. (2005, 2008, 2010) and Navrátilová et al. (2008). It has even been suggested that rolling-circle amplification may constitute a repair mechanism specific to satDNA (Feliciello et al. 2006), and the involvement of this mechanism in satDNA evolution is increasingly beyond doubt.
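
The p-distance comparison used in this argument can be illustrated with a minimal Python sketch. It assumes that the p-distance is the proportion of differing sites between two aligned sequences, with gapped columns skipped (pairwise deletion is an assumption here, as are the function names, which are hypothetical). The 80% identity threshold is the one quoted in the text.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence carries a gap are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def same_family(seq_a: str, seq_b: str, identity_threshold: float = 0.80) -> bool:
    """True when identity (1 - p-distance) reaches the family threshold."""
    return 1.0 - p_distance(seq_a, seq_b) >= identity_threshold
```

Under this definition, a p-distance of 0.23 corresponds to 77% identity, which falls below the 80% threshold, so the two sequences would be treated as different families.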

Our results have shown that PcoSat01A-176 and PcoSat08-5-tel are not simply interspersed, as they have formed a new higher-order structure actually constituting a heterologous HOR. The presence of interstitial telomeric repeats (ITRs) in P. conica was first shown by López-Fernández et al. (2004), in a survey of 11 grasshopper species. These same authors later obtained a centromeric DNA probe by digesting whole genomic DNA with the Alu I restriction enzyme and cloning a 500 bp fragment to generate a FISH probe. Conventional FISH showed the presence of large arrays of telomeric repeats on the pericentromeric heterochromatin of most chromosomes, and fiber-FISH revealed ITRs interspersed with other, possibly highly repetitive, DNA sequences (López-Fernández et al. 2006). To our knowledge, these authors did not give information on the DNA sequence of this satDNA. However, our present results allow identifying it as one of the three satDNAs included in the SF1 superfamily (PcoSat01A-176, PcoSat01B-168 and PcoSat02-156), which make up the whole pericentromeric heterochromatic blocks in this species. All three satDNAs carry two Alu I targets in their sequence separated by 121, 121 and 101 bp, respectively. The 500 bp fragment cloned by López-Fernández et al. (2006) represents about three units of one of these satDNAs. In silico restriction mapping (not shown) predicted that any of the three satDNAs could have yielded fragments close to 500 bp. However, due to its much higher abundance, PcoSat01A-176 was most likely the prevailing satDNA in the probe used by these authors.
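
The in silico restriction mapping mentioned here was not shown, so the following Python sketch only illustrates the idea and is not the authors' procedure. It relies on the known Alu I recognition site (AGCT, blunt cut between G and C) and on a synthetic, hypothetical 176 bp monomer that reproduces only the geometry given in the text (two Alu I targets 121 bp apart). The monomer sequence and all function names are assumptions.

```python
import re

ALU_I_SITE = "AGCT"  # Alu I recognition sequence (blunt cut: AG^CT)
CUT_OFFSET = 2       # the cut falls after the second base of the site


def cut_positions(seq: str, site: str = ALU_I_SITE, offset: int = CUT_OFFSET) -> list:
    """Return every cut coordinate for `site` in `seq`."""
    # A lookahead pattern reports all occurrences, including overlapping ones.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def partial_digest_sizes(seq: str) -> list:
    """Sizes of all fragments bounded by any two cut sites (partial digestion)."""
    cuts = cut_positions(seq)
    return sorted({b - a for i, a in enumerate(cuts) for b in cuts[i + 1:]})


def sizes_near(sizes: list, target: int, tolerance: int) -> list:
    """Keep the fragment sizes lying within `tolerance` bp of `target`."""
    return [s for s in sizes if abs(s - target) <= tolerance]


# Hypothetical monomer: 176 bp, two AGCT sites whose starts are 121 bp apart.
# Only the length and the site spacing come from the text; the filler is arbitrary.
monomer = "AGCT" + "T" * 117 + "AGCT" + "T" * 51
array = monomer * 5  # a short tandem array of five units
```

Calling `sizes_near(partial_digest_sizes(array), 500, 30)` on this toy array lists the partial-digestion fragments within 30 bp of 500 bp, each spanning roughly three units.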

It is interesting to note that, except for the telomeric repeat, none of the 76 satDNA families comprising the satellitome of P. conica showed homology with any of those previously observed in the satellitomes of Locusta migratoria (Ruiz-Ruano et al. 2016) and Eumigus monticola (Ruiz-Ruano et al. 2017) which, as shown by these same authors, do not share any satDNA family either. Remarkably, these three species belong to three orthopteran families, i.e., Pyrgomorphidae, Acrididae and Pamphagidae, which shared their most recent common ancestor long ago. The oldest family, the Pyrgomorphidae, dates back about 140 mya, the Pamphagidae about 111 mya, and the Acrididae about 73 mya (Song et al. 2015). Therefore, the absence of homologous satDNAs between these species indicates that the turnover of a complete satDNA library in grasshoppers takes less than 73 my. This is expected given the high rate of turnover suggested by the library model (Ugarković and Plohl 2002).

The large differences in the age of these three genomes allow some comparisons between their satellitomes that yield interesting evolutionary conclusions. (1) Older species show satellitomes enriched in long satDNAs, since 75% of satDNA families were longer than 100 bp in P. conica, whereas this figure was only 48% in E. monticola and 56% in L. migratoria (RxC contingency analysis: P = 0.015, S.E. 0.002). (2) Older species show a lower proportion of chromosome-specific satDNA families: 16.2% in P. conica, 27% in E. monticola (Ruiz-Ruano et al. 2017) and 56.9% in L. migratoria (Ruiz-Ruano et al. 2016), suggesting that intragenomic movement of satDNAs between non-homologous chromosomes is proportional to genome age. Alternatively, since this conclusion is based on the analysis of only three species, the possibility remains that satDNA intragenomic movement is intrinsically higher in P. conica. (3) The oldest species (P. conica) shows a higher number of satDNA superfamilies (15) than the two other species (only 5 in L. migratoria and 4 in E. monticola), and it showed a variety of complex satDNAs with internal structure made of homologous or heterologous subrepeats, which were not found in the two other species. These complex satDNAs included cases of microsatellites embedded within longer satDNAs, as well as homologous and heterologous HORs. These tendencies are consistent with the suggestion by Plohl et al. (2008) that "the increase in repeat unit length and complexity by merging shorter repeat motifs into a HOR seems to be a common trend in at least some satDNAs and/or organisms". An evolutionary trend towards increasing monomer length and complexity was also predicted by theoretical models (Stephan and Cho 1994). If such a tendency is real, then we should expect higher satDNA complexity and unit length in old genomes, and this could be tested by the high-throughput analysis of the satellitome in other species.
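
The RxC contingency comparison in point (1) can be illustrated with a small Python sketch. It computes a chi-square statistic for a table of counts and estimates a P value by Monte Carlo permutation. This is an assumption: the text reports a P value with a standard error but does not name the procedure or the program. The function names are hypothetical and no counts from the paper are used.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for an R x C table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def monte_carlo_p(table, n_perm=2000, seed=0):
    """Permutation P value: share of shuffled tables at least as extreme."""
    rng = random.Random(seed)
    # Expand the table into one (row label, column label) pair per observation.
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    observed = chi_square(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # shuffling column labels keeps both margins fixed
        perm = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            perm[i][j] += 1
        if chi_square(perm) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Each row of the input table would hold one species and each column one length class (for instance, families longer or shorter than 100 bp), with the family counts as cell values.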

Acknowledgments We thank Teresa Palomeque, Pedro Lorite and Manuel Garrido for their valuable comments on the manuscript. This study was funded by grants from the Spanish Plan Andaluz de Investigación (CVI-6649) and Secretaría de Estado de Investigación, Desarrollo e Innovación (CGL2015-70750-P) and was partially supported by FEDER funds. F.J. Ruiz-Ruano was supported by a Junta de Andalucía fellowship.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.

References

Antonio C, González-García JM, Suja JA (1993) Pycnotic cycle of the sex chromosome in Pyrgomorpha conica (Orthoptera) and development of spermiogenesis. Genome 36:535–541. https://doi.org/10.1139/g93-073

Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580

Blanchetot A (1991) Genetic variability of a satellite sequences in the dipteran Musca domestica. EXS 58:106–112

Britten RJ, Kohne DE (1968) Repeated sequences in DNA. Science 161:529–540

Bonaccorsi S, Lohe A (1991) Fine mapping of satellite DNA sequences along the Y chromosomes of Drosophila melanogaster: relationship between satellite sequences and fertility factors. Genetics 129:177–189

Camacho JPM, Cabrero J, López-León MD, Cabral-de-Mello DC, Ruiz-Ruano FJ (2015b) Grasshoppers (Orthoptera). In: Sharakhov IV (ed) Protocols for cytogenetic mapping of arthropod genomes. CRC Press, pp 381–438

Camacho JPM, Ruiz-Ruano FJ, Martín-Blázquez R, López-León MD, Cabrero J, Lorite P, Cabral-de-Mello DC, Bakkali M (2015a) A step to the gigantic genome of the desert locust: chromosome sizes and repeated DNAs. Chromosoma 124:263–275. https://doi.org/10.1007/s00412-014-0499-0

Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220. https://doi.org/10.1038/371215a0

Cohen S, Agmon N, Sobol O, Segal D (2010) Extrachromosomal circles of satellite repeats and 5S ribosomal DNA in human cells. Mob DNA 1:11. https://doi.org/10.1186/1759-8753-1-11

Cohen S, Agmon N, Yacobi K, Mislovati M, Segal D (2005) Evidence for rolling circle replication of tandem genes in Drosophila. Nucleic Acids Res 33:4519–4526. https://doi.org/10.1093/nar/gki764

Cohen S, Houben A, Segal D (2008) Extrachromosomal circular DNA derived from tandemly repeated genomic sequences in plants. Plant J 53:1027–1034. https://doi.org/10.1111/j.1365-313X.2007.03394.x

Drummond AJ, Ashton B, Cheung M, Heled J, Kearse M (2009) Geneious 4.8. Biomatters Ltd, Auckland, New Zealand

Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567. https://doi.org/10.1111/j.1755-0998.2010.02847.x


Feliciello I, Akrap I, Ugarković Đ (2015) Satellite DNA modulates gene expression in beetle Tribolium castaneum after heat stress. PLoS Genet 11:e1005466. https://doi.org/10.1371/journal.pgen.1005466

Feliciello I, Picariello O, Chinali G (2006) Intra-specific variability and unusual organization of the repetitive units in a satellite DNA from Rana dalmatina: molecular evidence of a new mechanism of DNA repair acting on satellite DNA. Gene 383:81–92. https://doi.org/10.1016/j.gene.2006.07.016

Fukagawa T, Earnshaw WC (2014) The centromere: chromatin foundation for the kinetochore machinery. Dev Cell 30:496–508. https://doi.org/10.1016/j.devcel.2014.08.016

García G, Ríos N, Gutiérrez V (2015) Next generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus Austrolebias (Cyprinodontiformes, Rivulidae). Genetica 143:353–360. https://doi.org/10.1007/s10709-015-9834-5

Garrido-Ramos MA (2017) Satellite DNA: an evolving topic. Genes 8:230. https://doi.org/10.3390/genes8090230

Giunta S, Funabiki H (2017) Integrity of the human centromere DNA repeats is protected by CENP-A, CENP-C, and CENP-T. Proc Natl Acad Sci U S A 114:1928–1933. https://doi.org/10.1073/pnas.1615133114

Greider CW, Blackburn EH (1985) Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell 43(2 Pt 1):405–413. https://doi.org/10.1016/0092-8674(85)90170-9

Hahn C, Bachmann L, Chevreux B (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads: a baiting and iterative mapping approach. Nucleic Acids Res 41:e129. https://doi.org/10.1093/nar/gkt371

Henikoff S (1998) Conspiracy of silence among repeated transgenes. BioEssays 20:532–535. https://doi.org/10.1002/(SICI)1521-1878(199807)20:7<532::AID-BIES3>3.0.CO;2-M

Hemleben V, Torres-Ruiz R, Schmidt T, Zentgraf U (2000) Molecular cell biology: role of repetitive DNA in nuclear architecture and chromosome structure. In: Progress in botany, vol 61. Springer, Germany, pp 91–117

Hsieh J, Fire A (2000) Recognition and silencing of repeated DNA. Annu Rev Genet 34:187–204. https://doi.org/10.1146/annurev.genet.34.1.187

John B (1988) The biology of heterochromatin. In: Verma RS (ed) Heterochromatin, molecular and structural aspects. Cambridge University Press, Cambridge, pp 1–147

Kass DH, Batzer MA (2001) Genome organization: human. ELS:1–8. https://doi.org/10.1038/npg.els.0001889

Kent WJ (2002) BLAT: the BLAST-like alignment tool. Genome Res 12:656–664. https://doi.org/10.1101/gr.229202

Kuhn GC, Küttler H, Moreira-Filho O, Heslop-Harrison JS (2012) The 1.688 repetitive DNA of Drosophila: concerted evolution at different genomic scales and association with genes. Mol Biol Evol 29:7–11. https://doi.org/10.1093/molbev/msr173

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/nmeth.1923

Lica LM, Narayanswami S, Hamkalo BA (1986) Mouse satellite DNA, centromere structure and sister chromatid pairing. J Cell Biol 103:1045–1151

Littlewood DTJ, Olson PD (2001) Small subunit rDNA and the Platyhelminthes: signal, noise, conflict and compromise. In: Littlewood DTJ, Bray RA (eds) Interrelationships of the Platyhelminthes. CRC Press, UK, pp 1–33

López-Fernández C, Arroyo F, Fernández JL, Gosálvez J (2006) Interstitial telomeric sequence blocks in constitutive pericentromeric heterochromatin from Pyrgomorpha conica (Orthoptera) are enriched in constitutive alkali-labile sites. Mut Res 599:36–44. https://doi.org/10.1016/j.mrfmmm.2006.01.004

López-Fernández C, Gosálvez J, Suja JA, Mezzanotte R (1988) Restriction endonuclease digestion of meiotic and mitotic chromosomes in Pyrgomorpha conica (Orthoptera: Pyrgomorphidae). Genome 30:621–626. https://doi.org/10.1139/g88-105

López-Fernández C, Pradillo E, Zabal-Aguirre M, Fernández JL, García de la Vega C, Gosálvez J (2004) Telomeric and interstitial telomeric-like DNA sequences in Orthoptera genomes. Genome 47:757–763. https://doi.org/10.1139/g03-143

López-Flores I, Garrido-Ramos MA (2012) The repetitive DNA content of eukaryotic genomes. In: Garrido-Ramos MA (ed) Repetitive DNA. Genome Dyn, vol 7. Karger, Basel, pp 1–28. https://doi.org/10.1159/000337118

Macas J, Kejnovský E, Neumann P, Novák P, Koblížková A, Vyskot B (2011) Next generation sequencing-based analysis of repetitive DNA in the model dioecious plant Silene latifolia. PLoS One 6:e27335. https://doi.org/10.1371/journal.pone.0027335

Macas J, Neumann P, Navrátilová A (2007) Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics 8:427. https://doi.org/10.1186/1471-2164-8-427

Meštrović N, Mravinac B, Pavlek M, Vojvoda-Zeljko T, Šatović E, Plohl M (2015) Structural and functional liaisons between transposable elements and satellite DNAs. Chromosom Res 23:583–596. https://doi.org/10.1007/s10577-015-9483-7

Meyne J, Hirai H, Imai HT (1995) FISH analysis of the telomere sequences of bulldog ants (Myrmecia: Formicidae). Chromosoma 104:14–18. https://doi.org/10.1007/BF00352221

Miga KH (2015) Completing the human genome: the progress and challenge of satellite DNA assembly. Chromosom Res 23:421–426. https://doi.org/10.1007/s10577-015-9488-2

Mravinac B, Plohl M (2007) Satellite DNA junctions identify the potential origin of new repetitive elements in the beetle Tribolium madens. Gene 394:45–52. https://doi.org/10.1016/j.gene.2007.01.019

Navrátilová A, Koblížková A, Macas J (2008) Survey of extrachromosomal circular DNA derived from plant satellite repeats. BMC Plant Biol 8:90. https://doi.org/10.1186/1471-2229-8-90

Novák P, Hribová E, Neumann P, Koblízková A, Dolezel J, Macas J (2014) Genome-wide analysis of repeat diversity across the family Musaceae. PLoS One 9(6):e98918. https://doi.org/10.1371/journal.pone.0098918

Novák P, Neumann P, Pech J, Steinhaisl J, Macas J (2013) RepeatExplorer: a galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29:792–793. https://doi.org/10.1093/bioinformatics/btt054

Palomeque T, Lorite P (2008) Satellite DNA in insects: a review. Heredity 100:564–573. https://doi.org/10.1038/hdy.2008.24

Plohl M, Luchetti A, Meštrović N, Mantovani B (2008) Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin. Gene 409:72–82. https://doi.org/10.1016/j.gene.2007.11.013

Plohl M, Meštrović N, Bruvo B, Ugarković Đ (1998) Similarity of structural features and evolution of satellite DNAs from Palorus subdepressus (Coleoptera) and related species. J Mol Evol 46:234–239

Plohl M, Meštrović N, Mravinac B (2012) Satellite DNA evolution. In: Garrido-Ramos MA (ed) Repetitive DNA, vol 7. Karger, Basel, pp 126–152. https://doi.org/10.1159/000337122

Plohl M, Meštrović N, Mravinac B (2014) Centromere identity from the DNA point of view. Chromosoma 123:313–325. https://doi.org/10.1007/s00412-014-0462-0

Rossi MS, Reig OA, Zorzópulos J (1990) Evidence for rolling-circle replication in a major satellite DNA from the South American rodents of the genus Ctenomys. Mol Biol Evol 7:340–350

Ruiz-Estévez M, Cabrero J, Camacho JPM, López-León MD (2014) B chromosomes in the grasshopper Eyprepocnemis plorans are present in all body parts analyzed and show extensive variation for rDNA copy number. Cytogenet Genome Res 143:268–274. https://doi.org/10.1159/000365797


Ruiz-Ruano FJ, Cabrero J, López-León MD, Camacho JPM (2017) Satellite DNA content illuminates the ancestry of a supernumerary (B) chromosome. Chromosoma 126:487–500. https://doi.org/10.1007/s00412-016-0611-8

Ruiz-Ruano FJ, Cuadrado Á, Montiel EE, Camacho JPM, López-León MD (2015) Next generation sequencing and FISH reveal uneven and nonrandom microsatellite distribution in two grasshopper genomes. Chromosoma 124:221–234. https://doi.org/10.1007/s00412-014-0492-7

Ruiz-Ruano FJ, López-León MD, Cabrero J, Camacho JPM (2016) High-throughput analysis of the satellitome illuminates satellite DNA evolution. Sci Rep 6:28333. https://doi.org/10.1038/srep28333

Ruiz-Ruano FJ, Ruiz-Estévez M, Rodríguez-Pérez J, López-Pino JL, Cabrero J, Camacho JPM (2011) DNA amount of X and B chromosomes in the grasshoppers Eyprepocnemis plorans and Locusta migratoria. Cytogenet Genome Res 134:120–126. https://doi.org/10.1159/000324690

Santos JL, Arana P, Giráldez R (1983) Chromosome banding patterns in Spanish Acridoidea. Genetica 61:65–74. https://doi.org/10.1007/BF00563233

Schmieder R, Edwards R (2011) Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS One 6:e17288. https://doi.org/10.1371/journal.pone.0017288

Schueler MG, Higgins AW, Rudd MK, Gustashaw K, Willard HF (2001) Genomic and genetic definition of a functional human centromere. Science 294:109–115. https://doi.org/10.1126/science.1065042

Smit AFA, Hubley R, Green P (2013) RepeatMasker Open-4.0. <http://www.repeatmasker.org>

Song H, Amédégnato C, Cigliano MM, Desutter-Grandcolas L, Heads SW, Huang Y, Otte D, Whiting MF (2015) 300 million years of diversification: elucidating the patterns of orthopteran evolution based on comprehensive taxon and gene sampling. Cladistics 31:621–651. https://doi.org/10.1111/cla.12116

Stephan W, Cho S (1994) Possible role of natural selection in the formation of tandem-repetitive noncoding DNA. Genetics 136:333–341

Suja JA, Antonio C, González-García JM, Rufas JS (1993) Supernumerary heterochromatin segments associated with nucleolar chromosomes of Pyrgomorpha conica (Orthoptera) contain methylated rDNA sequences. Chromosoma 102:491–499. https://doi.org/10.1007/BF00357105

Sujiwattanarat P, Thapana W, Srikulnath K, Hirai Y, Hirai H, Koga A (2015) Higher-order repeat structure in alpha satellite DNA occurs in New World monkeys and is not confined to hominoids. Sci Rep 5:10315. https://doi.org/10.1038/srep10315

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–2739. https://doi.org/10.1093/molbev/msr121

Thorvaldsdóttir H, Robinson JT, Mesirov JP (2013) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14:178–192. https://doi.org/10.1093/bib/bbs017

Ugarković D, Plohl M (2002) Variation in satellite DNA profiles: causes and effects. EMBO J 21:5955–5959. https://doi.org/10.1093/emboj/cdf612

Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3: new capabilities and interfaces. Nucleic Acids Res 40:e115. https://doi.org/10.1093/nar/gks596

Van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C (2014) Ten years of next generation sequencing technology. Trends Genet 30:418–426. https://doi.org/10.1016/j.tig.2014.07.001

Walsh JB (1987) Persistence of tandem arrays: implications for satellite and simple-sequence DNAs. Genetics 115:553–567

Warburton PE, Willard HF (1990) Genomic analysis of sequence variation in tandemly repeated DNA. Evidence for localized homogeneous sequence domains within arrays of alpha-satellite DNA. J Mol Biol 216:3–16

Willard HF, Waye JS (1987) Chromosome-specific subsets of human alpha satellite DNA: analysis of sequence divergence within and between chromosomal subsets and evidence for an ancestral pentameric repeat. J Mol Evol 25:207–214. https://doi.org/10.1007/BF02100014


Chromosoma

Supplementary Figures for

High throughput analysis of satellite DNA in the grasshopper Pyrgomorpha conica reveals abundance of homologous and heterologous higher-order repeats

Francisco J. Ruiz-Ruano1, Jesús Castillo-Martínez1, Josefa Cabrero1, Ricardo Gómez2, Juan Pedro M. Camacho1 and María Dolores López-León1

1Departamento de Genética, Facultad de Ciencias, Universidad de Granada, Granada, Spain

2Departamento de Ciencia y Tecnología Agroforestal, E.T.S. de Ingenieros Agrónomos, Universidad de Castilla La Mancha, 02071 Albacete, Spain

Present Address: Jesús Castillo-Martínez. Facultad de Medicina. Universidad Católica de Valencia. C/Quevedo 2, 46001 Valencia, Spain

*Corresponding author: María Dolores López-León

E-mail: [email protected]; Telephone number: 34958249702; Fax number: 34958244073

This PDF includes:

Supplementary Figures S1-S7


[Figure S1 image: flowcharts of the two pipelines. Labels recoverable from the image: panel a (steps a1–a5) with RepeatMasker, fastq-join, MEGA and TRF acting on A1-B1-A2-B2, A1-B1-A2 and B1-A2-B2 subrepeat structures; panel b (steps b1–b4) with MITObim, Bowtie2, RepeatMasker, 25+25 nt junctions and a reference sequence; P. conica mtDNA scale from 2,000 to 15,595.]

Figure S1: Bioinformatic pipelines for homologous (a) and heterologous (b) HORs. (a) We applied the protocol for homologous HORs to the PcoSF11 superfamily. This protocol starts with a RepeatMasker-based selection of read pairs showing some homology with a satDNA dimer, as reference sequence, with A1-B1-A2-B2 subrepeat structure (a1). Then we joined the read pairs with fastq-join (a2) and aligned those showing homology with A1-B1-A2 or B1-A2-B2 subrepeats, using RepeatMasker (a3). We then calculated the average p-distance between the three pairwise subrepeat combinations, using MEGA (a4). Using the joined read pairs, we searched for tandem repeat structure within them, using the TRF software, in order to infer the shortest tandem repeat unit (a5). (b) In the case of the PcoSF05 and Telomeric-PcoSat01A-176 heterologous HORs, we first assembled the P. conica mitogenome using MITObim (b1) to get an estimate of insert size in our Illumina libraries (b2). Using RepeatMasker, we then searched for read pairs including a 50 nt region coinciding with the junction between both subrepeats (i.e. 25 nt from each subrepeat), and also the same junction at a distance shorter than the maximum insert size (b3). We then mapped these read pairs to a reference with at least two HOR monomers using bowtie2 and represented the result graphically with IGV.
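
The junction step of pipeline (b) can be sketched in Python as follows. This is only an illustration under stated assumptions: the pipeline in the caption selects reads by RepeatMasker homology, whereas this sketch swaps in exact string matching on both strands. The function names are hypothetical. Only the 25 + 25 nt junction design comes from the caption.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction(left: str, right: str, flank: int = 25) -> str:
    """Junction probe: last `flank` nt of `left` plus first `flank` nt of `right`."""
    if len(left) < flank or len(right) < flank:
        raise ValueError("each subrepeat must supply at least `flank` nt")
    return left[-flank:] + right[:flank]


def read_spans_junction(read: str, probe: str) -> bool:
    """True when the read contains the probe on either strand (exact match)."""
    read = read.upper()
    probe = probe.upper()
    return probe in read or revcomp(probe) in read
```

With two subrepeats A and B, `junction(A, B)` and `junction(B, A)` give the two 50 nt probes of a heterologous HOR. A read pair supporting the HOR would carry one of them in each mate or nearby on the reference.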


Figure S2: Examples of chromosome distribution for satDNA families in P. conica. (a) Distal location on several chromosome pairs (see Table 2); (b) distal location on X; (c) pericentromeric location on M6; (d) pericentromeric, interstitial and distal locations of PcoSat04-209 on six chromosome pairs; (e) presence of an interstitial cluster for PcoSat04-209 on the L2 bivalent (arrowhead); (f) differential abundance of the same satDNA between chromosome pairs. Bar = 5 µm


[Figure S3 image: alignment panels a–o. Labels recoverable from one panel: Consensus and Identity tracks, alignment positions 1–293, and the variants 1. PcoSat43C-275, 2. PcoSat43A-275, 3. PcoSat43B-272, 4. PcoSat35A-270, 5. PcoSat35C-251, 6. PcoSat35B-273 and 7. PcoSat76-253.]

Figure S3: Alignments of the variants for 13 satDNA superfamilies (a-m) (i.e. all 15 SFs found, except PcoSF05 and PcoSF11, which are shown in Figs. S4a and S5a, respectively). Alignments for two satDNA families showing several variants are also displayed here (n and o).


[Figure S4 image. Labels recoverable from the image: (a) alignment of PcoSat50A-73, PcoSat39A-165, PcoSat71A-150, PcoSat44A-73, PcoSat31A-198 and PcoSat26A-236 with Consensus and Identity tracks, positions 1–245; (c) minimum spanning tree linking PcoSat71-150, PcoSat31-198, PcoSat44-73, PcoSat26-236, PcoSat50-73 and PcoSat39-165, with changes marked as deletion, insertion or substitution; (d) junction map with PcoSat26A-236 and PcoSat44-73 subrepeats, positions 1–500; (e) junction map with PcoSat39-165 and PcoSat50A-73 subrepeats, positions 1–409.]

(b) Pairwise p-distances

PcoSF5 conserved region

            PcoSat26  PcoSat31  PcoSat39  PcoSat44  PcoSat50  PcoSat71
PcoSat26
PcoSat31    0.2254
PcoSat39    0.2329    0.2535
PcoSat44    0.0274    0.2254    0.2055
PcoSat50    0.2466    0.2676    0.0411    0.2329
PcoSat71    0.2537    0.2985    0.2388    0.2388    0.2687

PcoSF5 variable region

            PcoSat26  PcoSat31  PcoSat39  PcoSat71
PcoSat26
PcoSat31    0.4800
PcoSat39    0.4824    0.6087
PcoSat71    0.1972    0.4516    0.4923

Figure S4: (a) Alignment with the six components belonging to PcoSF05. Grey arrows correspond to the conserved region, and indicate full monomers when complete. Variable regions are on the right (about nt 120 onwards). (b) Pairwise p-distances between all PcoSF05 families for the variable (left) and conserved (right) regions. (c) Minimum spanning tree for the conserved region of PcoSF05. (d) Junction mapping for PcoSat26-236. Note that the conserved-variable junction is in pink colour whereas the variable-conserved junction is in blue colour. (e) Junction mapping for PcoSat39-165.


Figure S5: (a) Alignment for the two satDNA families included in PcoSF11. Grey arrows indicate subrepeats. (b) Dotplots for PcoSat30-92 and PcoSat62-80. Note that subrepeat limits are about nucleotide 47 in both satDNA families. (c) Pairwise p-distances (under diagonal) between consecutive (A1-B1 or B1-A2, etc.) and alternate (A1-A2, B1-B2, etc.) subrepeats. Standard error estimates (above diagonal) were obtained by a bootstrap with 1000 replicates.


[Figure S6 image. (a) Bar plot titled "Sequence Lengths for 210 Nucleotide Sequences"; x axis: Sequence Length (ticks from 10 to 100); y axis: Number of Sequences (0–60). (b) Bar plot titled "Sequence Lengths for 250 Nucleotide Sequences"; x axis: Sequence Length (ticks from 1 to 18); y axis: Number of Sequences (0–250). (c) Alignment, positions 1–112, with Consensus and Identity tracks, of the consensus sequences for repeat units of 11, 23, 34, 46, 56, 68, 80, 92 and 102 nt.]

Figure S6: Variation in monomer size in homologous HORs belonging to the PcoSF11 superfamily. (a) Bar plot showing the size of the tandem repeat units found by TRF under the most restrictive conditions. Note the regular distribution of unit sizes. (b) Bar plot showing the size of the tandem repeat units found by TRF under the least restrictive conditions. Note that the immense majority of read pairs showed a monomer of 11 nt, suggesting that this could be the size of the original monomer which gave rise to PcoSF11. (c) Alignment of all consensus sequences obtained in the TRF searches under the experimental conditions in a and b.
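As a rough arithmetic illustration of the "regular distribution of unit sizes" noted in the caption, the unit sizes labelled in panel c (11 to 102 nt) can be compared with whole multiples of the 11 nt monomer. This is only a sketch and not part of the original analysis; the function name and the rounding rule are assumptions made for illustration.

```python
# Unit sizes (nt) labelled in panel c of Figure S6.
unit_sizes_nt = [11, 23, 34, 46, 56, 68, 80, 92, 102]

# Monomer size suggested in the caption as the possible original repeat unit.
MONOMER_NT = 11


def nearest_copy_number(size_nt: int, monomer_nt: int = MONOMER_NT) -> int:
    """Return the whole number of monomers closest to a given unit size.

    Hypothetical helper: plain rounding of size / monomer, an assumption
    made for this sketch rather than a method taken from the paper.
    """
    return round(size_nt / monomer_nt)


copies = [nearest_copy_number(size) for size in unit_sizes_nt]

# Difference (nt) between each observed size and the exact multiple of 11 nt.
offsets = [size - n * MONOMER_NT for size, n in zip(unit_sizes_nt, copies)]

for size, n, offset in zip(unit_sizes_nt, copies, offsets):
    print(f"{size:>3} nt ~ {n} x {MONOMER_NT} nt (offset {offset:+d} nt)")
```

Under this simple rounding, the nine unit sizes map onto one to nine copies of the monomer, each within a few nucleotides of an exact multiple, which is compatible with short insertions or deletions accumulating between subrepeats.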


Figure S7: Physical mapping of telomeric DNA and PcoSat01-176. (a-b) Diplotene cell showing the presence of large clusters of telomeric DNA (Tel) on one or both homologous chromosomes from several bivalents (arrows). (c-d) Metaphase II cell showing the presence of large clusters of telomeric DNA on four chromosomes (arrows). Bar = 5 µm.


Chromosoma

Supplementary Figures for

High throughput analysis of satellite DNA in the grasshopper Pyrgomorpha conica reveals abundance of homologous and heterologous higher-order repeats

Francisco J. Ruiz-Ruano¹, Jesús Castillo-Martínez¹, Josefa Cabrero¹, Ricardo Gómez², Juan Pedro M. Camacho¹ and María Dolores López-León¹

¹Departamento de Genética, Facultad de Ciencias, Universidad de Granada, Granada, Spain

²Departamento de Ciencia y Tecnología Agroforestal, E.T.S. de Ingenieros Agrónomos, Universidad de Castilla La Mancha, 02071 Albacete, Spain

Present Address: Jesús Castillo-Martínez. Facultad de Medicina. Universidad Católica de Valencia. C/Quevedo 2, 46001 Valencia, Spain

*Corresponding author: María Dolores López-León

E-mail: [email protected]; Telephone number: 34958249702; Fax number: 34958244073

This PDF includes:

Supplementary Tables S1 and S2


Table S1: DNA sequence of the primers used for PCR amplification of the 38 most abundant satDNA families in P. conica.

SatDNA family F primer R primer

PcoSat01-176 TCGCTAGAACCATTGGTCGA ACTTGTATTCATCATTTCTATGCAAA

PcoSat02-156 TGTTTTATTTGCATATTTAGATCGCTA AAAAACTGCCATTACAGTCGT

PcoSat03-199 ACACACAGACAACAACCCTGT ACAAAGCTGCCAGGACGTAC

PcoSat04-209 TCAACTGTTTTATACTAAATATTCGCG GACAACTGTTTTACACTCTGAGAAG

PcoSat05-77 GCACCTGTTCGCTAGATTCAGA GCACAGAAAAATAGTAAAACTACTGTA

PcoSat06-168 ACTCCATGCTTGCCATGTAG TCGTCTATTTACTTGCGTTTACA

PcoSat07-194 TGACATGAACAGCTAGCGGA GCCAAGTTTTGTCATACTTTCTGTGC

PcoSat08-5-tel GGTTAGGTTAGGTTAGGTTA TAACCTAACCTAACCTAACC

PcoSat09-138 GCGAATAGAACACAATAAAAATCACTT CGCCTATCAGTCACGTCA

PcoSat10-109 ACATCAACTACAAAGAATTGTACAGA ACAGTGATCGCTAGGTTCAGT

PcoSat11-196 CGTGGCTATACAGAACACTGAG ACGAATCATCATACACGTGGT

PcoSat12-106 GACTTAATACAGATCCCAGTGC CACGTAAATGTAAGAACGTTTGG

PcoSat13-62 CCGTAGAGGCAATACCTGCAC CGGAGTGTTTTATATAAGTTGGCC

PcoSat14-75 GCTAGTTCAAAACCACGTGC AGCAAACTGTAACTAGCGAACCA

PcoSat15-251 GCAAGTCACACTCGATGAGTGGCA TCCGTGATTTCCTAGAAGGCCCA

PcoSat16-149 CGTTGGACACATTTATGTGGTGA ACGAGCCTGTTCGCTCAGTT

PcoSat17-146 TCCCTCTGTCAGACTGCCTG CTGGTCGTCAGTAGAAGCTCC

PcoSat18-139 TCAGGAAAGCAGAGGAGACTGGGA TCTGGTCTCGACAGCTCCGT

PcoSat19-151 TCCAGTTCATACTAATTGCCTAGCG GGATCAGTCAGGAGAAGCTCCT

PcoSat20-150 CGTCTGGCTCGTTGAACACACA ACGCTCTATTATGTTCATTGTGT

PcoSat21-113 GCAGAGTGCTACCTTTAGAGA GCTCTGTCCTCACACAGTTACT

PcoSat22-267 TGTGTCGAGTTGTTTTTGTAAC ACATTTGAGGAGAGATTGCTATTT

PcoSat23-286 ACACTCGACATAGTAGGTTACG ACTCGCCGCCTTCTTCAAAT

PcoSat24-161 TTCGCCATAAAATGATCATCACA CTCGTAGTTTCTTTGTACTGCA

PcoSat25-320 CTCATACCTCCAAAGCCTGT GAGAAGTGAATAAATCGTGGGTA

PcoSat26-236 ACGAGAGGGAGTACAGCTTC CGTGGAGACGATTTTCGTTTGT

PcoSat27-143 CAGAGTAGCTGGGATACTGAAGA GCTCCCTTGATCTCCTCAACTTCCC

PcoSat28-197 AGAAGAGAAAAGGAGTGGGTG TGTTATGGTCTCTTGTGGTGC

PcoSat29-154 ACATCGAAGATATAACAAGTCCTTT GAAGACAGGTAATATTGACGTGGC

PcoSat30-92 TACAAGCTGTTTGCTGAGATGA GGCAGGCAACTAGAAATCTTCTC

PcoSat31-198 ACACTGACACTGAGGGCGGC TCAGACACACAGTACATTAAAACA

PcoSat32-84 TGTGATGTTTGTGACAAGTCA ACACTTGAAAGGCTTCTCACC

PcoSat33-93 ATCTCAGGGACGACACAGT CCCAGAGGACGTAAACTACAAAG

PcoSat34-92 TCATTGCAGGGCTTACTTCACT TTCCTAGACTGCACTGCCAC

PcoSat35-270 TTCATTTCTCAGCTTTTCTGTTGTA GAAAAATACTTACAGAAGTGGGC

PcoSat36-159 AAAAATCAGGGGGATAACAGTGTT TATTTTACTGTGTGACAGATACATCT

PcoSat37-218 TAGGGGGCAGCACCGCTGTA ACACCAAGCTGAGGTTGCTACGA

PcoSat38-35 CGGCAGGACTGAAAGGGAACG GTCCAGCTGTTCACCGTTCCCT
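When reusing primer pairs such as those in Table S1, a basic check is whether the two primers of a pair are reverse complements of each other. The sketch below applies that check to two pairs copied from the table; the helper name and the choice of pairs are illustrative assumptions, not part of the original methods.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Two primer pairs copied from Table S1 as (F primer, R primer).
primers = {
    "PcoSat01-176": ("TCGCTAGAACCATTGGTCGA", "ACTTGTATTCATCATTTCTATGCAAA"),
    "PcoSat08-5-tel": ("GGTTAGGTTAGGTTAGGTTA", "TAACCTAACCTAACCTAACC"),
}

# For each pair, record whether R is the exact reverse complement of F.
is_revcomp = {
    name: reverse_complement(fwd) == rev for name, (fwd, rev) in primers.items()
}

for name, (fwd, rev) in primers.items():
    print(f"{name}: F={len(fwd)} nt, R={len(rev)} nt, "
          f"R is revcomp(F): {is_revcomp[name]}")
```

For the telomeric pair (PcoSat08-5-tel), the R primer is the exact reverse complement of the F primer, whereas the PcoSat01-176 primers differ in length and sequence.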


Table S2: Main features of the 87 satDNA variants found in the grasshopper Pyrgomorpha conica. RUL = repeat unit length, Abun = genomic abundance, Div = divergence.

SatDNA family    RUL  A+T (%)  Abun (%)  Div (%)
PcoSat01A-176    176  63.07    4.675     2.67
PcoSat01B-168    168  63.69    0.994     7.14
PcoSat02A-156    156  60.26    1.954     2.51
PcoSat03A-199    199  52.76    0.164     6.47
PcoSat04A-209    209  56.94    0.132     7.98
PcoSat05A-77     77   63.64    0.131     12.85
PcoSat06A-168    168  59.52    0.114     6.99
PcoSat07A-194    194  62.89    0.111     20.79
PcoSat08A-5-tel  5    60.00    0.099     2.7
PcoSat09A-138    138  58.70    0.089     16.06
PcoSat10A-109    109  63.30    0.078     19.27
PcoSat11A-196    196  61.73    0.058     13.68
PcoSat12A-106    106  60.38    0.038     7.13
PcoSat13A-62     62   54.84    0.034     9.19
PcoSat14A-75     75   65.33    0.033     7.64
PcoSat15A-251    251  57.77    0.033     6.59
PcoSat16A-149    149  59.73    0.033     21.34
PcoSat17A-146    146  62.33    0.031     4.68
PcoSat18A-139    139  45.32    0.022     14.25
PcoSat18B-139    139  47.48    0.006     7.57
PcoSat19A-151    151  62.91    0.027     5.49
PcoSat20A-150    150  58.67    0.026     17.01
PcoSat21A-113    113  63.72    0.025     3.67
PcoSat22A-267    267  67.04    0.023     7.21
PcoSat23A-286    286  61.89    0.023     8.83
PcoSat24A-161    161  64.60    0.021     5.34
PcoSat25A-320    320  62.19    0.020     20.47
PcoSat26A-236    236  61.44    0.020     6.89
PcoSat27A-143    143  50.35    0.019     17.96
PcoSat28A-197    197  49.24    0.017     7.25
PcoSat29A-154    154  68.83    0.016     8.03
PcoSat30A-92     92   51.09    0.015     18
PcoSat31A-198    198  64.14    0.015     7.26
PcoSat32A-84     84   60.71    0.014     15.73
PcoSat33A-93     93   60.22    0.013     12.04
PcoSat34A-92     92   55.43    0.013     10.71
PcoSat35A-270    270  62.22    0.007     8.14
PcoSat35B-273    273  61.17    0.003     2.39
PcoSat35C-251    251  60.56    0.001     7.63
PcoSat36A-159    159  62.89    0.011     9.32
PcoSat37A-218    218  48.17    0.006     10.66
PcoSat37B-218    218  52.75    0.005     13.17
PcoSat38A-35     35   40.00    0.008     17.85
PcoSat38B-35     35   42.86    0.003     16.81
PcoSat39A-165    165  57.58    0.010     2.61
PcoSat40A-152    152  59.87    0.010     6.46
PcoSat41A-145    145  71.72    0.009     8.73
PcoSat42A-115    115  72.17    0.009     4.9
PcoSat43A-275    275  61.09    0.003     3.57
PcoSat43B-272    272  59.93    0.003     5.66
PcoSat43C-275    275  62.18    0.002     6.36
PcoSat44A-73     73   67.12    0.009     7.95
PcoSat45A-188    188  61.70    0.005     16.18
PcoSat45B-186    186  63.44    0.004     18.94
PcoSat46A-92     92   40.22    0.008     17.88
PcoSat47A-149    149  61.74    0.008     11.06
PcoSat48A-70     70   72.86    0.008     7.07
PcoSat49A-169    169  45.56    0.008     5.04
PcoSat50A-73     73   65.75    0.007     2.25
PcoSat51A-186    186  52.15    0.007     3.92
PcoSat52A-158    158  49.37    0.007     6.1
PcoSat53A-163    163  64.42    0.007     11.4
PcoSat54A-215    215  57.21    0.007     6.88
PcoSat55A-127    127  59.06    0.006     5.24
PcoSat56A-215    215  60.00    0.006     7.76
PcoSat57A-243    243  43.62    0.006     11.32
PcoSat58A-76     76   77.63    0.006     10.04
PcoSat59A-122    122  63.93    0.006     6.73
PcoSat60A-122    122  49.18    0.005     4.72
PcoSat60B-119    119  48.74    0.000     8.28
PcoSat61A-89     89   50.56    0.005     16.89
PcoSat62A-80     80   43.75    0.004     16.7
PcoSat63A-226    226  52.21    0.004     9.56
PcoSat64A-181    181  65.19    0.004     3.78
PcoSat65A-150    150  52.67    0.004     4.25
PcoSat66A-109    109  59.63    0.004     9.15
PcoSat67A-134    134  55.22    0.002     5.24
PcoSat67B-138    138  55.80    0.002     2.01
PcoSat68A-134    134  51.49    0.004     5.3
PcoSat69A-105    105  50.48    0.003     4.51
PcoSat70A-149    149  60.40    0.003     3.18
PcoSat71A-150    150  60.00    0.003     3.57
PcoSat72A-38     38   60.53    0.003     11.03
PcoSat73A-80     80   65.00    0.003     10.63
PcoSat74A-35     35   48.57    0.003     16.04
PcoSat75A-126    126  58.73    0.002     8.94
PcoSat76A-253    253  63.24    0.001     4.3
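For readers who want to work with Table S2 programmatically, the following minimal sketch parses a few rows into typed records, sums their genomic abundance, and checks that the numeric suffix of each variant name matches its RUL. Only the first three rows of the table are embedded here; the record type, field names and helper function are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple


class SatVariant(NamedTuple):
    """One row of Table S2 (field names are illustrative)."""
    name: str
    rul: int                # repeat unit length
    at_pct: float           # A+T content (%)
    abundance_pct: float    # genomic abundance (%)
    divergence_pct: float   # divergence (%)


# First three rows of Table S2, whitespace-separated as in the table.
ROWS = """\
PcoSat01A-176 176 63.07 4.675 2.67
PcoSat01B-168 168 63.69 0.994 7.14
PcoSat02A-156 156 60.26 1.954 2.51
"""


def parse_rows(text: str) -> List[SatVariant]:
    """Parse whitespace-separated table rows into SatVariant records."""
    variants = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        name, rul, at, abun, div = line.split()
        variants.append(
            SatVariant(name, int(rul), float(at), float(abun), float(div))
        )
    return variants


variants = parse_rows(ROWS)

# Combined genomic abundance (%) of the parsed variants.
total_abundance = sum(v.abundance_pct for v in variants)

# The number after the first hyphen in each name should equal the RUL.
names_match_rul = all(int(v.name.split("-")[1]) == v.rul for v in variants)

print(f"{len(variants)} variants, total abundance {total_abundance:.3f}%")
print("name suffix matches RUL:", names_match_rul)
```

The same parser can be applied to all 87 rows once each row sits on its own line.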