Ancient exapted transposable elements drive nuclear localisation of lncRNAs

Joana Carlevaro-Fita1,2,4
Taisia Polidori1,2,4
Monalisa Das1,2,4
Carmen Navarro3
Rory Johnson1,2*

1. Department of Biomedical Research (DBMR), University of Bern, 3008 Bern, Switzerland
2. Department of Medical Oncology, Inselspital, University Hospital and University of Bern, 3010 Bern, Switzerland
3. Department of Computer Science and Artificial Intelligence, University of Granada, Spain
4. Equal contribution

*Correspondence: [email protected]
Keywords: Transposable element; long noncoding RNA; lncRNA; evolution; exaptation.

bioRxiv preprint doi: https://doi.org/10.1101/189753; this version posted September 16, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract

The sequence domains underlying long noncoding RNA (lncRNA) activities, including their characteristic nuclear localisation, remain largely unknown. One hypothetical source of such domains is neofunctionalised fragments of transposable elements (TEs), otherwise known as RIDLs (Repeat Insertion Domains of Long Noncoding RNA). We here present a transcriptome-wide map of putative RIDLs in human, using evidence from insertion frequency, strand bias and evolutionary conservation of sequence and structure. In the exons of GENCODE v21 lncRNAs, we identify a total of 5374 RIDLs in 3566 gene loci. RIDL-containing lncRNAs are over-represented in functionally-validated and disease-associated genes. By integrating this RIDL map with global lncRNA subcellular localisation data, we uncover a cell-type-invariant and dose-dependent relationship between LINE2 and MIR elements and nuclear-cytoplasmic distribution. Experimental validations establish causality, showing that these RIDLs drive the nuclear localisation of their host lncRNA. Together these data represent the first map of TE-derived functional domains, and suggest a global role for TEs as nuclear localisation signals of lncRNAs.
Introduction
The human genome contains many thousands of long noncoding RNAs (lncRNAs), of which at least a fraction are likely to have evolutionarily-selected biological functions (Ulitsky and Bartel 2013). Our current thinking is that, similar to proteins, lncRNA functions are encoded in primary sequence through “domains”, or discrete elements that mediate specific aspects of lncRNA activity, such as molecular interactions or subcellular localisation (Guttman and Rinn 2012; Rory Johnson and Guigó 2014; Mercer and Mattick 2013). Mapping these domains is thus a key step towards the understanding and prediction of lncRNA function in healthy and diseased cells.
One possible source of lncRNA domains is transposable elements (TEs) (Rory Johnson and Guigó 2014). TEs have been a major contributor to genomic evolution through the insertion and neofunctionalisation of sequence fragments, a process known as exaptation (Bourque 2009; Feschotte 2008). This process has contributed to the evolution of diverse features in genomic DNA, including transcriptional regulatory motifs (Bourque et al. 2008; R. Johnson et al. 2006), microRNAs (Roberts, Cardin, and Borchert 2014), gene promoters (Faulkner et al. n.d.; Huda et al. 2011), and splice sites (Lev-Maor et al. 2003; Sela et al. 2007).
We recently proposed that TEs may contribute pre-formed functional domains to lncRNAs. We termed these “RIDLs”, for Repeat Insertion Domains of Long noncoding RNAs (Rory Johnson and Guigó 2014). As RNA, TEs interact with a rich variety of proteins, meaning that in the context of lncRNA they could plausibly act as protein-docking sites (Blackwell et al. n.d.). Diverse evidence also points to repetitive sequences forming intermolecular Watson-Crick RNA:RNA and RNA:DNA hybrids (Gong and Maquat 2011; Holdt et al. 2013; Rory Johnson and Guigó 2014). However, it is likely that bona fide RIDLs represent a small minority of the many exonic TEs, with the remainder being phenotypically-neutral “passengers”.
A small but growing number of RIDLs have been described, reviewed in (Rory Johnson and Guigó 2014). These are found in lncRNAs with clearly demonstrated functions, including the X-chromosome silencing transcript XIST (Elisaphenko et al. 2008), the oncogene ANRIL (Holdt et al. 2013) and the regulatory antisense Uchl1AS (Carrieri et al. 2012). In each case, domains of repetitive origin are necessary for a defined function: the structured A-repeat of XIST, of retroviral origin, recruits the PRC2 silencing complex (Elisaphenko et al. 2008); Watson-Crick hybridisation between RNA and DNA Alu elements recruits ANRIL to target genes (Holdt et al. 2013); a SINEB2 repeat in Uchl1AS increases the translational rate of its sense mRNA (Carrieri et al. 2012). In parallel, a number of transcriptome-wide maps of lncRNA-linked TEs have been presented, showing how TEs have contributed extensively to lncRNA gene evolution (Kapusta et al. 2013; D. Kelley and Rinn 2012; Schmitt et al. 2016; Hezroni et al. 2015). However, there has been no attempt so far to create passenger-free maps of RIDLs with evidence for functionality in the context of mature lncRNA molecules.
Subcellular localisation, and the domains controlling it, exert a powerful influence over lncRNA functions (reviewed in (L.-L. Chen 2016)). For example, transcriptional regulatory lncRNAs must be located in the nucleus and chromatin, while those regulating microRNAs or translation must be present in the cytoplasm (K. Zhang et al. 2014). This is reflected in diverse patterns of lncRNA localisation (Cabili et al. 2015; Mas-Ponte et al. 2017b). If lessons learned from mRNA are also valid for lncRNAs, then short sequence motifs binding to RNA binding proteins (RBPs) will be an important mechanism (Martin and Ephrussi 2009). This was recently demonstrated for the BORG lncRNA, where a pentameric motif was shown to mediate nuclear retention (B. Zhang et al. 2014). Most interestingly, another study implicated an inverted pair of Alu elements in nuclear retention of lincRNA-P21 (Chillón and Pyle 2016). This raises the possibility that RIDLs may function as regulators of lncRNA localisation.
In the present study, we create a human transcriptome-wide catalogue of putative RIDLs. We show that lncRNAs carrying these RIDLs are enriched for functional genes. Finally, we provide in silico and experimental evidence that certain RIDL types, derived from ancient transposable elements, confer a potent nuclear localisation on their host transcripts.

Materials and Methods
All operations were carried out on human genome version GRCh38/hg38, unless stated otherwise.

Exonic TE curation
RepeatMasker annotations were downloaded from the UCSC Genome Browser (version hg38) on December 31st 2014, and GENCODE version 21 lncRNA GTF annotations were downloaded from www.gencodegenes.org (Harrow et al. 2012). We developed a pipeline, called Transposon.Profile, to annotate exonic and intronic TEs of a given gene annotation. Transposon.Profile is largely based on BEDTools’ intersect and merge functionalities (Quinlan and Hall 2010). Transposon.Profile merges the exons of all transcripts belonging to the given gene annotation, henceforth referred to as “exons”. The set of introns was curated by subtracting the merged exonic sequences from the full gene spans, and only retaining those introns that belonged to a single gene. These intronic regions were assigned the strand of the host gene.
The RepeatMasker annotation file was then compared to these exons, and intersections were recorded and classified into one of 6 categories: TSS (transcription start site), overlapping the first exonic nucleotide of the first exon; splice acceptor, overlapping the exon 5’ end; splice donor, overlapping the exon 3’ end; internal, residing within an exon and not overlapping any intronic sequence; encompassing, where an entire exon lies within the TE; TTS (transcription termination site), overlapping the last nucleotide of the last exon. In every case, the TEs are separated by strand relative to the host gene: + where both gene and TE are annotated on the same strand, otherwise -. The result is the “Exonic TE Annotation” (Supplementary File S1).
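The six-way classification can be illustrated with a minimal Python sketch. It is not the Transposon.Profile code: the `Interval` type, the function name, the 0-based half-open coordinates and the precedence between categories (encompassing first, then TSS/TTS, then splice sites) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (BED-style)
    strand: str  # "+" or "-"


def classify_te(te, exon, is_first_exon, is_last_exon):
    """Return (category, relative_strand) for a TE overlapping an exon, else None.

    Hypothetical helper: category precedence is an assumption of this sketch.
    """
    if te.end <= exon.start or te.start >= exon.end:
        return None  # no overlap at all
    covers_left = te.start <= exon.start   # TE covers the lowest-coordinate exonic nt
    covers_right = te.end >= exon.end      # TE covers the highest-coordinate exonic nt
    # Translate genomic left/right into transcript 5'/3' using the gene strand.
    if exon.strand == "+":
        covers_5p, covers_3p = covers_left, covers_right
    else:
        covers_5p, covers_3p = covers_right, covers_left
    if covers_5p and covers_3p:
        category = "encompassing"
    elif covers_5p:
        category = "TSS" if is_first_exon else "splice acceptor"
    elif covers_3p:
        category = "TTS" if is_last_exon else "splice donor"
    else:
        category = "internal"
    relative_strand = "+" if te.strand == exon.strand else "-"
    return category, relative_strand
```

For example, a TE at 90-110 on the + strand overlapping the first exon (100-200, + strand) of a transcript is reported as a same-strand TSS insertion.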

RIDL identification
Using this Exonic TE Annotation, we identified the subset of individual TEs with evidence for functionality. For certain analyses, an equivalent Intronic TE Annotation was also employed, being the Transposon.Profile output for the equivalent intron annotation described above. Three different types of evidence were used: enrichment, strand bias and evolutionary conservation.
In enrichment analysis, the exon/intron ratio of the fraction of nucleotide coverage by each repeat type was calculated. Any repeat type with a >2-fold exon/intron ratio was considered a candidate. All exonic TE instances belonging to such TE types are defined as RIDLs.
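As an illustration of the enrichment criterion, the following Python sketch computes the exon/intron coverage ratio per TE type and keeps those above 2-fold. The function name and input dictionaries are hypothetical, and skipping TE types with no intronic coverage is an assumption of this sketch rather than a documented step.

```python
def enriched_te_types(exon_te_nt, intron_te_nt, exon_total_nt, intron_total_nt, min_fold=2.0):
    """Return {te_type: exon/intron coverage ratio} for TE types above min_fold.

    exon_te_nt / intron_te_nt map a TE type to the nucleotides it covers in
    exons / introns; the *_total_nt arguments are the total exonic / intronic nt.
    """
    candidates = {}
    for te_type, exon_nt in exon_te_nt.items():
        intron_nt = intron_te_nt.get(te_type, 0)
        if intron_nt == 0:
            continue  # ratio undefined; skipping is an assumption of this sketch
        exon_fraction = exon_nt / exon_total_nt
        intron_fraction = intron_nt / intron_total_nt
        ratio = exon_fraction / intron_fraction
        if ratio > min_fold:
            candidates[te_type] = ratio
    return candidates
```

Normalising by total exonic and intronic length before taking the ratio keeps the comparison independent of the much larger intronic sequence space.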
In strand bias analysis, a subset of the Exonic TE Annotation was used, being the set of non-splice junction crossing TE instances (“noSJ”). This additional filter was employed to guard against false positive enrichments for TEs known to provide splice sites (Lev-Maor et al. 2003; Sela et al. 2007). For all TE instances, the “relative strand” was calculated: positive, if the annotated TE strand matches that of the host transcript; negative, if not. Then for every TE type, the ratio of relative strand sense/antisense was calculated. Statistical significance was calculated empirically: entire gene structures were randomly re-positioned in the genome using bedtools shuffle, and the intersection with the entire RepeatMasker annotation was re-calculated. For each iteration, sense/antisense ratios were calculated for all TE types. A TE type was considered to have significant strand bias if its true ratio exceeded (positively or negatively) all of 1000 simulations. All exonic instances of these TE types that also have the appropriate sense or antisense strand orientation are defined as RIDLs.
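The empirical decision rule can be sketched in Python as follows. The shuffling itself (bedtools shuffle followed by re-intersection) is not reproduced here; the simulated ratios are assumed to be supplied as a list, and both function names are hypothetical.

```python
def sense_antisense_ratio(relative_strands):
    """Ratio of sense ('+') to antisense ('-') instances; None if no antisense."""
    sense = relative_strands.count("+")
    antisense = relative_strands.count("-")
    if antisense == 0:
        return None  # ratio undefined
    return sense / antisense


def strand_bias_call(true_ratio, simulated_ratios):
    """Return 'sense', 'antisense' or None.

    A TE type is called biased only if its true ratio lies outside the whole
    range of simulated ratios (e.g. 1000 shuffles), in either direction.
    """
    sims = [r for r in simulated_ratios if r is not None]
    if true_ratio is None or not sims:
        return None
    if true_ratio > max(sims):
        return "sense"
    if true_ratio < min(sims):
        return "antisense"
    return None
```

With 1000 simulations, requiring the true ratio to exceed every simulated value corresponds to an empirical p-value below roughly 1/1000 in each direction.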
In evolutionary analysis, four different annotations of evolutionarily-conserved regions were treated similarly, using unfiltered Exonic TE Annotations. Primate, Placental Mammal and Vertebrate PhastCons elements based on 46-way alignments were downloaded as BED files from the UCSC Genome Browser (Siepel et al. 2005), while the ECS conserved regions were obtained from the Supplementary Data of Smith et al. (Smith et al. 2013). For each TE type we calculated the exonic/intronic conservation ratio. To do this we used IntersectBED (Quinlan and Hall 2010) to overlap exonic locations with TEs, and calculated the total number of nucleotides overlapping. We performed a similar operation for intronic regions. Finally, we calculated the following ratio:
rei = (nt of overlap between conserved elements and exons) / (nt of overlap between conserved elements and introns)
To estimate the background, the exonic and intronic BED files were positionally randomised 1000 times using bedtools shuffle, each time recalculating rei. We considered to be significantly conserved those TE types where the true rei was greater or less than every one of 1000 randomised rei values. All exonic instances of these TE types that also intersect the appropriate evolutionarily conserved element are defined as RIDLs.
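The rei ratio reduces to two nucleotide-overlap counts. The sketch below shows one way to compute them in plain Python for a single chromosome; it stands in for the IntersectBED step rather than reproducing it, assumes each interval list is non-overlapping within itself, and uses hypothetical function names.

```python
def overlap_nt(a, b):
    """Total nucleotides shared by two lists of (start, end) half-open intervals.

    Assumes a single chromosome and that each list has no internal overlaps.
    """
    a, b = sorted(a), sorted(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return total


def r_ei(conserved, exonic_te, intronic_te):
    """Exonic/intronic conservation ratio for one TE type; None if undefined."""
    intron_nt = overlap_nt(conserved, intronic_te)
    if intron_nt == 0:
        return None
    return overlap_nt(conserved, exonic_te) / intron_nt
```

The same function can be applied to each shuffled set of coordinates to build the background distribution against which the true rei is compared.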
All RIDL predictions were then merged using mergeBED and any instances with length <10 nt were discarded. The outcome, a BED format file with coordinates for hg38, is found in Supplementary File S2.
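A minimal Python sketch of this final step is given below. It imitates the merging of overlapping or directly adjacent intervals and then applies the 10 nt length filter; the function name is hypothetical, strand is ignored, and treating adjacent intervals as mergeable is an assumption about the settings used.

```python
def merge_and_filter(intervals, min_length=10):
    """Merge overlapping or adjacent (chrom, start, end) intervals, drop short ones.

    Coordinates are 0-based half-open; intervals shorter than min_length nt
    after merging are discarded.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval on the same chromosome.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged if m[2] - m[1] >= min_length]
```

Merging before filtering matters: two short overlapping predictions from different evidence types can together exceed the length threshold.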

RIDL orthology analysis
In order to assess the evolutionary history of RIDLs, we used chained alignments of human to chimp (hg19ToPanTro4), macaque (hg19ToRheMac3), mouse (hg19ToMm10), rat (hg19ToRn5) and cow (hg19ToBosTau7). Due to the availability of chain files, RIDL coordinates were first converted from hg38 to hg19. Orthology was defined using the LiftOver utility at default settings.

Comparing RIDL-carrying lncRNAs versus other lncRNAs
In order to test for functional enrichment amongst lncRNAs hosting RIDLs, we tested for statistical enrichment of the following traits in RIDL-carrying lncRNAs compared to other lncRNAs (see below) by Fisher's exact test:
A) Functionally-characterised lncRNAs: lncRNAs from GENCODE v21 that are present in lncRNAdb (Quek et al. 2015).
B) Disease-associated genes: lncRNAs from GENCODE v21 that are present in at least one of the following databases or public sets: LncRNADisease (G. Chen et al. 2013), Lnc2Cancer (Ning et al. 2016), Cancer LncRNA Census (CLC) (Carlevaro-Fita et al. bioRxiv doi: 10.1101/152769).
C) GWAS SNPs: We collected SNPs from the NHGRI-EBI Catalog of published genome-wide association studies (Hindorff et al. 2009; Welter et al. 2014) (https://www.ebi.ac.uk/gwas/home). We intersected their coordinates with lncRNA exons and promoters (promoters defined as 1000nt upstream and 500nt downstream of the GENCODE-annotated TSS).
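For each trait, the comparison amounts to a 2x2 contingency table (RIDL-carrying versus other lncRNAs, with versus without the trait). The sketch below implements a one-sided Fisher's exact test with the Python standard library only; the choice of a one-sided (enrichment) alternative and the function name are assumptions, since the sidedness of the test is not stated above.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test P(X >= a) for the table [[a, b], [c, d]].

    a: RIDL lncRNAs with the trait      b: RIDL lncRNAs without it
    c: other lncRNAs with the trait     d: other lncRNAs without it
    """
    row1 = a + b          # number of RIDL-carrying lncRNAs
    col1 = a + c          # number of lncRNAs with the trait
    n = a + b + c + d     # all lncRNAs in the comparison
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return numer / denom
```

In practice a statistics library would normally be used; the explicit hypergeometric sum is shown only to make the calculation transparent.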
To define a comparable set of “other lncRNAs”, we sampled from the rest of the GENCODE v21 lncRNAs a set matching the exonic length distribution of RIDL-carrying lncRNAs. We performed the subsampling using the matchDistribution script: https://github.com/julienlag/matchDistribution.
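The idea of length-matched subsampling can be sketched as follows. This is a simplified stand-in based on fixed-width length bins, not the matchDistribution algorithm; the function name, bin size and seed are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def length_matched_sample(target_lengths, pool, bin_size=500, seed=0):
    """Sample gene ids from pool so that length bins mirror target_lengths.

    target_lengths: exonic lengths of RIDL-carrying lncRNAs
    pool: {gene_id: exonic_length} for all remaining lncRNAs
    """
    rng = random.Random(seed)
    wanted = Counter(length // bin_size for length in target_lengths)
    by_bin = defaultdict(list)
    for gene_id, length in pool.items():
        by_bin[length // bin_size].append(gene_id)
    sample = []
    for length_bin, n in sorted(wanted.items()):
        candidates = sorted(by_bin.get(length_bin, []))
        # Take as many as requested, or all candidates if the bin is too small.
        sample.extend(rng.sample(candidates, min(n, len(candidates))))
    return sample
```

Matching on exonic length guards against the trivial explanation that longer transcripts are both more likely to carry a RIDL and more likely to be annotated as functional.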

Subcellular localisation analysis
Processed RNAseq data from human cell fractions were obtained from ENCODE in the form of RPKM (reads per kilobase per million mapped reads) quantified against the GENCODE v19 annotation (Djebali et al. 2012b; Mas-Ponte et al. 2017a). Only transcripts common to both the v21 and v19 annotations were considered. For the following analysis only one transcript per gene was considered, defined as the one with the largest number of exons. The cytoplasmic/nuclear expression ratio for each transcript was defined as (cytoplasm polyA+ RPKM)/(nuclear polyA+ RPKM), and only transcripts having non-zero values in both were considered. For each RIDL type and cell type in turn, the cytoplasmic/nuclear ratio distributions of RIDL-containing and non-RIDL-containing lncRNAs were compared using the Wilcoxon test. Only RIDLs having at least 3 expressed host transcripts in at least one cell type were tested. Resulting p-values were globally adjusted to False Discovery Rate using the Benjamini-Hochberg method (Benjamini and Hochberg 1995).
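Two steps of this analysis are simple enough to sketch in plain Python: the per-transcript cytoplasmic/nuclear ratio and the Benjamini-Hochberg adjustment. The Wilcoxon test itself is omitted, and the function names are hypothetical.

```python
def cyto_nuclear_ratio(cyto_rpkm, nuc_rpkm):
    """Cytoplasmic/nuclear RPKM ratio; None unless both values are non-zero."""
    if cyto_rpkm <= 0 or nuc_rpkm <= 0:
        return None
    return cyto_rpkm / nuc_rpkm


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Adjusting globally, that is, across all RIDL type and cell type combinations at once, means `pvalues` would hold every tested combination rather than one cell type at a time.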

Cell lines and reagents
Human cervical cancer cell line HeLa and human lung cancer cell line A549 were cultured in Dulbecco’s Modified Eagle’s medium (Sigma-Aldrich, # D5671) supplemented with 10% FBS and 1% penicillin streptomycin at 37ºC and 5% CO2. Anti-GAPDH antibody (Sigma-Aldrich, # G9545) and anti-histone H3 antibody (Abcam, # ab24834) were used for Western blot analysis.

Gene synthesis and cloning of lncRNAs
The three lncRNA sequences (RP11-5407, Linc00173, RP4-806M20.4) containing wild-type RIDLs, or corresponding mutated versions where the RIDL sequence has been randomised (“Mutant”), were synthesized commercially (BioCat GmbH). The sequences were cloned into the pcDNA 3.1 (+) vector between the NheI and XhoI restriction enzyme sites. The clones were checked by restriction digestion and Sanger sequencing. The sequences of the wild-type and mutant clones are provided in Supplementary File S3.

Sub-cellular fractionation
Each pair of wild-type and corresponding mutant lncRNA clones was transfected independently in separate wells of a 6-well plate. Transfections were carried out with 2 μg of total plasmid DNA in each well using Lipofectamine 2000. 48 h post-transfection, cells from each well were harvested, pooled and re-seeded into a 10 cm dish and allowed to grow until 100% confluence. The nuclear and cytoplasmic fractionation was carried out as described previously (Suzuki et al. 2010) with minor modifications. In brief, cells from 10 cm dishes were harvested by scraping and washed with 1X ice-cold PBS. For fractionation, the cell pellet was re-suspended in 900 μl of ice-cold 0.1% NP40 in PBS and triturated 7 times using a p1000 micropipette. 300 μl of the cell lysate was saved as the whole cell lysate. The remaining 600 μl of the cell lysate was centrifuged for 30 sec on a table top centrifuge and the supernatant was collected as the “cytoplasmic fraction”. 300 μl of the cytoplasmic supernatant was kept for RNA isolation and the remaining 300 μl was saved for protein analysis by western blot. The pellet containing the intact nuclei was washed with 1 ml of 0.1% NP40 in PBS. The nuclear pellet was re-suspended in 200 μl 1X PBS and subjected to a quick sonication of 3 pulses with 2 sec ON-2 sec OFF to lyse the nuclei and prepare the “nuclear fraction”. 100 μl of the nuclear fraction was saved for RNA isolation and the remaining 100 μl was kept for western blot.

RNA isolation and real time PCR
The RNA from each nuclear and cytoplasmic fraction was isolated using the Quick-RNA MiniPrep kit (ZYMO Research, # R1055). The RNAs were subjected to on-column DNAse I treatment and clean up using the manufacturer’s protocol. For A549 samples, additional units of DNase were employed, due to residual signal in -RT samples. The RNA from each fraction was converted to cDNA using GoScript reverse transcriptase (Promega, # A5001) and random hexamer primers. The expression of each of the individual transcripts was quantified by qRT-PCR (Applied Biosystems® 7500 Real-Time) using the indicated primers (Supplementary File S4) and GoTaq qPCR master mix (Promega, # A6001). Human GAPDH mRNA and MALAT1 lncRNA were used as cytoplasmic and nuclear markers, respectively.

Western Blotting
The protein concentration of each of the fractions was determined, and equal amounts of protein (50 μg) from whole cell lysate, cytoplasmic fraction, and nuclear fraction were resolved on 12% Tris-glycine SDS-polyacrylamide gels and transferred onto polyvinylidene fluoride (PVDF) membranes (VWR, # 1060029). Membranes were blocked with 5% skimmed milk and incubated overnight at 4ºC with anti-GAPDH antibody as a cytoplasmic marker and anti p-histone H3 antibody as a nuclear marker. Membranes were washed with PBS-T (1X PBS with 0.1% Tween 20) followed by incubation with HRP-conjugated anti-rabbit or anti-mouse secondary antibodies respectively. The bands were detected using SuperSignal™ West Pico chemiluminescent substrate (Thermo Scientific, # 34077).
Results
The objective of this study was to create a map of repeat insertion domains of long noncoding RNAs (RIDLs) and link them to lncRNA functions. We previously hypothesised that RIDLs could function by mediating interactions between host lncRNA and DNA, RNA or protein molecules (Rory Johnson and Guigó 2014) (Figure 1A). However, any attempt to map RIDLs is confronted with the problem that they will likely represent a small minority amongst many non-functional “passenger” transposable elements (TEs) residing in lncRNA exons (Figure 1B). Therefore, it is necessary to identify RIDLs by some signature of selection in lncRNA exons. In this study we use three such signatures: exonic enrichment, strand bias, and evolutionary conservation (Figure 1B).

A map of exonic TEs in GENCODE v21 lncRNAs
Our first aim was to create a comprehensive map of TEs within the exons of GENCODE v21 human lncRNAs (Figure 2A). Altogether 5,520,018 distinct TE insertions were intersected against 48684 exons from 26414 transcripts of 15877 GENCODE v21 lncRNA genes, resulting in 46474 exonic TE insertions in lncRNA (Figure 1B). Altogether 13121 lncRNA genes (82.6%) carry at least one exonic TE fragment in their mature transcript.
We defined TEs residing in lncRNA introns to represent neutral background, since they do not contribute to functionality of the mature processed transcript, but nevertheless should model any large-scale genomic biases in TE abundance. Thus we also created a control TE intersection dataset with 31,004 GENCODE lncRNA introns, resulting in 562,640 intron-overlapping TE fragments (Figure 2A). This analysis is likely to be conservative, since some intronic sequences could contribute to unidentified lncRNA exons.
Comparing intronic and exonic TE data, we see that lncRNA exons are depleted for TE insertions: 29.2% of exonic nucleotides are of TE origin, compared to 43.4% of intronic nucleotides (Figure 2B), similar to previous studies (Kapusta et al. 2013). This may reflect selection against disruption of lncRNA transcripts by TEs, and hence is evidence for general lncRNA functionality.

Contribution of TEs to lncRNA gene structures
TEs have contributed widely to both coding and noncoding gene structures by the insertion of elements such as promoters, splice sites and termination sites (Sela et al. 2007). We next classified inserted TEs by their contribution to lncRNA gene structure (Figure 2C,D). It should be borne in mind that this analysis is dependent on the accuracy of underlying GENCODE annotations, which are often incomplete at 5’ and 3’ ends (J. Lagarde, B. Uszczyńska et al, Manuscript In Press). Altogether 4993 (18.9%) transcripts are promoted from within a TE, most often those of Alu and ERVL-MaLR classes (Figure 2E). 7497 (28.4%) lncRNA transcripts are terminated by a TE, most commonly by Alu, ERVL-MaLR and LINE1 classes. 8494 lncRNA splice sites (32.2%) are of TE origin, and 2681 entire exons are fully contributed by TEs (10.1%) (Figure 2E). These observations support several known contributions of TEs to gene structural features (Sela et al. 2007). Nevertheless, the most frequent case is represented by 22031 TEs that lie completely within an exon and do not overlap any splice junction (“inside”).
Evidence for selection on certain exonic TE types
This exonic TE map represents the starting point for the identification of RIDLs, defined as the subset of TEs with evidence for functionality in the context of mature lncRNAs. In this and subsequent analyses, TEs are grouped by type as defined by RepeatMasker. We utilise three distinct sources of evidence for selection on TEs: exonic enrichment, strand bias and evolutionary conservation (Figure 1B).
We first asked whether particular TE types are more frequently found within lncRNA exons, compared to the background represented by intronic sequence (D. Kelley and Rinn 2012). Thus we calculated the ratio of exonic / intronic sequence coverage by TEs (Figure 3A). We found signals of enrichment, >2-fold, for numerous repeat types, including endogenous retrovirus classes (HERVE-int, HERVK9-int, HERV3-int, LTR12D) in addition to others such as ALR/Alpha, BSR/Beta and REP522. Surprisingly, quite a number of simple repeats are also enriched in lncRNA, including GC-rich repeats. A weaker but more generalised trend of enrichment is also observed for various MLT repeat classes. These findings are consistent with previous analyses by Kelley and Rinn (D. Kelley and Rinn 2012).
Interestingly, presently-active LINE1 elements are depleted in lncRNA exons (Figure 3A). It is possible that this reflects selection against disruption to normal gene expression, where numerous weak polyadenylation signals lead to premature transcription termination when the LINE1 element lies on the same strand as the overlapping gene (Perepelitsa-Belancio and Deininger 2003). Another explanation may be low transcriptional processivity exhibited by the LINE1 ORF2 in the sense strand, leading to reduced gene expression and consequent negative selection (Perepelitsa-Belancio and Deininger 2003).
In a similar way, we investigated whether exonic TEs display a strand preference relative to host lncRNA (Rory Johnson and Guigó 2014). We took care to avoid a major source of bias: as shown above, many TSS and splice sites of lncRNA are contributed by TEs, and such cases would lead to clear strand bias in the resulting exonic TE sequences. To avoid this, we removed any TE insertions that overlap an exon-intron boundary. We calculated the relative strand overlap of all remaining TEs in lncRNA exons. Statistical significance was assessed by randomisation (see Materials and Methods) (Figure 3B). We carried out the same operation for lncRNA introns (Supplementary Figure S1). In lncRNA exons, a number of TE types are enriched in either sense or antisense. They are dominated by LINE1 family members, possibly for the reasons mentioned above. Other significantly enriched TE types include LTR78, MLT1B, and MIRc (Figure 3B).
In introns, we observed a widespread depletion of same-strand TE insertions (Supplementary Figure S1). This effect is modest yet statistically significant, and is especially true for LINE1 elements, possibly for reasons mentioned above. In contrast, almost no TE types were significantly enriched on the sense strand in introns.
To test for TE type-specific conservation, we turned to two sets of predictions of evolutionarily-conserved elements. First, the widely-used PhastCons conserved elements, based on a phylogenetic hidden Markov model (Siepel et al. 2005) calculated on primate, placental mammal and vertebrate alignments; second, the more recently published “Evolutionarily Conserved Structures” (ECS) set of conserved RNA structures (Smith et al. 2013). Importantly, the PhastCons regions are defined based on sequence conservation alone, while the ECS are defined by phylogenetic analysis of RNA structure evolution.
To look for evidence of evolutionary conservation on exonic TEs, we calculated the frequency of
their overlap with evolutionarily conserved genomic elements compared to intronic TEs of the same type.
To assess statistical significance, we again used positional randomisation (see inset in Figure 3C). This
pipeline was applied independently to the PhastCons (Figure 3C, Supplementary Figure S1B,C,D) and ECS
(Supplementary Figure S1) data. The majority of TEs do not exhibit significantly different conservation
when found in exons or introns (grey points). Depending on the conservation data used, the method detects
significant conservation for 17 to 40 TE types (Figure 3C). We also found a small number of TEs with high
evolutionary rates specifically in lncRNA exons, namely the young AluSz, AluSx and AluJb (PhastCons)
and L1M4c and AluSx1 (ECS) (coloured red in Figure 3C and Supplementary Figure S1).
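
As an illustration of how an empirical p-value can be obtained from such a randomisation, the following minimal Python sketch compares an observed exonic/intronic conservation ratio with ratios from shuffled data. The function name, the add-one correction and all example values are assumptions made for illustration only; the procedure actually used is the one described in Materials and Methods.

```python
import random


def empirical_p_value(observed, simulated):
    """One-sided empirical p-value: share of simulated values >= observed.

    The add-one correction (an assumption of this sketch) keeps the
    estimate from ever being exactly zero.
    """
    at_least = sum(1 for value in simulated if value >= observed)
    return (at_least + 1) / (len(simulated) + 1)


# Hypothetical data: 1000 simulated exon/intron conservation ratios
# standing in for the output of a positional randomisation.
rng = random.Random(0)
simulated_ratios = [rng.gauss(1.0, 0.1) for _ in range(1000)]
observed_ratio = 1.8  # made-up observed value for one TE type

p = empirical_p_value(observed_ratio, simulated_ratios)
print(f"empirical p = {p:.4g}")
```

With 1000 simulations, the smallest value this estimator can return is 1/1001, which is just below the 0.001 threshold quoted in the figure legends.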
316
An annotation of RIDLs 317
We next combined all TE classes with evidence of functionality into a draft annotation of RIDLs,
enriched for functional TEs. This annotation comprises 99 TE types in total, each with at least one
type of selection evidence. For each TE / evidence pair, only those TE instances satisfying that evidence
were included. In other words, if MIRb elements were found to be associated with vertebrate PhastCons
elements, then only those instances of exonic MIRb elements overlapping such a PhastCons element would
be included in the RIDL annotation, and all other exonic MIRb elements would be excluded. An example is the CCAT1
lncRNA oncogene: it carries three exonic MIR elements, of which one is defined as a RIDL based on its
overlap with a PhastCons element (Figure 4A).
After removing redundancy, the final RIDL annotation consists of 5374 elements, located within
3566 distinct lncRNA genes (Figure 3D). The most abundant TE families are MIR and L2 repeats,
representing 2329 and 1143 RIDLs, respectively (Figure 4B). The majority of both are defined based on evolutionary
evidence (Figure 4B). In contrast, RIDLs composed of ERV1, low complexity, satellite and simple repeat
families are more frequently identified due to exonic enrichment (Figure 4B). The entire RIDL annotation
is available in Supplementary File S2.
We also examined the evolutionary history of RIDLs. Using 6-mammal alignments, their depth of 332
evolutionary conservation could be inferred (Supplementary Figure S2). 12% of instances appear to be 333
Great Ape-specific, with no orthologous sequence beyond chimpanzee. 47% are primate-specific, while the 334
remaining 40% are identified in at least one non-primate mammal. The wide timeframe for appearance of 335
RIDLs is consistent with the wide diversity of TE types, from ancient MIR elements to presently-active 336
LINE1 (Jurka, Zietkiewicz, and Labuda 1995; Konkel, Walker, and Batzer 2010; Smith et al. 2013). 337
Instances of genomic TE insertions typically represent a fragment of the full consensus sequence.
We hypothesised that particular regions of the TE consensus would be important for RIDL activity,
introducing selection for these regions that would distinguish them from unselected, intronic copies. To test
this, we compared insertion profiles of RIDLs to intronic instances, for each TE type, and used the
correlation coefficient (CC) as a quantitative measure of similarity (Figure 4C and Supplementary File S5).
For 17 cases, a CC<0.9 points to possible selective forces acting on RIDL insertions. An example is the
macrosatellite SST1 repeat, where RIDL copies in 41 lncRNAs show a strong preference for inclusion of the
3’ end, in contrast to the general 5’ preference observed in introns (Figure 4C). This suggests a possible
functional relevance for the 1000-1500 nt region of the SST1 consensus.
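
The profile comparison can be sketched as follows. This is a minimal, standard-library Python illustration, assuming each insertion is given as a hypothetical (start, end) pair in 1-based consensus coordinates; the helper names and the toy data are not from the study. The Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
def coverage_profile(insertions, consensus_length):
    """Fraction of insertions covering each consensus position (1-based, inclusive)."""
    counts = [0] * consensus_length
    for start, end in insertions:
        for pos in range(max(start, 1), min(end, consensus_length) + 1):
            counts[pos - 1] += 1
    total = len(insertions)
    return [c / total for c in counts] if total else [0.0] * consensus_length


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (raises) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example on a 10-nt consensus: RIDL copies favour the 3' end,
# intronic copies favour the 5' end.
ridl_profile = coverage_profile([(6, 10), (7, 10), (8, 10)], 10)
intronic_profile = coverage_profile([(1, 5), (1, 4), (1, 3)], 10)
cc = spearman(ridl_profile, intronic_profile)
print(f"CC = {cc:.2f}")
```

In this toy case the two profiles are anti-correlated, which is the kind of divergence that a low CC is intended to flag.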
347
RIDL-carrying lncRNAs are enriched for functions and disease roles 348
We next looked for evidence to support the RIDL annotation by investigating the properties of 349
RIDL-carrying lncRNAs. We first asked whether RIDLs are randomly distributed amongst lncRNAs, or 350
else clustered in a smaller number of genes. Figure 4D shows that the latter is the case, with a significant 351
deviation of RIDLs from a random distribution. These lncRNAs carry a mean of 1.15 RIDLs / kb of exonic 352
sequence (median: 0.84 RIDLs/kb) (Supplementary Figure S3). 353
Are RIDL-lncRNAs more likely to be functional? To address this, we compared lncRNA genes
carrying one or more RIDLs to a length-matched set of non-RIDL lncRNAs (Supplementary Figure S4).
RIDL-lncRNAs are (1) over-represented in the reference database for functional lncRNAs, lncRNAdb
(Quek et al. 2015) (Figure 4E) and (2) significantly enriched for lncRNAs associated with cancer and
other diseases (Figure 4F). Finally, we found that RIDL-lncRNAs are enriched for trait/disease-associated
SNPs in both their promoter and exonic regions (Figure 4G).
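
The enrichment comparisons above rest on Fisher's exact test (see the legend of Figure 4E). A minimal sketch of a one-sided version, written with the Python standard library, is given below. The one-sided alternative, the function name and the counts in the example are assumptions for illustration and are not the values or settings used in the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: RIDL-lncRNAs / matched non-RIDL lncRNAs.
    Columns: present in / absent from a database.
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b          # number of RIDL-lncRNAs
    col1 = a + c          # number of genes present in the database
    n = a + b + c + d     # all genes considered
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts: 30 of 3000 RIDL-lncRNAs versus 10 of 3000
# length-matched controls appear in a database of functional lncRNAs.
p = fisher_exact_greater(30, 2970, 10, 2990)
print(f"one-sided p = {p:.3g}")
```

Matching the control set on exonic length, as done in Supplementary Figure S4, matters here because longer genes have more opportunity both to carry a RIDL and to have been studied.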
In addition to CCAT1 (Figure 4A) (Nissan et al. 2012), the annotation includes a number of deeply-studied
RIDL-containing genes. XIST, the X-chromosome silencing RNA, contains seven internal RIDL elements. As we
pointed out previously (Rory Johnson and Guigó 2014), these include an intriguing array of four similar
pairs of MIRc / L2b repeats. The prostate cancer-associated UCA1 gene has a transcript isoform promoted
from an LTR7c, as well as an additional internal RIDL, thereby making a potential link between cancer
gene regulation and RIDLs. The TUG1 gene, involved in neuronal differentiation, contains highly
evolutionarily-conserved RIDLs including Charlie15k and MLT1K elements (Rory Johnson and Guigó
2014). Other RIDL-containing lncRNAs include MEG3, MEG9, SNHG5, ANRIL, NEAT1, CARMEN1 and
SOX2OT. LINC01206, located adjacent to SOX2OT, also contains numerous RIDLs. A full list of
RIDL-carrying lncRNAs can be found in Supplementary File S6.
370
Correlation between RIDLs and subcellular localisation of host transcript 371
The location of a lncRNA within the cell is of key importance to its molecular function (Cabili et
al. 2015; Derrien et al. 2012; Mas-Ponte et al. 2017a). We next asked whether RIDLs might define lncRNA
localisation (Hacisuleyman et al. 2016; B. Zhang et al. 2014; Chillón and Pyle 2016) (Figure 5A).
Beginning with normalised subcellular RNAseq data from the LncATLAS database (Mas-Ponte et al.
2017a), based on 10 ENCODE cell lines (Djebali et al. 2012a), we comprehensively tested each of the 99
RIDL types for association with localisation of their host transcript.
After correcting for multiple hypothesis testing, this approach identified four distinct TEs that
significantly correlate with nuclear localisation: L1PA16, L2b, MIRb and MIRc (Figure 5B). For example,
44 lncRNAs carrying L2b RIDLs have a 6.9-fold higher relative nuclear/cytoplasmic ratio in IMR90 cells,
and this tendency is observed in six different cell types (Figure 5B,C). Furthermore, the degree of nuclear
localisation increases in lncRNAs as a function of the number of RIDLs they carry (Figure 5D). We also
found a significant relationship between GC-rich elements and cytoplasmic enrichment across three
independent cell samples. The GC-RIDL containing lncRNAs have between 2- and 2.3-fold higher relative
expression in cytoplasm in these cells (Supplementary Figure S5). These data suggest that certain types of
RIDL may regulate the subcellular localisation of lncRNAs.
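
Because each RIDL type is tested in each cell line, the resulting p-values require correction for multiple testing; the legend of Figure 5B specifies Benjamini-Hochberg correction of Wilcoxon test p-values. The adjustment step alone can be sketched in plain Python as below. The function name and the input p-values are hypothetical, and the Wilcoxon test itself is not reimplemented here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value downwards so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four RIDL type / cell line pairs.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.01 for q in adjusted]
print(adjusted, significant)
```

A pair would then be reported only if its adjusted value falls below the chosen threshold, which is 0.01 in Figure 5B.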
387
RIDLs play a causative role in lncRNA nuclear localisation 388
To more directly test whether RIDLs play a causative role in nuclear localisation, we selected three
lncRNAs, based on: (i) presence of L2b, MIRb and MIRc RIDLs; (ii) moderate expression; (iii) nuclear
localisation, as inferred from RNAseq (Figure 6A,B and Supplementary Figure S6). Nuclear localisation
of these candidates was validated using qRT-PCR (Figure 6C and Supplementary Figure S7).
We designed an assay to compare the localisation of transfected lncRNAs carrying wild-type
RIDLs, and mutated versions where the RIDL sequence has been randomised (“Mutant”) (Figure 6D, full
sequences available in Supplementary File S3). Wild-type and Mutant lncRNAs were overexpressed in
cultured cells and their localisation evaluated by fractionation. Overexpressed transcripts were many-fold
more abundant than endogenous lncRNAs (Supplementary Figure S8). Fractionation quality was verified by
Western blotting (Figure 6E) and qRT-PCR (Figure 6F), and we found negligible levels of DNA
contamination in cDNA samples (Supplementary Figure S9).
Comparison of Wild-Type and Mutant lncRNAs demonstrated a potent and consistent impact of 400
RIDLs on nuclear-cytoplasmic localisation: for all three candidates, loss of RIDL sequence resulted in 401
relocalisation of the host transcript from nucleus to cytoplasm (Figure 6F). Overexpressed Wild-Type 402
lncRNA copies had slightly lower levels of nuclear enrichment compared to endogenous transcripts, and 403
the two types were distinguished by primers matching a plasmid-encoded region (Supplementary File S4). 404
We repeated these experiments in another cell line, A549, and observed similar effects (Supplementary 405
Figure S7). These results demonstrate a role for L2b, MIRb and MIRc elements in nuclear localisation of 406
lncRNAs. 407
408
Discussion 409
Recent years have seen a rapid increase in the number of annotated lncRNAs. However, our
understanding of their molecular functions, and of how such functions are encoded in primary RNA
sequences, lags far behind. Two recent conceptual developments offer hope for resolving the
sequence-function code of lncRNAs: first, the idea that the subcellular localisation of lncRNAs is a readily
quantifiable characteristic that holds important clues to function; second, that the abundant transposable
element (TE) content of lncRNAs may contribute to functionality.
In this study we have linked these two ideas, by showing that TEs can drive the nuclear localisation
of lncRNAs. A global correlation analysis of TEs and RNA localisation data revealed that a handful of TEs,
most notably LINE2b, MIRb and MIRc, correlate strongly with the degree of nuclear retention of their host
transcripts. This correlation is observed in multiple cell types, and scales with the number of TEs present.
A causative link was established experimentally, confirming that the indicated TEs have a two- to four-fold
effect on transcript localisation.
These data are consistent with our previously-published hypothesis that exonic TE elements may 422
function as lncRNA domains. In this “RIDL hypothesis”, transposable elements are co-opted by natural 423
selection to form “Repeat Insertion Domains of LncRNA”, that is, fragments of sequence that confer 424
adaptive advantage through some change in the activity of their host lncRNA. We proposed that RIDLs 425
may serve as binding sites for proteins or other nucleic acids, and indeed a growing body of evidence 426
supports this (Rory Johnson and Guigó 2014). In the context of localisation, RIDLs could mediate nuclear 427
retention due to their described interactions with nuclear proteins (D. R. Kelley et al. 2014), although it is 428
also conceivable that the effect occurs through hybridisation to complementary repeats in genomic DNA. 429
The localisation RIDLs discovered, MIR and LINE2, are both ancient and contemporaneous,
having been active prior to the mammalian radiation (Cordaux and Batzer n.d.). Both have previously been
associated with acquired roles in the context of genomic DNA (Jjingo et al. 2014; R. Johnson et al. 2006).
Although the evolutionary history of lncRNA remains an active area of research and accurate dating of
lncRNA gene birth is challenging, it appears that the majority of human lncRNAs were born after the
mammalian radiation (Hezroni et al. 2015; Hezroni et al. 2017; Necsulea et al. 2014; Washietl, Kellis, and
Garber 2014). This would mean that MIR and LINE2 RIDLs were pre-existing sequences that were exapted
by newly-born lncRNAs. This corresponds to the “latent” exaptation model proposed by Feschotte and
colleagues (Chuong, Elde, and Feschotte 2017). Given that nuclear retention is at odds with the primary
need of TE RNAs to be exported to the cytoplasm, we propose that the observed nuclear localisation
activity is a more modern feature of L2b/MIR RIDLs, which is unrelated to their original roles. However, it
is also possible that in other cases the reverse is true: a pre-existing lncRNA exapts a newly-inserted
TE. Our use of evolutionary conservation to identify RIDLs likely biases our analysis against the
discovery of modern TE families, such as Alus.
This study marks a step in the ongoing efforts to map the domains of lncRNAs. Previous studies
have utilised a variety of approaches, from integrating experimental protein-binding data (Van Nostrand et
al. 2016; Hu et al. 2017; Li et al. 2014), to evolutionarily conserved segments (Smith et al. 2013; Seemann
et al. 2017). Previous maps of TEs have highlighted their profound roles in lncRNA gene evolution
(Kapusta et al. 2013; D. Kelley and Rinn n.d.; Hezroni et al. 2015). However, the present RIDL annotation
stands apart in attempting to identify the subset of TEs with evidence for selection. We hope that the present
RIDL map will prove a resource for future studies to better understand functional domains of lncRNAs.
Although various lines of evidence suggest that the RIDL annotation is usefully enriched for functional
TEs, it most likely contains substantial numbers of false positives and false negatives, which should be
reduced in future versions.
This study, and the concept of RIDLs in general, may help to address two longstanding questions 454
in the lncRNA field. 455
How does lncRNA primary sequence evolve new functions so rapidly? The insertion of TE-derived “pre-456
fabricated” sequence domains, with protein-, RNA- or DNA-binding capacity, confers rapid acquisition of 457
new molecular activities. There is a variety of evidence that TE sequences contain abundant, latent activity 458
of this type (reviewed in (Rory Johnson and Guigó 2014)). 459
Why do lncRNAs have nuclear-restricted localisation? Although they are readily detected in the cytoplasm,
it is nevertheless true that lncRNAs tend to have higher nuclear/cytoplasmic ratios compared to mRNAs
(Clark et al. 2012; Ulitsky and Bartel 2013; Derrien et al. 2012; Mas-Ponte et al. 2017a). This is true across
various human and mouse cell types. Although this may partially be explained by decreased stability
(Mukherjee et al. 2017), it is likely that RNA sequence motifs also contribute to nuclear localisation
(Chillón and Pyle 2016; B. Zhang et al. 2014). Here we show that this is the case, and that the enrichment
of certain RIDL types in lncRNA mature sequences is likely to be a major contributor to lncRNA nuclear
retention.
In summary therefore, we have made available a first annotation of selected RIDLs in lncRNAs, 468
and described a new paradigm for TE-derived fragments as drivers of nuclear localisation in lncRNAs. 469
470
Figure Legends 471
472
Figure 1: Repeat insertion domains of lncRNAs (RIDLs). 473
(A) In the “Repeat Insertion Domain of LncRNAs” (RIDL) model, exonically-inserted fragments of 474
transposable elements contain pre-formed protein-binding (red), RNA-binding (green) or DNA-475
binding (blue) activities, that contribute to functionality of the host lncRNA (black). RIDLs are 476
likely to be a small minority of exonic TEs, coexisting with large numbers of non-functional 477
“passengers” (grey). 478
(B) RIDLs (dark orange arrows) will be distinguished from passenger TEs by signals of selection, 479
including: (1) simple enrichment in exons; (2) a preference for residing on a particular strand 480
relative to the host transcript; (3) evolutionary conservation. Selection might be identified by 481
comparing exonic TEs to a neutral population, for example those residing in lncRNA introns (light 482
coloured arrows). 483
484
Figure 2: An exonic transposable element annotation with the GENCODE v21 lncRNA catalogue. 485
(A) Statistics for the exonic TE annotation process using GENCODE v21 lncRNAs. 486
(B) The fraction of nucleotides overlapped by TEs for lncRNA exons and introns, and the whole 487
genome. 488
(C) Overview of the annotation process. The exons of all transcripts within a lncRNA gene annotation 489
are merged. Merged exons are intersected with the RepeatMasker TE annotation. Intersecting TEs 490
are classified into one of six categories (bottom panel) according to the gene structure with which 491
they intersect, and the relative strand of the TE with respect to the gene. 492
(D) Summary of classification breakdown for exonic TE annotation. 493
(E) Classification of TE classes in exonic TE annotation. Numbers indicate instances of each type. 494
495
Figure 3: Evidence for selection on transposable elements in lncRNA exons. 496
(A) Figure shows, for every TE type, the enrichment of per nucleotide coverage in exons compared to 497
introns (y axis) and overall exonic nucleotide coverage (x axis). Enriched TE types (at a 2-fold 498
cutoff) are shown in green. 499
(B) As for (A), but this time the y axis records the ratio of nucleotide coverage in sense vs antisense
configuration. “Sense” here is defined as sense of TE annotation relative to the overlapping exon.
Similar results for lncRNA introns may be found in Supplementary Figure S1. Significantly-enriched
TE types are shown in green. Statistical significance was estimated by a randomisation procedure,
and significance is defined at an uncorrected empirical p-value < 0.001 (see Materials and Methods).
(C) As for (A), but here the y axis records the ratio of per-nucleotide overlap by PhastCons
mammalian-conserved elements for exons vs introns. Similar results for three other measures of evolutionary
conservation may be found in Supplementary Figure S1. Significantly-enriched TE types are shown
in green. Statistical significance was estimated by a randomisation procedure, and significance is
defined at an uncorrected empirical p-value < 0.001 (see Materials and Methods). An example of
significance estimation is shown in the inset: the distribution shows the exonic/intronic 510
conservation ratio for 1000 simulations. The green arrow shows the true value, in this case for 511
MLT1A0 type. 512
(D) Summary of TE types with evidence of exonic selection. Six distinct evidence types are shown in 513
rows, and TE types in columns. On the right are summary statistics for (i) the number of unique TE 514
types identified by each method, and (ii) the number of instances of exonic TEs from each type 515
with appropriate selection evidence. The latter are henceforth defined as “RIDLs”. 516
517
Figure 4: Annotated RIDLs and RIDL-lncRNAs 518
(A) Example of a RIDL-lncRNA gene: CCAT1. Of note is that although several exonic TE instances
are identified (grey), including three separate MIR elements, only one is defined as a RIDL (orange)
due to overlap with a conserved element.
(B) Breakdown of RIDL instances by TE family and evidence sources. 522
(C) Insertion profile of SST1 RIDLs (blue) and intronic insertions (red). x axis shows the entire 523
consensus sequence of SST1. y axis indicates the frequency with which each nucleotide position 524
is present in the aggregate of all insertions. “CC”: Spearman correlation coefficient of the two 525
profiles. “RIDLs” / “Intronic TEs” indicate the numbers of individual insertions considered for 526
RIDLs / intronic insertions, respectively. 527
(D) Number of lncRNAs (y axis) carrying the indicated number of RIDLs (x axis) given the true
distribution (black) and randomized distribution (red). The 95% confidence interval was computed
empirically, by randomly shuffling RIDLs across the entire lncRNA annotation.
(E) The number of RIDL-lncRNAs, and a length-matched set of non-RIDL lncRNAs, which are 531
present in the lncRNAdb database of functional lncRNAs (Quek et al. 2015). Numbers denote 532
gene counts. Significance was calculated using Fisher’s exact test. 533
(F) As for (E) for lncRNAs present in disease- and cancer-associated lncRNA databases (see Materials 534
and Methods). 535
(G) As for (E) for lncRNAs containing at least one trait/disease-associated SNP in their promoter 536
regions (darker colors) or exonic regions (lighter colors) (see Materials and Methods). 537
Figure 5: Correlation between RIDLs and host lncRNA nuclear-cytoplasmic localisation. 538
(A) We hypothesized that RIDLs may govern the subcellular localization of host lncRNAs. 539
(B) Results of an in silico screen of RIDL types (rows) against lncRNA nuclear/cytoplasmic
localisation in human cell lines (columns). For every RIDL type in every cell type,
the localisation of RIDL-containing lncRNAs was compared to all other detected
lncRNAs. Significantly-different pairs are coloured (Benjamini-Hochberg corrected p-value <
0.01; Wilcoxon test). Colour scale indicates the mean nuclear-cytoplasmic ratio of RIDL-lncRNAs.
Numbers in cells indicate the number of considered RIDL-lncRNAs.
(C) The nuclear-cytoplasmic localization of lncRNAs carrying L2b RIDLs in IMR90 cells. Blue 546
indicates lncRNAs carrying ≥1 L2b RIDLs, grey indicates all other detected lncRNAs (“not”). 547
Dashed lines represent medians. Significance was calculated using Wilcoxon test. 548
(D) The nuclear-cytoplasmic ratio of lncRNAs broken down by the number of significant RIDLs that 549
they carry (L1PA16, L2b, MIRb, MIRc). Correlation coefficient (Rho) between number of RIDLs 550
and nuclear-cytoplasmic ratio of lncRNAs and the corresponding p-value (P) are indicated in the 551
plot. In each box, upper value indicates the number of lncRNAs, and lower value the median. 552
553
Figure 6: Disruption of RIDLs results in lncRNA relocalisation from nucleus to cytoplasm. 554
(A) Structures of candidate RIDL-lncRNAs. Orange indicates RIDL positions. For each RIDL, 555
numbers indicate the position within the TE consensus, and its orientation with respect to the 556
lncRNA is indicated by arrows (“>” for same strand, “<” for opposite strand). 557
(B) Experimental design 558
(C) Expression of the three lncRNA candidates as inferred from HeLa RNAseq (Djebali et al. 559
2012c). 560
(D) Nuclear/cytoplasmic localisation of endogenous candidate lncRNA copies in wild-type HeLa 561
cells, as measured by qRT-PCR. 562
(E) The purity of HeLa subcellular fractions was assessed by Western blotting against specific 563
markers. GAPDH / Histone H3 proteins are used as cytoplasmic / nuclear markers, respectively. 564
(F) Nuclear/cytoplasmic localisation of transfected candidate lncRNAs. GAPDH/MALAT1 are 565
used as cytoplasmic/nuclear controls, respectively. N indicates the number of biological 566
replicates, and error bars represent standard error of the mean. P-values for paired t-test (1 tail) 567
are shown. 568
569
Supplementary Figure Legends 570
571
Supplementary Figure S1: (A) Figure shows, for every TE type, the ratio of intronic nucleotide coverage
in sense vs antisense configuration. “Sense” here is defined as sense of TE annotation relative to the sense
of the overlapping intron. Significantly-enriched TE types are shown in green. Statistical significance was
estimated by a randomisation procedure, and significance is defined at an uncorrected empirical p-value <
0.001 (see Materials and Methods). (B-D) As for (A), but y axis records the ratio of nucleotide overlap in
exons vs introns by PhastCons primate-conserved elements, PhastCons placental-conserved elements,
PhastCons vertebrate-conserved elements and Evolutionarily Conserved Structures, respectively.
Supplementary Figure S2: Evolutionary analysis using 6-mammal alignments. Rows represent human 579
RIDLs. In each column (6 different species) they are coloured in red when they are conserved in the given 580
species and otherwise in light yellow. 581
Supplementary Figure S3: Density distribution of RIDL-carrying lncRNAs based on the number of 582
RIDLs per kb of exons that they contain. Dotted line indicates median. 583
Supplementary Figure S4: Creating an exon-length-matched sample of non-RIDL lncRNAs. (A) Exonic 584
length distribution of RIDL-carrying and non-RIDL GENCODE v21 lncRNAs (“others”). (B) Exonic 585
length distribution of RIDL-carrying lncRNAs and a length-matched sample of non-RIDL lncRNAs. The 586
latter were used for all comparisons at gene level. 587
Supplementary Figure S5: Cytoplasmic-nuclear localization in IMR90. Yellow represents lncRNAs 588
carrying one or more copies of GC-rich RIDL elements, grey represents all other detected lncRNAs. Dashed 589
lines indicate the median of each group. 590
Supplementary Figure S6: Barplot generated by LncATLAS webserver (Mas-Ponte et al. 2017b). Bars 591
indicate log2-transformed cytoplasmic/nuclear expression ratios (CN RCI) in 15 cell lines for three 592
candidate genes. Colours represent the total nuclear expression of each gene in each cell line. 593
Supplementary Figure S7: Experimental data for A549 cells. (A) Expression data for three candidate 594
lncRNAs calculated using RNAseq data (Djebali et al. 2012a). (B) Validation of nuclear/cytoplasmic 595
localisation of candidate genes in wild-type cells by qRT-PCR. (C) Validation of subcellular fractionation 596
by Western blot, using GAPDH/Histone H3 proteins as cytoplasmic/nuclear markers, respectively. (D) 597
Nuclear/cytoplasmic localisation of transfected candidate lncRNAs. 598
Supplementary Figure S8: Estimating overexpression of transfected lncRNA candidates. Data are shown
individually for each of four biological replicates. y axis represents the fold change of transfected gene level
compared to the endogenous level estimated from wild-type cells.
Supplementary Figure S9: DNA contamination check in RNA isolated from different subcellular fractions.
Total RNA was isolated from different subcellular fractions using an RNA isolation kit as described in
Materials and Methods. Equal amounts of RNA with (+RT) and without (-RT) reverse transcription were
subjected to PCR (40 cycles) with an exonic primer. The reaction products were electrophoresed on a
1.5% agarose gel stained with SYBR Safe. T, C and N represent the RNA isolated from total, cytoplasmic and
nuclear fractions, respectively.
608
Supplementary Files

Supplementary File S1: File is in BED format with coordinates for human genome assembly GRCh38 /
hg38. The file contains the coordinates of genomic nucleotides that overlap both TEs and merged exons of
lncRNA genes.
The 4th “name” column contains information on the TE (from RepeatMasker annotation) in
comma-separated format:
Chromosome of TE annotation, start of TE annotation, end of TE annotation, strand of TE annotation, TE
name, TE class, TE family, left in repeat, end in repeat, begin in repeat, hypothetical start if TE were full
length, hypothetical end if TE were full length.
The 5th “score” column indicates the TE classification:
+1: tss.plus; +2: splice_acc.plus; +3: encompass.plus; +4: splice_don.plus; +5: tts.plus; +6:
inside.plus. The sign (+/-) denotes whether the annotated strand of the TE lies on the same or the opposite
strand with respect to that of the overlapping lncRNA.
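The column layout described for Supplementary File S1 can be read with a few lines of Python. The following is only a minimal sketch: it assumes the BED file is tab-separated, the field names in NAME_FIELDS and the ExonicTE record are hypothetical labels chosen here for the twelve documented values, and the example line is invented for illustration rather than taken from the file.

```python
from dataclasses import dataclass

# Score codes as listed in the file description; the sign carries the strand relation.
CATEGORY_BY_SCORE = {
    1: "tss",
    2: "splice_acc",
    3: "encompass",
    4: "splice_don",
    5: "tts",
    6: "inside",
}

# Hypothetical names for the twelve comma-separated values, in the documented order.
NAME_FIELDS = (
    "te_chrom", "te_start", "te_end", "te_strand",
    "te_name", "te_class", "te_family",
    "left_in_repeat", "end_in_repeat", "begin_in_repeat",
    "full_length_start", "full_length_end",
)


@dataclass
class ExonicTE:
    chrom: str
    start: int
    end: int
    te: dict           # raw strings from the "name" column, keyed by NAME_FIELDS
    category: str      # overlap category decoded from the absolute score
    same_strand: bool  # True when the score is positive


def parse_s1_line(line: str) -> ExonicTE:
    """Parse one line, assuming tab-separated BED with at least five columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 5:
        raise ValueError("expected at least 5 BED columns")
    chrom, start, end, name, score = cols[:5]
    values = name.split(",")
    if len(values) != len(NAME_FIELDS):
        raise ValueError("expected 12 comma-separated values in the name column")
    code = int(score)  # int() accepts an explicit sign such as "+1" or "-6"
    if abs(code) not in CATEGORY_BY_SCORE:
        raise ValueError(f"unknown score code: {score}")
    return ExonicTE(
        chrom=chrom,
        start=int(start),
        end=int(end),
        te=dict(zip(NAME_FIELDS, values)),
        category=CATEGORY_BY_SCORE[abs(code)],
        same_strand=code > 0,
    )


# Invented example line, for illustration only.
example = "chr8\t100\t200\tchr8,90,210,+,MIRb,SINE,MIR,(10),258,138,50,318\t-6\n"
record = parse_s1_line(example)
print(record.category, record.same_strand, record.te["te_name"])
```

The values of the name column are kept as strings because RepeatMasker-style fields may not always be plain integers; convert them only once the actual file contents have been checked.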
Supplementary File S2: File is in BED format with coordinates for human genome assembly GRCh38 /
hg38. It was created by merging a subset of the exonic TEs from Supplementary File S1, and thus some
lines may be composed of merged features.
The 4th “name” column contains information on the TE (from RepeatMasker annotation) in
comma-separated format:
Chromosome of TE annotation, start of TE annotation, end of TE annotation, strand of TE annotation, TE
name, TE class, TE family, left in repeat, end in repeat, begin in repeat, hypothetical start if TE were full
length, hypothetical end if TE were full length.
Supplementary File S3: File contains wild type and mutant synthesized plasmid sequences. Coloured
sequences indicate wild-type / scrambled RIDL sequences.
Supplementary File S4: Primer sequences and calculations of their efficiency.
Supplementary File S5: PDF images of insertion profiles of all RIDLs (blue) and intronic insertions (red).
x axis shows the entire consensus sequence of a given TE type. y axis indicates the frequency with which
each nucleotide position is present in the aggregate of all insertions. “CC”: Spearman correlation coefficient
of the two profiles. “Data” / “Reference” indicate the numbers of individual insertions considered for RIDLs
/ intronic insertions, respectively.
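To make the insertion profiles of Supplementary File S5 concrete, the sketch below computes a per-position inclusion frequency over a TE consensus and a Spearman correlation between two such profiles. It is an illustration under stated assumptions, not the procedure used to produce the file: insertions are assumed to be given as 1-based inclusive (begin, end) positions within the consensus, the frequency is assumed to be the fraction of insertions covering each position, and the function names are hypothetical.

```python
def inclusion_frequency(insertions, consensus_length):
    """Fraction of insertions covering each consensus position.

    insertions: list of (begin, end) pairs, 1-based and inclusive (assumed).
    """
    counts = [0] * consensus_length
    total = 0
    for begin, end in insertions:
        total += 1
        # Clip to the consensus so that out-of-range coordinates are ignored.
        for pos in range(max(begin, 1), min(end, consensus_length) + 1):
            counts[pos - 1] += 1
    if total == 0:
        raise ValueError("no insertions given")
    return [c / total for c in counts]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy coordinates on a 6 nt consensus, for illustration only.
ridl_profile = inclusion_frequency([(1, 4), (3, 6)], 6)
intronic_profile = inclusion_frequency([(1, 2), (1, 6), (5, 6)], 6)
print(ridl_profile, intronic_profile, spearman(ridl_profile, intronic_profile))
```

A value of CC near 1 would indicate that RIDLs and intronic insertions cover the same parts of the consensus, whereas a low or negative value would indicate that the two sets retain different parts of the element.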
Supplementary File S6: List of GENCODE v21 lncRNAs carrying one or more RIDLs. First column
contains gene ID and second column the number of RIDLs present in the lncRNA.
Acknowledgements
We acknowledge Deborah Re (DBMR), Silvia Roesselet (DBMR) and Marianne Zahn (Inselspital)
for administrative support. We wish to thank Roderic Guigo (CRG), Marc Friedlaender (SciLife Lab) and
Marta Mele (Harvard) for many helpful discussions and Roberta Esposito (DBMR) for help and advice
regarding experiments and their analysis. Samir Ounzain (CHUV) also contributed suggestions regarding
experimental design. Julien Lagarde (CRG) kindly provided help in gene sampling analysis. CN is
supported by grant TIN-2013-41990-R from the Spanish Ministry of Economy, Industry and
Competitiveness (MINECO). This research was supported by the NCCR RNA & Disease funded by the
Swiss National Science Foundation.
References

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological) 57: 289–300. https://www.jstor.org/stable/2346101 (August 25, 2017).
Blackwell, Benjamin J et al. “Protein Interactions with piALU RNA Indicates Putative Participation of retroRNA in the Cell Cycle, DNA Repair and Chromatin Assembly.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3383447/pdf/mge-2-26.pdf (August 25, 2017).
Bourque, Guillaume et al. 2008. “Evolution of the Mammalian Transcription Factor Binding Repertoire via Transposable Elements.” Genome research 18(11): 1752–62. http://www.ncbi.nlm.nih.gov/pubmed/18682548 (August 25, 2017).
———. 2009. “Transposable Elements in Gene Regulation and in the Evolution of Vertebrate Genomes.” Current Opinion in Genetics & Development 19(6): 607–12. http://www.sciencedirect.com/science/article/pii/S0959437X09001725 (August 25, 2017).
Cabili, Moran N et al. 2015. “Localization and Abundance Analysis of Human lncRNAs at Single-Cell and Single-Molecule Resolution.” Genome biology 16(1): 20. http://genomebiology.com/2015/16/1/20 (January 30, 2015).
Carrieri, Claudia et al. 2012. “Long Non-Coding Antisense RNA Controls Uchl1 Translation through an Embedded SINEB2 Repeat.” Nature 491(7424): 454–57. http://www.ncbi.nlm.nih.gov/pubmed/23064229 (February 10, 2015).
Chen, G. et al. 2013. “LncRNADisease: A Database for Long-Non-Coding RNA-Associated Diseases.” Nucleic Acids Research 41(D1): D983–86. http://www.ncbi.nlm.nih.gov/pubmed/23175614 (December 24, 2016).
Chen, Ling-Ling. 2016. “Linking Long Noncoding RNA Localization and Function.” Trends in biochemical sciences. http://www.ncbi.nlm.nih.gov/pubmed/27499234 (August 25, 2016).
Chillón, Isabel, and Anna M. Pyle. 2016. “Inverted Repeat Alu Elements in the Human lincRNA-p21 Adopt a Conserved Secondary Structure That Regulates RNA Function.” Nucleic Acids Research: gkw599. http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkw599 (September 5, 2016).
Chuong, Edward B, Nels C Elde, and Cédric Feschotte. 2017. “Regulatory Activities of Transposable Elements: From Conflicts to Benefits.” https://www.nature.com/nrg/journal/v18/n2/pdf/nrg.2016.139.pdf (August 25, 2017).
Clark, Michael B et al. 2012. “Genome-Wide Analysis of Long Noncoding RNA Stability.” Genome research 22(5): 885–98. http://genome.cshlp.org/content/22/5/885.long (June 3, 2014).
Cordaux, Richard, and Mark A Batzer. “The Impact of Retrotransposons on Human Genome Evolution.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2884099/pdf/nihms-201920.pdf (August 29, 2017).
Derrien, Thomas et al. 2012. “The GENCODE v7 Catalog of Human Long Noncoding RNAs: Analysis of Their Gene Structure, Evolution, and Expression.” Genome research 22(9): 1775–89. http://genome.cshlp.org/content/22/9/1775.long (May 23, 2014).
Djebali, Sarah et al. 2012a. “Landscape of Transcription in Human Cells.” Nature 489(7414): 101–8. http://www.nature.com/doifinder/10.1038/nature11233 (April 20, 2017).
———. 2012b. “Landscape of Transcription in Human Cells.” Nature 489(7414): 101–8. http://www.ncbi.nlm.nih.gov/pubmed/22955620 (May 11, 2017).
———. 2012c. “Landscape of Transcription in Human Cells.” Nature 489(7414): 101–8. http://www.ncbi.nlm.nih.gov/pubmed/22955620 (July 17, 2017).
Elisaphenko, Eugeny A et al. 2008. “A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements.” http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0002521&type=printable (August 25, 2017).
Faulkner, Geoffrey J et al. “The Regulated Retrotransposon Transcriptome of Mammalian Cells.” https://www.nature.com/ng/journal/v41/n5/pdf/ng.368.pdf (August 25, 2017).
Feschotte, Cédric. 2008. “Transposable Elements and the Evolution of Regulatory Networks.” Nature reviews. Genetics 9(5): 397–405. http://www.ncbi.nlm.nih.gov/pubmed/18368054 (September 1, 2017).
Gong, Chenguang, and Lynne E Maquat. 2011. “lncRNAs Transactivate STAU1-Mediated mRNA Decay by Duplexing with 3’ UTRs via Alu Elements.” Nature 470(7333): 284–88. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3073508&tool=pmcentrez&rendertype=abstract (March 6, 2015).
Guttman, Mitchell, and John L. Rinn. 2012. “Modular Regulatory Principles of Large Non-Coding RNAs.” Nature 482(7385): 339–46. http://www.nature.com/doifinder/10.1038/nature10887 (January 6, 2017).
Hacisuleyman, Ezgi et al. 2016. “Function and Evolution of Local Repeats in the Firre Locus.” Nature Communications 7: 11021. http://www.nature.com/doifinder/10.1038/ncomms11021 (November 20, 2016).
Harrow, J. et al. 2012. “GENCODE: The Reference Human Genome Annotation for The ENCODE Project.” Genome Research 22(9): 1760–74. http://genome.cshlp.org/cgi/doi/10.1101/gr.135350.111 (November 20, 2016).
Hezroni, Hadas et al. 2015. “Principles of Long Noncoding RNA Evolution Derived from Direct Comparison of Transcriptomes in 17 Species.” Cell Reports 11(7): 1110–22. http://linkinghub.elsevier.com/retrieve/pii/S2211124715004106 (September 1, 2017).
———. 2017. “A Subset of Conserved Mammalian Long Non-Coding RNAs Are Fossils of Ancestral Protein-Coding Genes.” Genome Biology 18(1): 162. http://www.ncbi.nlm.nih.gov/pubmed/28854954 (September 2, 2017).
Hindorff, Lucia A et al. 2009. “Potential Etiologic and Functional Implications of Genome-Wide Association Loci for Human Diseases and Traits.” Proceedings of the National Academy of Sciences of the United States of America 106(23): 9362–67. http://www.ncbi.nlm.nih.gov/pubmed/19474294 (August 30, 2017).
Holdt, Lesca M et al. 2013. “Alu Elements in ANRIL Non-Coding RNA at Chromosome 9p21 Modulate Atherogenic Cell Functions through Trans-Regulation of Gene Networks.” PLoS Genet 9(7). http://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003588&type=printable (August 25, 2017).
Hu, Boqin et al. 2017. “POSTAR: A Platform for Exploring Post-Transcriptional Regulation Coordinated by RNA-Binding Proteins.” Nucleic Acids Research 45(D1): D104–14. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkw888 (September 2, 2017).
Huda, Ahsan, Nathan J. Bowen, Andrew B. Conley, and I. King Jordan. 2011. “Epigenetic Regulation of Transposable Element Derived Human Gene Promoters.” Gene 475(1): 39–48. http://www.sciencedirect.com/science/article/pii/S0378111910004762 (August 27, 2017).
Jjingo, Daudi et al. 2014. “Mammalian-Wide Interspersed Repeat (MIR)-Derived Enhancers and the Regulation of Human Gene Expression.” Mobile DNA 5(1): 14. http://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-14 (September 2, 2017).
Johnson, R. et al. 2006. “Identification of the REST Regulon Reveals Extensive Transposable Element-Mediated Binding Site Duplication.” Nucleic Acids Research 34(14): 3862–77. http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkl525 (August 25, 2017).
Johnson, Rory, and Roderic Guigó. 2014. “The RIDL Hypothesis: Transposable Elements as Functional Domains of Long Noncoding RNAs.” RNA (New York, N.Y.) 20(7): 959–76. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4114693&tool=pmcentrez&rendertype=abstract (April 29, 2015).
Jurka, J, E Zietkiewicz, and D Labuda. 1995. “Ubiquitous Mammalian-Wide Interspersed Repeats (MIRs) Are Molecular Fossils from the Mesozoic Era.” Nucleic acids research 23(1): 170–75. http://www.ncbi.nlm.nih.gov/pubmed/7870583 (August 29, 2017).
Kapusta, Aurélie et al. 2013. “Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs.” ed. Hopi E. Hoekstra. PLoS genetics 9(4): e1003470. http://dx.plos.org/10.1371/journal.pgen.1003470 (August 25, 2017).
Kelley, David R, David G Hendrickson, Danielle Tenen, and John L Rinn. 2014. “Transposable Elements Modulate Human RNA Abundance and Splicing via Specific RNA-Protein Interactions.” Genome Biology 15(12): 537. http://www.ncbi.nlm.nih.gov/pubmed/25572935 (September 2, 2017).
Kelley, David, and John Rinn. 2012. “Transposable Elements Reveal a Stem Cell-Specific Class of Long Noncoding RNAs.” Genome Biology 13(11): R107. http://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-11-r107 (August 25, 2017).
———. “Transposable Elements Reveal a Stem Cell-Specific Class of Long Noncoding RNAs.” https://genomebiology.biomedcentral.com/track/pdf/10.1186/gb-2012-13-11-r107?site=genomebiology.biomedcentral.com (August 25, 2017).
Konkel, Miriam K, Jerilyn A Walker, and Mark A Batzer. 2010. “LINEs and SINEs of Primate Evolution.” Evolutionary anthropology 19(6): 236–49. http://www.ncbi.nlm.nih.gov/pubmed/25147443 (August 29, 2017).
Lev-Maor, Galit, Rotem Sorek, Noam Shomron, and Gil Ast. 2003. “The Birth of an Alternatively Spliced Exon: 3’ Splice-Site Selection in Alu Exons.” Science 300(5623). http://science.sciencemag.org/content/300/5623/1288/tab-pdf (August 25, 2017).
Li, Jun-Hao et al. 2014. “starBase v2.0: Decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA Interaction Networks from Large-Scale CLIP-Seq Data.” Nucleic Acids Research 42(D1): D92–97. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkt1248 (September 2, 2017).
Martin, Kelsey C, and Anne Ephrussi. 2009. “mRNA Localization: Gene Expression in the Spatial Dimension.” Cell 136(4): 719–30. http://www.ncbi.nlm.nih.gov/pubmed/19239891 (August 29, 2017).
Mas-Ponte, David et al. 2017a. “LncATLAS Database for Subcellular Localisation of Long Noncoding RNAs.” RNA (New York, N.Y.): rna.060814.117. http://rnajournal.cshlp.org/lookup/doi/10.1261/rna.060814.117 (April 10, 2017).
———. 2017b. “LncATLAS Database for Subcellular Localization of Long Noncoding RNAs.” RNA 23(7): 1080–87. http://rnajournal.cshlp.org/lookup/doi/10.1261/rna.060814.117 (July 19, 2017).
Mercer, Tim R, and John S Mattick. 2013. “Structure and Function of Long Noncoding RNAs in Epigenetic Regulation.” Nature Publishing Group 20. https://www.nature.com/nsmb/journal/v20/n3/pdf/nsmb.2480.pdf (August 25, 2017).
Mukherjee, Neelanjan et al. 2017. “Integrative Classification of Human Coding and Noncoding Genes through RNA Metabolism Profiles.” Nature structural & molecular biology 24(1): 86–96. http://www.nature.com/doifinder/10.1038/nsmb.3325 (September 2, 2017).
Necsulea, Anamaria et al. 2014. “The Evolution of lncRNA Repertoires and Expression Patterns in Tetrapods.” Nature 505(7485): 635–40. http://www.ncbi.nlm.nih.gov/pubmed/24463510 (May 11, 2017).
Ning, Shangwei et al. 2016. “Lnc2Cancer: A Manually Curated Database of Experimentally Supported lncRNAs Associated with Various Human Cancers.” Nucleic Acids Research 44(D1): D980–85. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkv1094 (August 30, 2017).
Nissan, Aviram et al. 2012. “Colon Cancer Associated Transcript-1: A Novel RNA Expressed in Malignant and Pre-Malignant Human Tissues.” International Journal of Cancer 130(7): 1598–1606. http://www.ncbi.nlm.nih.gov/pubmed/21547902 (August 29, 2017).
Van Nostrand, Eric L et al. 2016. “Robust Transcriptome-Wide Discovery of RNA-Binding Protein Binding Sites with Enhanced CLIP (eCLIP).” Nature Methods 13(6): 508–14. http://www.ncbi.nlm.nih.gov/pubmed/27018577 (September 2, 2017).
Perepelitsa-Belancio, Victoria, and Prescott Deininger. 2003. “RNA Truncation by Premature Polyadenylation Attenuates Human Mobile Element Activity.” Nature Genetics 35(4): 363–66. http://www.nature.com/doifinder/10.1038/ng1269 (August 25, 2017).
Quek, Xiu Cheng et al. 2015. “lncRNAdb v2.0: Expanding the Reference Database for Functional Long Noncoding RNAs.” Nucleic acids research 43(Database issue): D168-73. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gku988 (December 24, 2016).
Quinlan, Aaron R, and Ira M Hall. 2010. “BEDTools: A Flexible Suite of Utilities for Comparing Genomic Features.” Bioinformatics (Oxford, England) 26(6): 841–42. http://www.ncbi.nlm.nih.gov/pubmed/20110278 (August 25, 2017).
Roberts, Justin T, Sara E Cardin, and Glen M Borchert. 2014. “Burgeoning Evidence Indicates That microRNAs Were Initially Formed from Transposable Element Sequences.” Mobile Genetic Elements 4(3): e29255. http://www.tandfonline.com/doi/abs/10.4161/mge.29255 (August 25, 2017).
Schmitt, Adam M. et al. 2016. “Long Noncoding RNAs in Cancer Pathways.” Cancer Cell 29(4): 452–63. http://linkinghub.elsevier.com/retrieve/pii/S1535610816300927 (June 28, 2016).
Seemann, Stefan E. et al. 2017. “The Identification and Functional Annotation of RNA Structures Conserved in Vertebrates.” Genome Research 27(8): 1371–83. http://www.ncbi.nlm.nih.gov/pubmed/28487280 (September 2, 2017).
Sela, Noa et al. 2007. “Comparative Analysis of Transposed Element Insertion within Human and Mouse Genomes Reveals Alu’s Unique Role in Shaping the Human Transcriptome.” 8(6). https://genomebiology.biomedcentral.com/track/pdf/10.1186/gb-2007-8-6-r127?site=genomebiology.biomedcentral.com (August 25, 2017).
Siepel, Adam et al. 2005. “Evolutionarily Conserved Elements in Vertebrate, Insect, Worm, and Yeast Genomes.” Genome research 15(8): 1034–50. http://www.ncbi.nlm.nih.gov/pubmed/16024819 (August 25, 2017).
Smith, M. A., T. Gesell, P. F. Stadler, and J. S. Mattick. 2013. “Widespread Purifying Selection on RNA Structure in Mammals.” Nucleic Acids Research 41(17): 8220–36. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3783177&tool=pmcentrez&rendertype=abstract (April 20, 2015).
Suzuki, Keiko et al. 2010. “REAP: A Two Minute Cell Fractionation Method.” BMC Research Notes 3(1): 294. http://www.ncbi.nlm.nih.gov/pubmed/21067583 (August 28, 2017).
Ulitsky, Igor, and David P. Bartel. 2013. “lincRNAs: Genomics, Evolution, and Mechanisms.” Cell 154(1): 26–46. http://linkinghub.elsevier.com/retrieve/pii/S0092867413007599 (November 20, 2016).
Washietl, Stefan, Manolis Kellis, and Manuel Garber. 2014. “Evolutionary Dynamics and Tissue Specificity of Human Long Noncoding RNAs in Six Mammals.” Genome research 24(4): 616–28. http://www.ncbi.nlm.nih.gov/pubmed/24429298 (May 25, 2014).
Welter, Danielle et al. 2014. “The NHGRI GWAS Catalog, a Curated Resource of SNP-Trait Associations.” Nucleic Acids Research 42(D1): D1001–6. https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkt1229 (August 30, 2017).
Zhang, B. et al. 2014. “A Novel RNA Motif Mediates the Strict Nuclear Localization of a Long Noncoding RNA.” Molecular and Cellular Biology 34(12): 2318–29. http://mcb.asm.org/cgi/doi/10.1128/MCB.01673-13 (November 20, 2016).
Zhang, Kun et al. 2014. “The Ways of Action of Long Non-Coding RNAs in Cytoplasm and Nucleus.” Gene 547(1): 1–9. http://www.ncbi.nlm.nih.gov/pubmed/24967943 (March 17, 2015).
[Figure, panels A and B. Recoverable labels: RIDLs; passenger TEs; lncRNA; RNA, protein, DNA; exonic enrichment; strand bias; evolutionary conservation.]
[Figure, panels A to E. Recoverable labels:
Exonic TE annotation workflow (inputs: transcripts, merged genes, RepeatMasker): 26414 lncRNA transcripts (GENCODE 21); 48684 merged exons; 31004 introns; RepeatMasker: 5520018; 46474 exonic overlap (23975 same sense, 22499 antisense); 562640 intronic overlap (273264 same sense, 289376 antisense).
Overlap categories: TSS, donor, acceptor, inside, encompassing, TTS.
Grid of counts by repeat family (hAT-Tip100, hAT-Charlie, TcMar-Tigger, Simple_repeat, MIR, Low_complexity, L2, L1, ERVL-MaLR, ERVL, ERVK, ERV1, CR1, Alu) and strand-specific category (TSS, SpliceAcc, SpliceDon, Encompass, TTS and Inside, each + and -), with a "Number" axis per category.
Fraction Overlapped for lncRNA exons, lncRNA introns and genome, by TE class (DNA, LINE, Low complexity, LTR, Satellite, Simple repeat, SINE).
Individual cell values are not recoverable from the extracted text.]
[Figure, panels A to D. Recoverable labels:
Exonic Enrichment: scatter of TE types; x axis: Total Exonic Coverage (Log10 nt); y axis: Log2 Ratio Exonic vs Intronic Coverage; legend: Enriched, Not significant, Depleted.
Strand Bias: x axis: Total exonic overlap (Log10 nt); y axis: Log2 ratio sense / antisense overlap; labelled TE types: L1MA9, L1MC2, L1PA15, L1PA16, L1PA3, L1PA5, L1PB1, LTR78, MIRc, MLT1B, AluSx1, AluYc, Charlie2b, L1M4b, L1M5, L1MA2, L1MA6, L1MB4, L1MC1, L1MCa, L1ME2, L1ME3D, L1PA4, L1PA7, Tigger2.
Placental Mammal PhastCons: x axis: Total exonic overlap (Log10 nt); y axis: Log2 ratio exon / intron overlap; labelled TE types: (AT)n, (T)n, (TA)n, A-rich, AluJr, AluSc, AluSc8, AluSp, GA-rich, L1MA4, L1MA5A, L1MC1, L1MC3, L1MD2, L1ME1, L1ME3A, L1MEg, L2b, LTR12C, MER63A, MIR, MLT1A0, MLT1H1, MLT1K, MLT2B4, THE1D, AluJb, AluSg, AluSq2, AluSx1, AluSx3, AluSz, L1MEc, MLT1C, MLT1H, MER50. A histogram labelled MLT1A0 and "1000 Randomisations" has x axis Log2 ratio exon/intron overlap and y axis Frequency.
Summary panel: Log2 Enrichment for individual TE types and in total, for the evidence types Structure Conserved, Vertebrate Conserved, Placental Conserved, Primate Conserved, Strand Bias and Exonic Enrichment.
Point coordinates and per-type values are not recoverable from the extracted text.]
[Figure, panels A to G. Recoverable labels:
RIDLs per gene: x axis: Number of RIDLs per gene; y axis: Count; legend: shuffled values, true value, 95% confidence; labelled genes: XIST, TPT1-AS1, LINC00963, TSIX, mir-29c, LINC00861.
Insertion profile for SST1: x axis: Position in Repeat Consensus (nt); y axis: Base Inclusion Frequency; CC: −0.18; RIDLs: 41; Intronic TEs: 93.
Number of RIDLs (#RIDL) by TE family (Alu, centr, ERV1, ERVK, ERVL, ERVL-MaLR, hAT-Blackjack, hAT-Charlie, L1, L2, Low_complexity, MIR, Satellite, Simple_repeat, TcMar-Tigger, telo), coloured by evidence: Conservation; Conservation + Enrichment; Conservation + Same Strand; Enrichment; Enrichment + Same Strand; Same Strand.
Bar charts of % of genes for RIDL-lncRNAs (n=3566) versus other lncRNAs (n=2068): Functionally-characterised lncRNAs; Disease-associated lncRNAs (P = 0.03); GWAS SNP overlap in promoters and exons (P = 0.0001 and P = 0.04).
Genome browser view of CCAT1 (chr8, hg38, 127,209,000 to 127,219,000; scale 5 kb) with tracks: Basic Gene Annotation Set from GENCODE Version 22 (Ensembl 79); Merged Exons; Exonic TE Annotation; RIDLs Annotation; Multiz Alignment & Conservation (7 Species); Primates Multiz Alignment & Conservation (20 Species). TE labels: SINE-MIR, DNA-hAT-Charlie, SINE-MIR, Simple repeat, SINE-Alu, SINE-MIR.
Bar heights and counts are not reliably recoverable from the extracted text.]
[Figure residue, panels A to D. Recoverable labels: (1) Schematic comparing nuclear (N) and cytoplasmic (C) localisation of RIDL-containing lncRNAs versus all other lncRNAs. (2) Grid of TE Type against Cell Type; each cell gives the Number of RIDLs and is coloured by Log2 Nucl/Cyto Ratio (scale ticks at -1, 0, 1, 2). TE types: MLT1J2, MLT1A0, MIRc, MIRb, MIR3, MIR, MER49, LTR7C, LTR12D, L2c, L2b, L2a, L1PA16, L1ME3A, L1MD3, GC_rich, GA.rich, CT.rich, AT_rich, AluSx. Cell types: A549, GM12878, HeLaS3, HepG2, HUVEC, IMR90, K562, MCF7, NHEK, SKNSH. The per-cell counts could not be recovered from the extraction. (3) L2b in IMR90: Cumulative Frequency (0.00 to 1.00) against Log2 Nucl/Cyto Ratio (-6 to 6), 44 L2b versus 2128 not, P=6.7e-6. (4) IMR90: Log2 Nucl/Cyto Ratio (y axis -5 to 10) against a 0 to 4 x axis, Rho=0.2, P = 0.]
[Figure residue, panels A to F. Recoverable labels: (1) Experimental design: Wild Type and Mutant (randomised RIDL) constructs; Transfect, then Fractionate into nuclear (N) and cytoplasmic (C) fractions. (2) Genome browser views (hg38; chromosome labels Chr20, Chr1, Chr12): LINC00173, 116,533,000 to 116,535,500, 1 kb scale bar, RIDLs MIRb > and MIRc >; RP4-806M20.4, 58,817,100 to 58,817,800, 200 bases scale bar, with MIR296 and RIDL L2b >; RP11-54O7.1, 911,000 to 915,500, 2 kb scale bar, RIDLs MIRb >, L2b < and L2b >. Position in consensus labels: 3147-3372, 137-219, 3096-3275, 3294-3371, 31-184, 57-206. (3) HeLa RNA-seq: Expression (FPKM) in Whole cell, Cytoplasm and Nucleus for RP11-54O7.1, LINC00173 and RP4-806M20.4. (4) HeLa qRT-PCR, WT endogenous: Log2 Nucl/Cyto Ratio for MALAT1 (nuclear control), GAPDH (cytoplasmic control), RP11-54O7.1, LINC00173 and RP4-806M20.4. (5) HeLa fractionation markers: GAPDH and H3. (6) WT transfected versus Mutant (randomised): Log2 Nucl/Cyto Ratio for RP11-54O7.1, LINC00173 and RP4-806M20.4, with MALAT1 / GAPDH as nuclear and cytoplasmic controls; P = 0.002, P = 0.01, P = 0.006; N=4.]