Distinct expression patterns of natural antisense
transcripts in Arabidopsis
Stefan R. Henz, Jason S. Cumbie, Kristin D. Kasschau, Jan U. Lohmann, James C.
Carrington, Detlef Weigel, Markus Schmid*
Max Planck Institute for Developmental Biology, Department of Molecular Biology,
Spemannstrasse 37-39, D-72076 Tübingen, Germany (S.R.H., J.U.L., D.W., M.S.); and
Center for Genome Research and Biocomputing and Department of Botany and Plant
Pathology, Oregon State University, Corvallis, Oregon 97331, USA (J.S.C., K.D.K., J.C.C.)
*To whom correspondence should be addressed: markus.schmid@tuebingen.mpg.de
RUNNING TITLE: Antisense transcripts in Arabidopsis thaliana
Key words: Arabidopsis, natural antisense transcript, cis-NATs, microRNA, small RNA
Mailing address of corresponding author:
Markus Schmid
MPI for Developmental Biology
Spemannstrasse 37-39/VI
D-72076 Tübingen
GERMANY
Phone: +49 7071-601-1416
Fax: +49 7071-601-1412
Email: Markus.Schmid@tuebingen.mpg.de
Plant Physiology Preview. Published on May 11, 2007, as DOI:10.1104/pp.107.100396
Copyright 2007 by the American Society of Plant Biologists
www.plantphysiol.orgon June 13, 2020 - Published by Downloaded from Copyright © 2007 American Society of Plant Biologists. All rights reserved.
Henz et al.: Expression of natural antisense transcripts in Arabidopsis
Abstract
It has been shown that overlapping cis-natural antisense transcripts (cis-NATs) can
form a regulatory circuit, in which small RNAs derived from one transcript regulate
stability of the other transcript, which manifests itself as anti-correlated expression.
However, little is known about how widespread antagonistic expression of cis-NATs
is. We have determined how frequently cis-NAT pairs, which make up 7.4% of
annotated transcription units in the Arabidopsis thaliana genome, show anti-
correlated expression patterns. Indeed, global expression profiles of pairs of cis-
NATs on average have significantly lower pairwise Pearson correlation coefficients
(PCC) than other pairs of neighboring genes whose transcripts do not overlap.
However, anti-correlated expression that is greater than expected by chance is only
found in a small number of cis-NAT pairs. The degree of anti-correlation does not
depend on the length of the overlap or on the distance of the 5’ ends of the
transcripts. Consistent with earlier findings, cis-NATs do not exhibit an increased
likelihood to give rise to small RNAs, as determined from available small RNA
sequences and MPSS tags. However, the overlapping regions of cis-NATs appeared
to be enriched for small RNA loci compared to non-overlapping regions.
Furthermore, expression of cis-NATs was not disproportionately affected in various
RNA silencing mutants. Our results demonstrate that there is a trend towards anti-
correlated expression of cis-NAT pairs in Arabidopsis, but currently available data do
not produce a strong signature of small RNA mediated silencing for this process.
Introduction
Much of gene expression is primarily regulated at the level of transcription. Over the
last few years, however, it has become increasingly apparent that post-transcriptional
regulation at the RNA level is more widespread and important than previously
assumed (Behm-Ansmant and Izaurralde, 2006; Brodersen and Voinnet, 2006;
Newbury, 2006). While various types of regulatory RNA molecules have been shown
to exist, arguably the most prominent ones are micro-RNAs (miRNAs) (Bartel, 2004;
Jones-Rhoades et al., 2006; Vazquez, 2006). MiRNAs are derived from larger
transcripts generated by RNA polymerase II, and are found in both animals and plants.
The primary transcript is processed to give rise to a short 20 to 24 nucleotide long
RNA molecule, the miRNA, which by annealing to partially complementary sites of
mRNAs, can lead to either cleavage of the mRNA or translational inhibition. Another
type of small RNAs that regulate the stability of transcripts are short interfering RNAs
(siRNAs). In contrast to miRNAs, siRNAs are always perfectly complementary to
their targets. One source of siRNAs are double-stranded RNAs generated by
transcription of a locus in both the sense and antisense orientation (Kumar and
Carmichael, 1998; Vanhee-Brossollet and Vaquero, 1998). Such antisense
transcripts were first observed in transgenic experiments, but natural antisense
transcripts (NATs) also occur. There are two classes of NATs: cis-NATs, which are
formed by antisense transcription at the same genomic locus, and trans-NATs,
where sense and antisense transcripts are derived from different loci.
Large-scale genome projects have revealed the common occurrence of
overlapping gene pairs in most species analyzed (Lehner et al., 2002; Shendure and
Church, 2002; Osato et al., 2003; Yelin et al., 2003; Wang et al., 2005). The reported
frequencies for overlapping gene pairs found in different species vary, depending on
sample size and other search parameters, but usually range from 5% to 10% of
all neighboring gene pairs. In the human genome, 4% to 9% of all transcript pairs
overlap, while in the murine genome 1.7% to 14% have been identified as
overlapping. A particularly extreme case is Drosophila, where up to 22% of all
neighboring genes have been reported to overlap. Across the various species, the
majority of overlapping gene pairs is transcribed in convergent orientation, thus
representing true cis-NAT pairs.
NATs have been implicated in such diverse processes as transcription
occlusion, RNA interference, alternative splicing, RNA editing, DNA methylation and
genomic imprinting (Farrell and Lukens, 1995; Sureau et al., 1997; Billy et al., 2001;
Tufarelli et al., 2003; Kim et al., 2004; Jen et al., 2005; Wang et al., 2005). In the
plant kingdom, cis-NATs have been analyzed in rice and in Arabidopsis (Osato et al.,
2003; Yamada et al., 2003; Jen et al., 2005; Wang et al., 2005). Analysis of the
Arabidopsis transcriptome by means of whole genome tiling arrays has revealed
antisense expression of 7,600 transcripts, corresponding to roughly 25% of all
annotated genes (Yamada et al., 2003). A few additional studies have addressed the
question of antisense gene pairs in Arabidopsis in detail. Wang and colleagues
(Wang et al., 2005) identified 1,340 potential cis-NAT pairs in Arabidopsis and
confirmed expression of sense and antisense transcripts of 957 cis-NAT pairs using
sequence information of Arabidopsis full-length cDNAs and Massively Parallel
Signature Sequencing (MPSS) data (Meyers et al., 2004; Meyers et al., 2004). Using
qualitative criteria, these authors concluded that the majority of cis-NATs showed
highly anti-correlated expression. In an independent study, Jen and colleagues (Jen
et al., 2005) reported the existence of 1,083 transcript pairs that overlapped in
antisense orientation. They further uncovered a possible role of convergent
overlapping gene pairs in alternative splicing and polyadenylation, but did not find
any evidence for anti-correlated expression greater than expected by chance, which
is in disagreement with the findings of Wang and colleagues (Wang et al., 2005).
Finally, in an elegant set of experiments, SRO5 and P5CDH, a pair of cis-NATs,
were shown to have antagonistic functions in the regulation of salt tolerance in
Arabidopsis (Borsani et al., 2005). In response to salt stress, SRO5 mRNA is
induced, and a 24 nucleotide long siRNA is formed from the region of overlap with
P5CDH, dependent on components that are also involved in the generation of
siRNAs from transgene-derived dsRNA, such as DICER-LIKE 2 (DCL2) and RNA-
DEPENDENT RNA POLYMERASE 6 (RDR6). Subsequently, 21 nucleotide siRNAs
are formed by DCL1-dependent processing of P5CDH transcripts. Finally, 1,320
putative trans-NATs have been recently identified in the Arabidopsis genome (Wang
et al., 2006). Interestingly, a large number of transcripts was predicted to have both
trans- and cis-NATs, suggesting that antisense transcripts can form a complex
regulatory network.
Making use of large collections of microarray data, we have analyzed the
extent to which cis-NATs in Arabidopsis show anti-correlated expression, as reported
under salt stress for the SRO5 and P5CDH paradigm. We find that cis-NATs on
average are significantly more anti-correlated than non-overlapping neighboring
genes, but clear global anti-correlated expression is restricted to a small subset of
cis-NAT pairs, resolving conflicting results that had previously been published (Jen et
al., 2005; Wang et al., 2005). Available data sets do not point to small RNAs being
increased in cis-NATs nor is expression of cis-NATs typically affected by mutations in
genes necessary for the biogenesis of small RNAs, suggesting that cis-NATs do not
always enter the RNA silencing pathway.
Results and Discussion
Antisense transcript pairs in Arabidopsis thaliana
As a first step towards analyzing the transcriptional regulation of natural antisense
transcripts derived from the same or adjacent loci, we categorized the transcription
units of the Arabidopsis genome, as annotated by The Arabidopsis Information
Resource (TAIR), release 6 (Haas et al., 2005). The Arabidopsis genome contains
30,359 transcription units that can be grouped into 30,354 transcript-pairs (Table 1).
Transcript pairs were further broken down into four major categories, depending on
which strand neighboring transcripts were located on, and whether transcripts were
overlapping or not. The majority of transcript pairs were found to be non-overlapping,
with 15,926 pairs transcribed from the same strand (category 1) and 13,249 from
opposite strands (category 2). We found only 53 overlapping transcript pairs where
both transcripts originated from the same strand (category 3). In contrast, we
identified 2,243 overlapping transcripts originating from opposite strands forming
1,126 natural antisense transcript pairs (cis-NAT; category 4), equaling 3.7% of all
transcript pairs. The majority of the cis-NAT pairs were simple pairs, with only eight
triplets and a single quadruplet identified.
To investigate the expression profiles of the cis-NATs, we mapped the
TAIR6.0 transcription units onto the Affymetrix ATH1 microarray. We found that
21,021 (out of 30,359) transcripts were represented by the array. Of these, 16,014
were arranged in adjacent pairs, which correspond to about half of all transcript pairs
encoded by the Arabidopsis genome. There was no substantial difference between
adjacent non-overlapping transcripts transcribed from the same strand (8,258;
51.8%) or from opposite strands (7,022; 53.0%). In contrast, overlapping transcripts
derived from the same strand were slightly underrepresented (20; 37.7%), while cis-
NATs were slightly overrepresented (714; 63.4%). The latter make up 4.4% of all
transcript pairs mapped onto the ATH1 array. Because of the low number of
transcript pairs in category 3, these were dropped from further analysis. Mapping
information of the four different transcript pair categories onto the Arabidopsis
genome and the ATH1 array can be found in Supplementary Tables 1 and 2,
respectively.
One concern with cis-NAT predictions is that the transcript ends reported in
the TAIR annotation might not be necessarily correct (Haas et al., 2005). We
therefore manually inspected all 714 potential cis-NATs that are present on the
Affymetrix ATH1 array for support by cDNA and/or EST clones that include either
spliced introns or large (at least 100 codons long) open reading frames. We found
that of the 714 potential cis-NATs, only 515 (72.1%; 1,027 transcripts in total) are
currently supported by cloned mRNAs with an overlap of at least one base
(Supplementary Table 3). Subsequent analysis focused primarily on this set of cis-
NATs.
The number of cis-NATs identified is slightly higher than what had previously
been reported (Jen et al., 2005; Wang et al., 2005). The discrepancies are likely due
to the different methods used to map cis-NATs onto the genome and changes in
gene annotation introduced with the latest genome release. One limitation of this
analysis to keep in mind is that the current annotation of the Arabidopsis
genome may still lack the extreme 5’ and 3’ ends for many transcripts. As a
consequence, our analysis might underestimate the number of Arabidopsis cis-
NATs. Even so, the ATH1 array is a fair representation of the different transcript pair
categories in Arabidopsis, allowing us to use expression data sets based on this
array to examine the expression profiles of cis-NATs in detail.
An excess of negative correlation coefficients of cis-NAT expression
To examine if there is a general difference between the expression profiles of cis-
NATs and non-overlapping transcript pairs, we calculated the pairwise Pearson
correlation coefficient (PCC) for these transcript pairs from four publicly available
data sets generated by the AtGenExpress initiative. The first set comprised data from
234 arrays that capture expression of 78 different tissue samples assayed in
triplicate throughout development (Schmid et al., 2005). The original data set
also included pollen samples, but because many genes show either very high or very
low expression levels in this tissue type, and pairs are therefore more likely to be
perfectly correlated or anti-correlated by chance than in other samples, we omitted
the pollen samples for this analysis. The second set of 236 arrays, from duplicate
samples, reflects responses to hormones and related substances (mostly created by
RIKEN) (Kiba et al., 2005; Nakabayashi et al., 2005; Nemhauser et al., 2006). The
two final sets, of 136 arrays each, had been used to measure the response to
various abiotic stresses in shoots and roots, respectively, with duplicate samples
(Kilian et al., 2007). We analyzed the shoot and root data separately, to minimize
effects of tissue-specific expression.
In all four data sets, the pairwise PCCs of cis-NATs are skewed towards
negative values (Fig. 1), when compared to non-overlapping transcript pairs located
on either the same or opposite strands. This shift in distribution was statistically
significant in all four data sets using a two-sided, two sample Welch t-test (Table 2)
regardless of whether all cis-NATs supported by the TAIR annotation (714; Table 1,
category 4), or only the manually curated set (515; Table 1, category 4*) were used.
Similar results were obtained using pairwise Spearman’s rank correlation coefficients
(SCCs), which are less sensitive to outliers (Supplementary Table 4). Comparisons
of the PCC and SCC values by scatter plot analysis revealed a high degree of
similarity, with R² values ranging between 0.71 and 0.83, indicating the robustness of
the anti-correlation we observed (Supplementary Fig. 1).
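
The two-sample Welch t-test used above compares the mean PCC of two groups of transcript pairs without assuming equal variances. The following is a minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom, written in Python with the standard library only. The function name and the toy values in the usage note are illustrative assumptions and are not taken from the original analysis.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample Welch t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df
```

With hypothetical PCC vectors for cis-NATs and for non-overlapping pairs, `welch_t(cis_nat_pccs, control_pccs)` returns a negative t when the cis-NAT mean is lower. A two-sided p-value additionally requires the t distribution, for which a library routine such as SciPy's `ttest_ind` with `equal_var=False` can be used.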
In contrast, distributions between non-overlapping transcript pairs located on
either the same or the opposite strand were not significantly different in any of the
data sets. Fig. 2 shows the expression profiles of the cis-NATs with the lowest PCCs
for the individual microarray experiments.
One limitation of the AtGenExpress data sets is that they lack cellular
resolution. We therefore analyzed microarray data Birnbaum and colleagues
obtained from various cell types and regions of the root after cell sorting (Birnbaum et
al., 2003). We found that the distribution of PCC and SCC values of cis-NATs was
skewed towards negative values when compared to non-overlapping transcripts
(Supplementary Fig. 2). As was the case for the AtGenExpress data sets, this shift
towards negative correlation was found to be statistically significant (Table 2 and
Supplementary Table 4), suggesting that the bias towards anti-correlation we
observed in the AtGenExpress data sets reflects true anti-correlation of cis-NATs
within the same cells, as would be expected for direct regulatory effects.
The fact that we found statistically significantly lower PCCs, on average, for cis-NATs
suggests that expression of one of the transcripts in these pairs can influence
expression of the other. However, the PCCs for the majority of cis-NATs fell in the
same range as non-overlapping transcript pairs, suggesting strong mutual regulation
for only a subset of cis-NATs. Thus, anti-correlated expression is much less
widespread than previously suggested based on MPSS expression data from 14
cDNA libraries, in which for the majority of cis-NATs, coexpression in the same tissue
was rarely found (Wang et al., 2005).
It has experimentally been demonstrated that SRO5 and P5CDH, a pair of
cis-NATs, have antagonistic functions in the regulation of salt tolerance in
Arabidopsis (Borsani et al., 2005). We therefore examined the expression profiles of
these two genes in greater detail and found that global expression of P5CDH and
SRO5 is not highly anti-correlated (Supplementary Fig. 3). The strongest anti-
correlation was found in the hormone data set with a PCC of -0.546, while in the
development and the abiotic stress data sets derived from shoots, anti-correlation
was weaker with PCC values of -0.171 and -0.178, respectively. In roots, the
expression of the two genes actually is positively correlated (PCC = +0.208) across
the various stress treatments, suggesting that mutual regulation of these two genes
is restricted to specific conditions.
Anti-correlation across the different microarray data sets
We next analyzed whether the same cis-NAT pairs always displayed strong
anti-correlation in the various data sets. We found that across the different data sets,
the most strongly anti-correlated cis-NATs varied (Fig. 3), and that there was only
weak overall correlation between the individual experiments. The highest correlation
was found between the development and hormone data sets with R² = 0.25. For the
remaining comparisons the R² value ranged from 0.05 to 0.14. Of the 515 manually
curated cis-NAT pairs analyzed, only six showed an average PCC of less than -0.5 in
all four microarray experiments. Of these, only two had PCC values lower than -0.5
in every individual experiment (Supplementary Table 3). These findings are
consistent with the idea that gene expression is primarily regulated at the
transcriptional level by factors, such as tissue identity, hormone status or stress, and
that only under specific conditions clear anti-correlation is seen. This finding also
implies that the simple presence of an antisense transcript is not sufficient for the
negative cross regulation, suggesting that the effectiveness of posttranscriptional
RNA regulation by RNA interference greatly varies.
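
The distinction drawn above, between pairs whose average PCC is below -0.5 and pairs whose PCC is below -0.5 in every individual experiment, can be illustrated with a short Python sketch. It assumes that the average is taken across the four data sets. The function name, the pair labels and the PCC values are hypothetical.

```python
def consistently_anticorrelated(pcc_by_pair, threshold=-0.5):
    """Return (pairs with mean PCC < threshold, pairs with every PCC < threshold)."""
    by_mean, by_every = [], []
    for pair, pccs in pcc_by_pair.items():
        if sum(pccs) / len(pccs) < threshold:
            by_mean.append(pair)
        if all(p < threshold for p in pccs):
            by_every.append(pair)
    return by_mean, by_every


# Hypothetical PCCs per pair: development, hormone, shoot stress, root stress.
toy_pccs = {
    "pairA": [-0.70, -0.60, -0.55, -0.80],
    "pairB": [-0.90, -0.80, -0.30, -0.60],
    "pairC": [-0.20, 0.10, -0.40, 0.30],
}
by_mean, by_every = consistently_anticorrelated(toy_pccs)
```

In this toy input, pairB passes the average criterion but fails the per-experiment criterion because of a single weakly anti-correlated data set, which is the kind of case that separates the six pairs from the two pairs reported above.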
Anti-correlation of antisense transcripts is not predicted by extent of overlap
or promoter distance
One obvious parameter that might influence the degree of mutual regulation could be
the length of the overlapping region. We therefore analyzed whether the PCC for a
given cis-NAT pair was correlated with the length of the overlap, but found no
evidence for such a relationship (Fig. 4). We next determined whether the distance
between the 5’ ends of the transcripts of cis-NATs was indicative of the degree of
negative correlation found, with the idea that proximity of promoters could cause
positive correlation in expression. However, similar to the length of the overlap, the
distance of 5’ ends of cis-NAT pairs had no effects on their PCCs (data not shown),
indicating that varying promoter distance is unlikely to confound the conclusions
about transcript overlap and anti-correlated expression.
cis-NAT transcripts and RNA silencing
One possible mechanism that might cause negative correlation of cis-NAT RNA
accumulation could be the formation of double-stranded RNAs from the overlapping
mRNA regions and subsequent processing to siRNAs by DICER-LIKE proteins. The
resulting siRNAs could in turn lead to the destruction of one of the transcripts by an
RNA interference-like mechanism, as demonstrated for P5CDH and SRO5 (Borsani
et al., 2005).
To analyze the contribution of siRNAs to anti-correlated expression of cis-NATs,
we examined the distribution of small RNA loci across the genome
(Gustafson et al., 2005; Lu et al., 2005). Specifically we asked whether MPSS tags or
small RNA sequences from several source tissues are enriched in the overlapping
regions of cis-NATs, as would be expected if double-stranded RNAs were the cause
for downregulation of one of the transcripts (Rajagopalan et al., 2006; Kasschau et
al., 2007). Analysis was carried out for all cis-NATs based on the TAIR 6 annotation
(1,126 gene-pairs), as well as for those cis-NATs that are present on the ATH1 array
before (714 gene pairs) and after manual curation (515 gene-pairs). Results are
summarized in Table 3. For detailed information on the mapping of unique small
RNA loci to the Arabidopsis transcriptome see supplementary information.
We found that over all 1,126 cis-NAT pairs predicted by the TAIR 6
annotation, small RNAs were not enriched in cis-NATs when compared to
non-overlapping neighboring gene pairs (Table 3, top half). For example, we observed
1.467 small RNA loci per kb genomic sequence in non-overlapping gene pairs, but
we found only 0.388 loci/kb in the cis-NATs. However, if small RNAs were present in
cis-NATs at all, they appeared to be enriched in the overlapping region of cis-NAT
pairs (1.126 loci/kb) when compared to the non-overlapping region (0.315 loci/kb).
Similar results were obtained when we restricted the analysis to those cis-NATs that
are present on the ATH1 arrays (714) and were confirmed by manual curation (515).
In all instances, no enrichment of small RNAs in cis-NATs was observed. If one takes
into account that not all gene-pairs in a given category contain small RNA loci, the
outcome differs in that small RNAs were found to be enriched in the overlapping
region of cis-NATs (4.949 loci/kb) compared to non-overlapping gene-pairs (2.523
loci/kb) by a factor of approx. 2 (Table 3, lower half). Together, these findings suggest
that siRNA-mediated silencing does not play a major role in the global
regulation of cis-NAT expression, at least not under those conditions examined in
published small RNA sequencing projects (Gustafson et al., 2005; Lu et al., 2005;
Rajagopalan et al., 2006; Kasschau et al., 2007).
Further support for this notion came from analyzing microarray data of
mutants affected in the biogenesis of small RNAs (Allen et al., 2005). We found that
cis-NATs accounted for 2.1% to 5.7% of all transcripts that changed significantly
between wild type and the different mutants (Table 4). Given that cis-NATs
supported by mRNAs make up 4.5% of all probe sets present on the ATH1 array
(1,027 / 22,810 probe sets), this is approximately what one would expect by chance,
indicating that cis-NATs are not more likely to be regulated by small RNAs than non-
overlapping transcripts. Taken together, we could not find positive evidence for a
pervasive role of small RNAs in the regulation of antisense transcripts.
Conclusion
Our results paint the most detailed picture of the global regulation of cis-NATs in
plants so far. While we could show that cis-NAT pairs tend to have more anti-
correlated expression patterns than non-overlapping neighboring transcripts, we
found that pronounced anti-correlation across many samples can only be found in a
small subset of cis-NATs. Along these lines we found that discrete cis-NAT pairs
show anti-correlated expression in different experiments, suggesting that
independent transcriptional regulation of both members of a pair has a strong
influence on cis-NAT expression. The negative correlation of cis-NATs was also
observed in a cell-type specific data set, indicating that cis-NATs affect each other’s
expression in individual cells. The observation that small RNA loci, representing
mainly siRNAs, were underrepresented in cis-NATs along with the fact that mutations
in the RNA silencing machinery did not have a significant effect on cis-NAT
expression confirm this notion and complement previous suggestions that small
RNAs and RNA interference are important for only a subset of cis-NATs (Lu et al.,
2005).
However, there is at least one known example in which small RNAs derived
from cis-NATs have been shown to be important in mutually antagonistic expression,
namely the SRO5 and P5CDH pair of cis-NATs involved in Arabidopsis salt tolerance
(Borsani et al., 2005). When exposed to salt stress, SRO5 message is induced,
leading to formation of small RNAs and activation of an RNA silencing pathway that
ultimately leads to downregulation of the P5CDH transcript. As pointed out before, no
small RNA MPSS tag from wild-type tissue maps to the overlapping region of the two
transcripts, consistent with the inducible nature of this particular siRNA. Borsani and
colleagues (Borsani et al., 2005) have also suggested that microarrays are imperfect
for assessing mutually antagonistic effects, if 3’ products are largely stable. Indeed,
SRO5 and P5CDH are only weakly anti-correlated in our data sets and are not
significantly different from non-overlapping transcripts. Nevertheless, the significant
shift in correlation coefficients of cis-NATs towards negative values when compared
with non-overlapping transcripts indicates that coordinated expression of cis-NATs
can be detected by microarrays, even if the mechanism by which this is achieved is
still unclear. These strongly anti-correlated cis-NATs will be attractive targets for
further mechanistic studies.
Materials and Methods
Mapping of transcript pairs
The XML file containing the latest annotation (version 6) of Arabidopsis
pseudochromosomes was downloaded from the TAIR FTP server
(ftp://ftp.arabidopsis.org/home/tair/). Start and stop position of the transcription units
along with information on the strand that encodes an mRNA and the gene description
were extracted. We used Perl scripts to categorize pairs of adjacent transcripts,
depending on overlap and whether they were transcribed from the same strand. In a
first step we defined all antisense transcripts that overlapped for at least one base as
predicted by the TAIR 6 annotation as potential cis-NATs. In a second step, all
predicted cis-NATs were manually inspected and only those that were supported by
spliced cDNA and/or EST clones were analyzed further. Single exon genes and gene
models not supported by any mRNA were required to be clearly coding (≥100 codon
open reading frame) in order to be included in the final cis-NATs list.
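
The categorization of adjacent transcript pairs described above can be sketched as follows. The original analysis used Perl scripts. This Python version is only an illustration under stated assumptions: coordinates are taken to be 1-based and inclusive, adjacency is defined by sorting on start position within each chromosome, and the `Transcript` type and its field names are hypothetical. The category numbers follow the four categories defined in the Results.

```python
from collections import Counter
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str    # hypothetical identifier
    chrom: str
    start: int   # 1-based, inclusive (assumed)
    end: int     # 1-based, inclusive (assumed)
    strand: str  # "+" or "-"


def categorize(a: Transcript, b: Transcript) -> int:
    """Return category 1-4 for two neighboring transcripts on one chromosome."""
    # Overlap of at least one base, matching the criterion stated in the text.
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1 >= 1
    same_strand = a.strand == b.strand
    if not overlap:
        return 1 if same_strand else 2
    return 3 if same_strand else 4


def categorize_neighbors(transcripts):
    """Count categories over adjacent pairs, never pairing across chromosomes."""
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start))
    counts = Counter()
    for a, b in zip(ordered, ordered[1:]):
        if a.chrom == b.chrom:
            counts[categorize(a, b)] += 1
    return counts
```

Pairing only immediate neighbors is a simplification. Nested or multiply overlapping transcripts, such as the triplets and the quadruplet mentioned in the Results, would need an explicit all-against-all overlap check within a local window.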
Determining correlation coefficients
Mapping information of transcripts onto the Affymetrix ATH1 array was obtained from
TAIR as well. We only used those probe sets that mapped to a single transcription
unit. In those few cases where a transcription unit was represented by more than one
specific probe set, we retained only one probe set, chosen at random, for further
analysis. Pairwise Pearson correlation coefficients (PCC) and pairwise Spearman’s
rank correlation coefficients (SCC) were calculated using programs written in Java.
Histograms (bin size 0.1), ranking and comparisons of PCCs between individual
microarray data sets were created in Microsoft Excel.
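
For readers who want to reproduce the correlation step, the two coefficients can be sketched in a few lines. The original programs were written in Java. The Python version below is an independent illustration and not the authors' code. It assumes that each transcript is represented by a vector of expression values in the same sample order, and it assigns average ranks to tied values for the Spearman coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant vectors


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because the Spearman coefficient depends only on rank order, a single extreme expression value changes it far less than it changes the Pearson coefficient, which is why it serves as a robustness check.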
Microarray analysis
All microarray data used are publicly available. Data for correlation analysis were
from the AtGenExpress initiative (available from TAIR). Microarray data of small RNA
biogenesis mutants (Allen et al., 2005) were obtained from NCBI-GEO (GSE2473).
Microarray data were normalized using gcRMA (Wu et al., 2004) implemented in
GeneSpring 7.1 (Agilent Technologies). Genes that were differentially expressed
between controls and mutants affected in the biogenesis of small RNAs were
identified using the ‘Rank Product’ (Breitling et al., 2004) package implemented in ‘R’
(http://www.R-project.org). Percentage false positives (pfp) were calculated based on
100 permutations. Only probe sets with a pfp<0.05 for a given comparison were
carried forward. In addition to a pfp<0.05 we required a minimum of 2-fold change in
expression estimate for a probe set to be considered to be robustly differentially
expressed.
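
The two-step filter described above (pfp below 0.05 and at least a 2-fold change) can be written down compactly. The sketch below is in Python and does not call the Rank Product package. It assumes that pfp values and linear-scale fold changes (mutant over control, always positive) have already been computed, and the tuple layout and function name are hypothetical.

```python
def robustly_changed(results, max_pfp=0.05, min_fold=2.0):
    """Keep probe sets with pfp < max_pfp and at least min_fold change, up or down."""
    kept = []
    for probe_set, pfp, fold_change in results:
        # Assumed: fold_change is a positive linear ratio; ratios below 1
        # (down-regulation) are inverted so both directions are treated alike.
        magnitude = fold_change if fold_change >= 1 else 1 / fold_change
        if pfp < max_pfp and magnitude >= min_fold:
            kept.append(probe_set)
    return kept
```

If expression estimates are on a log2 scale, as gcRMA output usually is, the equivalent criterion is an absolute log2 difference of at least 1.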
Mapping MPSS tags and small RNA sequences to cis-NATs
All MPSS tags and small RNA sequences used are publicly available. MPSS tags
were downloaded from the Arabidopsis MPSS database (http://mpss.udel.edu/at/)
(Meyers et al., 2004; Lu et al., 2005). Small RNA sequences (Col-0) from several
source tissues were described previously and are accessible at NCBI-GEO
[GSE5228 and GSE6682] or the ASRP Database (ASRP,
http://asrp.cgrb.oregonstate.edu/db/) (Gustafson et al., 2005; Rajagopalan et al.,
2006; Kasschau et al., 2007). All tags and sequences were searched against the
Arabidopsis genome using BLAST to identify positions of perfect matches. MPSS tags or small
RNA sequences mapping to a single locus were analyzed for position information in
relation to cis-NATs using Perl scripts. MPSS tags or small RNA sequences were
counted if any portion of the locus overlapped the region of interest.
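
The counting rule stated above, under which a locus counts if any portion of it overlaps the region of interest, together with a density in loci per kb as reported in the Results, can be sketched as follows. The original analysis used Perl scripts. This Python version assumes inclusive (start, end) coordinates on a single chromosome, and the normalization by region length is an assumption about how the loci/kb values were derived.

```python
def overlaps(locus, region):
    """True if two inclusive (start, end) intervals share at least one base."""
    return locus[0] <= region[1] and region[0] <= locus[1]


def loci_per_kb(loci, region):
    """Return (count, density) of loci touching the region, density in loci per kb."""
    count = sum(1 for locus in loci if overlaps(locus, region))
    length_kb = (region[1] - region[0] + 1) / 1000
    return count, count / length_kb
```

For a cis-NAT pair, the same function can be applied once to the overlapping region and once to each non-overlapping remainder, so that the two densities can be compared.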
Acknowledgements
We are indebted to Blake Meyers for making the MPSS data of small RNAs available
as a database dump. The initial generation of AtGenExpress microarray data was
supported by the Deutsche Forschungsgemeinschaft through a grant to L. Nover, T.
Altmann and DW, and by the Max Planck Society. We acknowledge support for this
work from the Max Planck Society and through grants from the National Science
Foundation (MCB-0618433) and the United States Department of Agriculture (2005-
35319-15280) to JCC. JUL is an EMBO Young Investigator, and DW is a director of
the Max Planck Institute.
Literature cited
Allen E, Xie Z, Gustafson AM, Carrington JC (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121: 207-221
Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-297
Behm-Ansmant I, Izaurralde E (2006) Quality control of gene expression: a stepwise assembly pathway for the surveillance complex that triggers nonsense-mediated mRNA decay. Genes Dev 20: 391-398
Billy E, Brondani V, Zhang H, Muller U, Filipowicz W (2001) Specific interference with gene expression induced by long, double-stranded RNA in mouse embryonal teratocarcinoma cell lines. Proc Natl Acad Sci U S A 98: 14428-14433
Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302: 1956-1960
Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123: 1279-1291
Breitling R, Armengaud P, Amtmann A, Herzyk P (2004) Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 573: 83-92
Brodersen P, Voinnet O (2006) The diversity of RNA silencing pathways in plants. Trends Genet 22: 268-280
Farrell CM, Lukens LN (1995) Naturally occurring antisense transcripts are present in chick embryo chondrocytes simultaneously with the down-regulation of the alpha 1 (I) collagen gene. J Biol Chem 270: 3400-3408
Gustafson AM, Allen E, Givan S, Smith D, Carrington JC, Kasschau KD (2005) ASRP: the Arabidopsis Small RNA Project Database. Nucleic Acids Res 33: D637-640
Haas BJ, Wortman JR, Ronning CM, Hannick LI, Smith RK, Jr., Maiti R, Chan AP, Yu C, Farzad M, Wu D, White O, Town CD (2005) Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol 3: 7
Jen CH, Michalopoulos I, Westhead DR, Meyer P (2005) Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation. Genome Biol 6: R51
Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57: 19-53
Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC (2007) Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol 5: e57
Kiba T, Naitou T, Koizumi N, Yamashino T, Sakakibara H, Mizuno T (2005) Combinatorial microarray analysis revealing Arabidopsis genes implicated in cytokinin responses through the His->Asp phosphorelay circuitry. Plant Cell Physiol 46: 339-355
Kilian J, Whitehead D, Horak J, Wanke D, Weinl S, Batistic O, D'Angelo C, Bornberg-Bauer E, Kudla J, Harter K (2007) The AtGenExpress global stress data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. Plant J (online early access)
Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A (2004) Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res 14: 1719-1725
Kumar M, Carmichael GG (1998) Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes. Microbiol Mol Biol Rev 62: 1415-1434
Lehner B, Williams G, Campbell RD, Sanderson CM (2002) Antisense transcripts in the human genome. Trends Genet 18: 63-65
Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309: 1567-1569
Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S (2004) The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 14: 1641-1653
Meyers BC, Vu TH, Tej SS, Ghazal H, Matvienko M, Agrawal V, Ning J, Haudenschild CD (2004) Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat Biotechnol 22: 1006-1011
Nakabayashi K, Okamoto M, Koshiba T, Kamiya Y, Nambara E (2005) Genome-wide profiling of stored mRNA in Arabidopsis thaliana seed germination: epigenetic and genetic regulation of transcription in seed. Plant J 41: 697-709
Nemhauser JL, Hong F, Chory J (2006) Different plant hormones regulate similar processes through largely nonoverlapping transcriptional responses. Cell 126: 467-475
Newbury SF (2006) Control of mRNA stability in eukaryotes. Biochem Soc Trans 34: 30-34
Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Kawai J, Carninci P, Ohtomo Y, Murakami K, Matsubara K, Kikuchi S, Hayashizaki Y (2003) Antisense transcripts with rice full-length cDNAs. Genome Biol 5: R5
Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20: 3407-3425
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Scholkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37: 501-506
Shendure J, Church GM (2002) Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol 3
Sureau A, Soret J, Guyon C, Gaillard C, Dumon S, Keller M, Crisanti P, Perbal B (1997) Characterization of multiple alternative RNAs resulting from antisense transcription of the PR264/SC35 splicing factor gene. Nucleic Acids Res 25: 4513-4522
Tufarelli C, Stanley JA, Garrick D, Sharpe JA, Ayyub H, Wood WG, Higgs DR (2003) Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease. Nat Genet 34: 157-165
Vanhee-Brossollet C, Vaquero C (1998) Do natural antisense transcripts make sense in eukaryotes? Gene 211: 1-9
Vazquez F (2006) Arabidopsis endogenous small RNAs: highways and byways. Trends Plant Sci 11: 460-468
Wang H, Chua NH, Wang XJ (2006) Prediction of trans-antisense transcripts in Arabidopsis thaliana. Genome Biol 7: R92
Wang XJ, Gaasterland T, Chua NH (2005) Genome-wide prediction and identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol 6: R30
Wu Z, Irizarry RA, Gentleman R, Murillo FM, Spencer FA (2004) A model based background adjustment for oligonucleotide expression arrays. Johns Hopkins University
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, Pham P, Cheuk R, Karlin-Newmann G, Liu SX, Lam B, Sakano H, Wu T, Yu G, Miranda M, Quach HL, Tripp M, Chang CH, Lee JM, Toriumi M, Chan MM, Tang CC, Onodera CS, Deng JM, Akiyama K, Ansari Y, Arakawa T, Banh J, Banno F, Bowser L, Brooks S, Carninci P, Chao Q, Choy N, Enju A, Goldsmith AD, Gurjal M, Hansen NF, Hayashizaki Y, Johnson-Hopson C, Hsuan VW, Iida K, Karnes M, Khan S, Koesema E, Ishida J, Jiang PX, Jones T, Kawai J, Kamiya A, Meyers C, Nakajima M, Narusaka M, Seki M, Sakurai T, Satou M, Tamse R,
Vaysberg M, Wallender EK, Wong C, Yamamura Y, Yuan S, Shinozaki K, Davis RW, Theologis A, Ecker JR (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842-846
Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R, Nemzer S, Pinner E, Walach S, Bernstein J, Savitsky K, Rotman G (2003) Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol 21: 379-386
Figure Legends
Figure 1
Distribution of pairwise Pearson correlation coefficients (PCC) for expression of pairs
of adjacent genes. PCCs of transcript-pairs in categories 1 (same strand, no overlap;
black), 2 (opposite strand, no overlap; grey), and 4* (opposite strand, overlap; red)
were calculated for four Affymetrix ATH1 microarray data sets (development,
hormones, abiotic stress/root, and abiotic stress/shoot) created by the AtGenExpress
initiative.
Figure 2
Expression profiles of selected cis-NATs. The NATs with the strongest anti-
correlation (lowest PCC) in a particular data set are shown as examples.
Figure 3
Correlation of the PCCs for 515 cis-NATs between different microarray data sets.
Figure 4
Scatter plot showing independence of PCCs for 515 cis-NATs and length of
transcript overlap.
Tables
Table 1 Categories of adjacent transcript pairs in Arabidopsis
TU = transcribed unit; categories: 1) neighboring genes on the same strand, no overlap; 2)
neighboring genes on opposite strands, no overlap; 3) neighboring genes on the same
strand, overlap; 4) neighboring genes on opposite strands, overlap; TAIR6 genome
annotation and TAIR ATH1 probe set mapping were used; 4*) gene pairs in category 4 whose
overlap is supported by spliced or long ORF cDNA and/or EST clones.
Arabidopsis genome
        transcripts         transcript-pair category
Chr     TU       pairs      1        2        3     4
I       7593     7592       3936     3345     14    297
II      4994     4993       2641     2175     8     168
III     6129     6128       3254     2661     10    199
IV      4715     4714       2438     2085     13    178
V       6928     6927       3657     2983     8     279
total   30359    30354      15926    13249    53    1126

Affymetrix ATH1 array
        transcripts         transcript-pair category
Chr     TU       pairs      1        2        3     4      4*
I       5452     4196       2123     1865     6     202    142
II      3401     2544       1314     1128     4     98     75
III     4223     3215       1705     1381     1     128    77
IV      3110     2318       1170     1030     5     113    77
V       4835     3741       1946     1618     4     173    144
total   21021    16014      8258     7022     20    714    515
Table 2 Statistical analysis of PCC distributions
data set                p-value
                        Category 1 vs. 2    Category 1 vs. 4*    Category 2 vs. 4* +
development             0.5662              1.00E-05             1.97E-05
hormones                0.5536              8.06E-06             1.02E-05
abiotic stress, root    0.6453              1.43E-04             2.45E-04
abiotic stress, shoot   0.6727              3.40E-04             6.02E-04
Birnbaum                0.3581              9.00E-09             2.63E-08
p-values for differences in the distribution of PCCs were calculated using a two-sided,
two-sample Welch t-test. +) PCC values for those 199 gene pairs that originally belonged to
category 4 (see Table 1), but whose overlap is not supported by manual curation, were
included in category 2 for this analysis.
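For readers unfamiliar with the Welch test, the statistic behind these p-values can be sketched as below. This is a generic illustration of the standard Welch formula, not the code used for Table 2; the function name is hypothetical, and the final conversion of t and the degrees of freedom into a two-sided p-value (which requires the t distribution, for example from a statistics package) is deliberately left out.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) of Welch's unequal-variance t-test for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

Here the two samples would be the PCC values of the gene pairs in the two categories being compared.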
Table 3 Density of small RNA loci in cis-NATs and non-overlapping gene pairs
Annotation   Source      Non-overlapping   cis-NATs
                         gene pairs        Total    Non-overlapping   Overlapping
TAIR6        small RNA   1.467             0.388    0.315             1.126
             MPSS        0.234             0.065    0.058             0.137
ATH1         small RNA   0.808             0.295    0.290             0.448
             MPSS        0.139             0.061    0.0059            0.096
curated      small RNA   0.800             0.304    0.305             0.337
             MPSS        0.138             0.061    0.062             0.064

TAIR6        small RNA   2.523             0.711    0.630             4.949
             MPSS        0.772             0.321    0.347             1.904
ATH1         small RNA   1.455             0.552    0.575             2.777
             MPSS        0.585             0.304    0.331             1.799
curated      small RNA   1.442             0.571    0.579             3.970
             MPSS        0.581             0.313    0.322             3.781
Density of small RNAs [loci/kb] according to the TAIR6 annotation and those cis-NATs
present on the ATH1 array before (ATH1) and after manual curation (curated) of the
overlapping region for MPSS (Meyers et al., 2004; Lu et al., 2005) and small RNA data sets
(Gustafson et al., 2005; Rajagopalan et al., 2006; Kasschau et al., 2007). Calculations were
performed based on all gene pairs (top) and those gene pairs that actually contain small RNA
loci (bottom). See Supplementary Material for detailed information.
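The unit used in Table 3 (loci per kb) can be illustrated with a small sketch. The exact procedure is described in the Supplementary Material and is not reproduced here; the version below assumes a pooled density (total loci divided by total region length), and the function names and the (locus count, length in bp) input format are hypothetical.

```python
def locus_density_per_kb(regions):
    """Pooled density in loci/kb for a list of (n_loci, length_bp) tuples."""
    total_loci = sum(n_loci for n_loci, _ in regions)
    total_bp = sum(length_bp for _, length_bp in regions)
    if total_bp <= 0:
        raise ValueError("total region length must be positive")
    return total_loci / (total_bp / 1000.0)


def only_with_loci(regions):
    """Keep only regions that contain at least one locus."""
    return [(n_loci, length_bp) for n_loci, length_bp in regions if n_loci > 0]
```

Computing the density once on all regions and once on the output of `only_with_loci` corresponds to the distinction between the top and bottom halves of the table.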
Table 4 Analysis of differential cis-NAT expression in RNA silencing mutants
genotype   All transcripts   cis-NATs (4)   cis-NATs (4*)
dcl1-7     981               56 (5.7%)      43 (4.4%)
dcl2-1     145               7 (4.8%)       3 (2.1%)
dcl3-1     221               14 (6.3%)      8 (3.6%)
hen1-1     893               44 (4.9%)      34 (3.8%)
hst-15     895               55 (6.1%)      45 (5.0%)
hyl1-2     291               22 (7.6%)      16 (5.5%)
rdr1-1     105               8 (7.6%)       6 (5.7%)
rdr2-1     166               9 (5.4%)       5 (3.0%)
rdr6-15    397               22 (5.5%)      17 (4.3%)
Numbers of transcripts that changed significantly in a given genotype relative to the
wild-type control are indicated; percentages are relative to all changed transcripts.