genome-wide measurement of rna secondary structure in yeast

5
LETTERS Genome-wide measurement of RNA secondary structure in yeast Michael Kertesz 1 *{, Yue Wan 2 *, Elad Mazor 1 , John L. Rinn 3 , Robert C. Nutter 4 , Howard Y. Chang 2 & Eran Segal 1,5 The structures of RNA molecules are often important for their function and regulation 1–6 , yet there are no experimental techniques for genome-scale measurement of RNA structure. Here we describe a novel strategy termed parallel analysis of RNA structure (PARS), which is based on deep sequencing fragments of RNAs that were treated with structure-specific enzymes, thus providing simul- taneous in vitro profiling of the secondary structure of thousands of RNA species at single nucleotide resolution. We apply PARS to profile the secondary structure of the messenger RNAs (mRNAs) of the budding yeast Saccharomyces cerevisiae and obtain structural profiles for over 3,000 distinct transcripts. Analysis of these profiles reveals several RNA structural properties of yeast transcripts, including the existence of more secondary structure over coding regions compared with untranslated regions, a three-nucleotide periodicity of secondary structure across coding regions and an anti-correlation between the efficiency with which an mRNA is translated and the structure over its translation start site. PARS is readily applicable to other organisms and to profiling RNA struc- ture in diverse conditions, thus enabling studies of the dynamics of secondary structure at a genomic scale. Existing experimental methods for measuring RNA structure can only probe a single RNA structure per experiment and are typically limited in the length of the probed RNA (Supplementary Note 1). To measure structural properties of many different RNAs simultaneously, we extracted polyadenylated transcripts from yeast growing in the log phase, renatured the transcripts in vitro and treated the resulting pool with RNase V1 and, separately, with S1 nuclease. RNase V1 preferen- tially cleaves phosphodiester bonds 39 of double-stranded RNA, whereas S1 nuclease preferentially cleaves 39 of single-stranded RNA 7 . Thus data from these two complementary enzymes should allow us to measure the degree to which each nucleotide was in a single- or double-stranded conformation (Fig. 1). We chose renatura- tion and enzymatic cleavage conditions under which the cleavage reac- tions occur with single-hit kinetics (Supplementary Fig. 1a, b) and where intramolecular, but not intermolecular, RNA–RNA interac- tions are dominant (Supplementary Fig. 1c, d). As a control, we also added two short RNA domains from HOTAIR, a human non-coding RNA 8 , and from the structurally known Tetrahymena group I intron ribozyme 9 . We devised a ligation method specifically to ligate V1- and S1- cleaved RNA to adaptors, and converted them into complementary DNA (cDNA) libraries suitable for deep sequencing (Supplementary Fig. 2). As both enzymes leave a 59 phosphate at the cleavage point and because only 59 phosphoryl-terminated RNAs are capable of ligating to our adaptors, we enrich for V1- and S1-cleaved fragments and select against random fragmentation and degradation products that typically have 59 hydroxyl (Supplementary Fig. 3). Thus each observed cleavage site provides evidence that the cut nucleotide was in a double-stranded (for V1-treated samples) or single-stranded (for S1-treated samples) conformation. As a quantitative measure at nuc- leotide resolution representing the degree to which a nucleotide was in a double- or single-stranded conformation, we took the log ratio between the number of sequence reads obtained for each nucleotide in the V1 and S1 experiments. A higher (lower) log ratio, or PARS score, thus denotes a higher (lower) probability for a nucleotide to be in a double-stranded conformation. We performed four independent V1 experiments and three inde- pendent S1 experiments, which were highly reproducible across replicates (correlation 5 0.60–0.93, Supplementary Table 1), result- ing in over 85 million sequence reads that map to the yeast genome, of which approximately 97% mapped to annotated transcripts (Supplementary Table 2). At an average nucleotide coverage above 1.0, we obtained structural information for over 3,000 yeast tran- scripts (Supplementary Table 3 and Supplementary Fig. 4a), covering in total over 4.2 million transcribed bases, which is approximately 100-fold more than all published RNA footprints to date. We used several tests to check for biases in our method. We found that RNase cleavage, adaptor ligation and cDNA conversion do not introduce significant sequence biases (Supplementary Fig. 5), that our protocol has a very small bias towards particular regions along the transcript (Supplementary Fig. 6) and that we capture RNA frag- ments in proportion to their abundance in the initial pool (Sup- plementary Fig. 4b, c). We also confirmed that signals generated by RNase V1 are highly distinct from those generated by S1 nuclease. Global inspection across all transcripts revealed that approximately 7% of the V1 and S1 peaks are shared (Methods, Supplementary Table 4 and Supplementary Fig. 7). These joint peaks could be the result of experimental noise introduced by non-specific enzymatic activity, but could also correspond to dynamic RNA regions or tran- scripts that fold into more than one stable conformation. To test whether PARS accurately measures RNA structures, we first confirmed that its signals are similar to those obtained with traditional footprinting. To this end, we performed ten separate footprinting experiments with either RNase V1 or S1 nuclease, on two domains from the Tetrahymena ribozyme, two domains from the human HOTAIR non-coding RNA, which we doped into our samples, and two domains of endogenous yeast mRNAs. In all cases, we found high agreement between our PARS signals and footprinting (correla- tions 5 0.40–0.97; Fig. 2 and Supplementary Figs 8–10). Notably, owing to length limitations of footprinting, we had to select short domains from each of the above transcripts, transcribe them in vitro and then apply footprinting. Thus footprinting may be inaccurate, *These authors contributed equally to this work. 1 Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel. 2 Howard Hughes Medical Institute, Program in Epithelial Biology, Stanford University School of Medicine, Stanford, California 94305, USA. 3 The Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA. 4 Life Technologies, Foster City, California 94404, USA. 5 Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. {Present address: Department of Bioengineering, Stanford University and Howard Hughes Medical Institute, Stanford, California 94305, USA. Vol 467 | 2 September 2010 | doi:10.1038/nature09322 103 Macmillan Publishers Limited. All rights reserved ©2010

Upload: eran

Post on 21-Jul-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genome-wide measurement of RNA secondary structure in yeast

LETTERS

Genome-wide measurement of RNA secondarystructure in yeastMichael Kertesz1*{, Yue Wan2*, Elad Mazor1, John L. Rinn3, Robert C. Nutter4, Howard Y. Chang2 & Eran Segal1,5

The structures of RNA molecules are often important for theirfunction and regulation1–6, yet there are no experimental techniquesfor genome-scale measurement of RNA structure. Here we describea novel strategy termed parallel analysis of RNA structure (PARS),which is based on deep sequencing fragments of RNAs that weretreated with structure-specific enzymes, thus providing simul-taneous in vitro profiling of the secondary structure of thousandsof RNA species at single nucleotide resolution. We apply PARS toprofile the secondary structure of the messenger RNAs (mRNAs) ofthe budding yeast Saccharomyces cerevisiae and obtain structuralprofiles for over 3,000 distinct transcripts. Analysis of these profilesreveals several RNA structural properties of yeast transcripts,including the existence of more secondary structure over codingregions compared with untranslated regions, a three-nucleotideperiodicity of secondary structure across coding regions and ananti-correlation between the efficiency with which an mRNA istranslated and the structure over its translation start site. PARS isreadily applicable to other organisms and to profiling RNA struc-ture in diverse conditions, thus enabling studies of the dynamics ofsecondary structure at a genomic scale.

Existing experimental methods for measuring RNA structure canonly probe a single RNA structure per experiment and are typicallylimited in the length of the probed RNA (Supplementary Note 1). Tomeasure structural properties of many different RNAs simultaneously,we extracted polyadenylated transcripts from yeast growing in the logphase, renatured the transcripts in vitro and treated the resulting poolwith RNase V1 and, separately, with S1 nuclease. RNase V1 preferen-tially cleaves phosphodiester bonds 39 of double-stranded RNA,whereas S1 nuclease preferentially cleaves 39 of single-strandedRNA7. Thus data from these two complementary enzymes shouldallow us to measure the degree to which each nucleotide was in asingle- or double-stranded conformation (Fig. 1). We chose renatura-tion and enzymatic cleavage conditions under which the cleavage reac-tions occur with single-hit kinetics (Supplementary Fig. 1a, b) andwhere intramolecular, but not intermolecular, RNA–RNA interac-tions are dominant (Supplementary Fig. 1c, d). As a control, we alsoadded two short RNA domains from HOTAIR, a human non-codingRNA8, and from the structurally known Tetrahymena group I intronribozyme9.

We devised a ligation method specifically to ligate V1- and S1-cleaved RNA to adaptors, and converted them into complementaryDNA (cDNA) libraries suitable for deep sequencing (SupplementaryFig. 2). As both enzymes leave a 59 phosphate at the cleavage pointand because only 59 phosphoryl-terminated RNAs are capable ofligating to our adaptors, we enrich for V1- and S1-cleaved fragmentsand select against random fragmentation and degradation products

that typically have 59 hydroxyl (Supplementary Fig. 3). Thus eachobserved cleavage site provides evidence that the cut nucleotide wasin a double-stranded (for V1-treated samples) or single-stranded (forS1-treated samples) conformation. As a quantitative measure at nuc-leotide resolution representing the degree to which a nucleotide wasin a double- or single-stranded conformation, we took the log ratiobetween the number of sequence reads obtained for each nucleotidein the V1 and S1 experiments. A higher (lower) log ratio, or PARSscore, thus denotes a higher (lower) probability for a nucleotide to bein a double-stranded conformation.

We performed four independent V1 experiments and three inde-pendent S1 experiments, which were highly reproducible acrossreplicates (correlation 5 0.60–0.93, Supplementary Table 1), result-ing in over 85 million sequence reads that map to the yeast genome,of which approximately 97% mapped to annotated transcripts(Supplementary Table 2). At an average nucleotide coverage above1.0, we obtained structural information for over 3,000 yeast tran-scripts (Supplementary Table 3 and Supplementary Fig. 4a), coveringin total over 4.2 million transcribed bases, which is approximately100-fold more than all published RNA footprints to date.

We used several tests to check for biases in our method. We foundthat RNase cleavage, adaptor ligation and cDNA conversion do notintroduce significant sequence biases (Supplementary Fig. 5), thatour protocol has a very small bias towards particular regions alongthe transcript (Supplementary Fig. 6) and that we capture RNA frag-ments in proportion to their abundance in the initial pool (Sup-plementary Fig. 4b, c). We also confirmed that signals generated byRNase V1 are highly distinct from those generated by S1 nuclease.Global inspection across all transcripts revealed that approximately7% of the V1 and S1 peaks are shared (Methods, SupplementaryTable 4 and Supplementary Fig. 7). These joint peaks could be theresult of experimental noise introduced by non-specific enzymaticactivity, but could also correspond to dynamic RNA regions or tran-scripts that fold into more than one stable conformation.

To test whether PARS accurately measures RNA structures, we firstconfirmed that its signals are similar to those obtained with traditionalfootprinting. To this end, we performed ten separate footprintingexperiments with either RNase V1 or S1 nuclease, on two domainsfrom the Tetrahymena ribozyme, two domains from the humanHOTAIR non-coding RNA, which we doped into our samples, andtwo domains of endogenous yeast mRNAs. In all cases, we found highagreement between our PARS signals and footprinting (correla-tions 5 0.40–0.97; Fig. 2 and Supplementary Figs 8–10). Notably,owing to length limitations of footprinting, we had to select shortdomains from each of the above transcripts, transcribe them in vitroand then apply footprinting. Thus footprinting may be inaccurate,

*These authors contributed equally to this work.

1Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel. 2Howard Hughes Medical Institute, Program in Epithelial Biology,Stanford University School of Medicine, Stanford, California 94305, USA. 3The Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts02142, USA. 4Life Technologies, Foster City, California 94404, USA. 5Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. {Present address:Department of Bioengineering, Stanford University and Howard Hughes Medical Institute, Stanford, California 94305, USA.

Vol 467 | 2 September 2010 | doi:10.1038/nature09322

103Macmillan Publishers Limited. All rights reserved©2010

Page 2: Genome-wide measurement of RNA secondary structure in yeast

because lacking long-range interactions, the excised fragment couldfold differently when taken out of context. In contrast, PARS canprobe RNAs in their full-length context.

Next, we compared PARS with reported structures of yeast codingand non-coding RNAs, and found that it correctly reproduces theknown secondary structure of three structured RNA domains ofASH1 (ref. 10), of a structural element in URE2 mRNA11 and ofthe glutamate transfer RNA (Fig. 2e, f and Supplementary Figs 11and 12). This suggests that PARS can provide structural informationof transcripts in their full-length context and endogenous abundancefrom within a complex RNA pool. Taken together, our analysesdemonstrate that PARS recapitulates results obtained by low-throughput methods with high accuracy, and has advantages overexisting methods, stemming from its ability to probe structures oflong RNAs.

As another independent validation of PARS, we compared it withcomputational predictions of RNA structure, by applying the Viennapackage12 to the 3,000 transcripts that we analysed. We found a sig-nificant correspondence between these predictions and our PARSscores, whereby nucleotides with high (low) double-stranded PARSscore had a significantly higher (lower) average predicted pairingprobability (P , 102200; Fig. 3a and Supplementary Fig. 13).Despite this significant global correspondence, there are large differ-ences between PARS and predictions, in part owing to noise in ourapproach but also because of known inaccuracies of folding algo-rithms. We thus suggest that genome-wide PARS data can be usedto constrain folding algorithms and improve their accuracy, as previ-ously shown for specific RNAs13,14 (Supplementary Fig. 15).

We used the obtained structural profiles to investigate five globalproperties of yeast transcripts. First, examining the average PARSscore across the coding regions and untranslated regions (UTRs),we found that coding regions exhibit significantly more pairing than59 and 39 UTRs (P , 10230 and P , 10250, respectively; Fig. 3c).

Notably, the start and stop codons each exhibit local minima ofPARS scores, indicating reduced tendency for double-stranded con-formation and increased accessibility. These findings agree with pre-vious computational predictions for mouse and human genes15. Theevolutionary conservation of this global organization of mRNAsecondary structure suggests that it may have functional importance.An overall unstructured background in UTRs may allow functionalelements to stand out and, conversely, highly paired domains alongcoding regions may protect against ectopic translation initiation, orregulate ribosome translocation and protein folding, as recentlypostulated13.

Second, aligning our measured transcripts about their start codonand applying a discrete Fourier transform analysis to the averagePARS signal, we detected a periodic structure signal across codingregions with a cycle of three nucleotides, such that, on average, thefirst nucleotide of each codon is least structured and the secondnucleotide is most structured. Notably, this periodic signal is onlyfound in coding regions and not in UTRs (Fig. 3b), and the degree ofthree-nucleotide periodicity in transcripts is significantly associatedwith ribosome density in vivo16 (Supplementary Fig. 14), suggestingthat this periodicity may directly or indirectly facilitate translation.

Third, we tested whether there is a correlation between mRNAstructure around the translation start site and translation efficiency.Such a relation has long been hypothesized17 and recently shown forone reporter protein in E. coli18. We found a small but significant anti-correlation between PARS scores at the region located approximately10 base pairs (bp) upstream of the translation start site and ribosomedensity throughout the transcript16, a proxy for translational efficiency(correlation 5 20.1, P , 1024; Fig. 4a). Intriguingly, the 210-bpregion corresponds to the 59 position of the first ribosome on yeastmRNAs16. To examine this relation in more detail, we applied k-meansclustering (k 5 4) to the PARS structural profile of the 640 bp sur-rounding the translation start site. Notably, genes from clusters 3 and

Library preparationdeep sequencing

Library preparationdeep sequencing

In vitro folding

V1

Random fragmentation

mRNA 5′ cap

5′ OH 5′ OH

5′P

5′P

5′P

5′P

AAAAAA

AGGCAUGCACCUGGUAGCUAGUCUUUAAACC …

V1 profile

Nu

mb

er

of

read

s

S1

RNase V1 digestion

Random fragmentation

S1 nuclease digestion

AGGCAUGCACCUGGUAGCUAGUCUUUAAACC …

S1 profile

AGGCAUGCACCUGGUAGCUAGUCUUUAAACC …

PARS score

0

c

a b

log2

V1S1( (

Figure 1 | Measuring structural properties of RNA by deep sequencing.a, RNA molecules are cleaved by RNase V1, which cuts 39 of double-strandedRNA, leaving a 59 phosphate (59P). One such cut is illustrated by a red arrow.After random fragmentation, V1-generated fragments are specificallycaptured and subjected to deep sequencing. Each aligned sequence providesstructural evidence about a single base. The marked red square illustrates theevidence obtained from one mapped sequence (red). Additional evidence(grey boxes) is collected by mapping more sequences (grey horizontal bars).

A large number of reads aligned to the same base indicates that the base iscleaved many times by RNase V1 and is thus more likely to be in double-stranded conformation. b, Same as a, but the RNA sample is treated with S1nuclease, which cuts 39 of single-stranded RNA. Collected reads in this casesuggest that the base was unpaired in the original RNA structure. c, Bycombining the data extracted from the two complementary experimentsa and b, we obtain a nucleotide-resolution score representing the likelihoodthat the inspected base was in a double- or single-stranded conformation.

LETTERS NATURE | Vol 467 | 2 September 2010

104Macmillan Publishers Limited. All rights reserved©2010

Page 3: Genome-wide measurement of RNA secondary structure in yeast

4 exhibit significantly less structure in their 59 UTR than in the begin-ning of their coding region, as well as a higher ribosome density(Fig. 4b). Overall, these results provide the first genome-wide experi-mental validation for the suggestion that mRNA secondary structurearound the start codon may reduce translational efficiency17, althoughthe low correlation we found implies that, in vivo, translational effi-ciency is determined by additional factors.

Fourth, we asked whether genes with shared biological functions orcytotopic localizations19 tend to have similar PARS scores, indicativeof similar degrees of secondary structures. We found a rich picture ofbiological coordination (Supplementary Fig. 16 and Supplementary

Table 5), including increased RNA structure, especially in codingregions, in transcripts whose encoded proteins localize to distinctcellular domains or participate in distinct metabolic pathways. Wealso found that mRNAs with the least secondary structure in their 59

UTR and coding sequences encode subunits of the ribosome.Finally, we examined the PARS score of transcripts predicted to

encode a signal peptide, because a recent study showed that RNAsequences encoding the signal sequence (termed the signal sequencecoding region (SSCR)) of secretory proteins function as RNA ele-ments that promote RNA nuclear export20. We found that the 59

UTR region and approximately first 30 coding nucleotides of signal

e f ASH1-E1 URE2

1 2 3 4 5 a

c

b

d 2 3 4

1 2 3 4 5

61

73

85

97

109 118

1

2

3

4

5

55 58

65

71

80

91

102

111

131

3

4

5

1

2

3 4

1

2

0 1 2 3 4 5 6

0

20

40

60

80

100

0

20

40

60

80

100V

1 r

ead

s (×

10

3)

0

2

4

6

8

10

60 70 80 90 100

S1

read

s (×

10

3)

CCW12 base

0 1 2 3 4 5 6

0

20

40

60

80

100

V1 r

ead

s (×

10

3)

Fo

otp

rint

(no

rmaliz

ed

)F

oo

tprin

t(n

orm

aliz

ed

)F

oo

tprin

t(n

orm

aliz

ed

)F

oo

tprin

t(n

orm

aliz

ed

)

0 1 2 3 4 5 6 7

60 70 80 90 100 110 120 0

20

40

60

80

100

S1 r

ead

s (×

10

3)

RPL41A base

GACUAUGUUAA

AA

UACGCGA

AG

AAGUGGCU

CAUU

UC A

A G C C A U UA

AG

UA U A C C

CA

AC

U U A A C U A A U A AU C

GAUA

ACGAGA

AUAAUAUC

AAGAAU

ACCU

UA

GAACAAC

AU

CG

AC

AA C A A C

AA

CA G GC

A U U U UC G

G A U A UG A G

U CA

C G UG G A

G

0

0

600

1,200 0

100 200 300

420 440 460 480 500

0

25

50

950 970 990 1010 1030

0

500

1,000

V1

read

sS

1

read

sP

AR

Ssco

re

V1

read

sS

1

read

sP

AR

Ssco

re

Base Base

0

1 3 5 7 92 4 6 8

1 3 5 7 92 4 6 8

Unpaired Paired

R = 0.51

R = 0.66

R = 0.55

R = 0.69

Figure 2 | PARS correctly recapitulates results of RNA footprinting andknown structures. a, The PARS signal obtained for bases 50–110 of the yeastgene CCW12 using the double-stranded cutter RNase V1 (red bars) or thesingle-stranded cutter S1 nuclease (green bars) accurately matches thesignals obtained by traditional footprinting of that same transcript domain(black lines). The PARS signal is shown as the number of sequence reads thatmapped to each nucleotide; footprinting results are obtained by semi-automated quantification of the RNase lanes shown in b. The red arrowsindicate RNase V1 cleavages, the green arrows indicate S1 nuclease cleavagesas shown in the gel (b). b, Gel analysis of RNase V1 (lanes 5, 6) and S1nuclease (lanes 3, 4) probing of CCW12. Additionally, RNase T1 ladder

(lanes 2, 8), alkaline hydrolysis (lanes 1, 9) and no RNase treatment (lane 7)are shown. c, The PARS signal obtained for bases 50–120 of the yeast geneRPL41A matches the signals obtained by traditional footprinting. d, RNaseV1 (lanes 5, 6) and S1 nuclease (lanes 7, 8) probing of RPL41A, RNase T1ladder (lane 2), alkaline hydrolysis (lanes 1, 9) and no RNase treatment (lane4). e, f, Raw number of reads obtained using RNase V1 (red bars) or S1nuclease (green bars) and the resulting PARS score (blue bars) along oneinspected domain of ASH1 (e) and URE2 (f). Also shown are the knownstructures of the inspected domains with nucleotides colour-codedaccording to their computed PARS score. The Pearson correlations (R)between PARS and traditional footprinting are indicated.

NATURE | Vol 467 | 2 September 2010 LETTERS

105Macmillan Publishers Limited. All rights reserved©2010

Page 4: Genome-wide measurement of RNA secondary structure in yeast

CDS 5′ UTR 3′ UTR CDS

b

0.17

a

c

0.37

Start Stop

–0.008

P < 10–10

–1.0

–0.5

0

0.5

1.0

1.5

–60 –40 –20 0 20 40 60 80 100 120

Avera

ge P

AR

S s

co

re

Position

–120 –100 –80 –60 –40 –20 0 20 40 60–1.0

–0.5

0

0.5

1.0

1.5

Avera

ge P

AR

S s

co

re

Position

0.25

0.35

0.45

0.55

0.65

0.75

–6 –4 –2 0 2 4 6

Avera

ge p

red

ictio

n s

co

re

PARS score

PARSRandom

0

100

200

300

400

2 3 4 5 6 7 8 9 10

Am

plit

ud

e

Period (number of nucleotides)

CDS5′ UTR3′ UTR

0 0.2 0.4 0.6 0.8

1

1 2 3

Avera

ge

PA

RS

sco

re

Codon position

**

Figure 3 | Functional units of the transcript are demarcated by distinctproperties of RNA structure. a, Significant correspondence between PARS andcomputational predictions of RNA structure. We used the Vienna package12 tofold the 3,000 yeast mRNAs used in our analysis, and extracted the predicteddouble-stranded probability of each nucleotide. The average predicted double-stranded probability of each nucleotide (y axis) is shown, where nucleotideswere sorted by their PARS score (x axis). Average and standard deviation from1,000 shuffle experiments in which a random prediction score was assigned to

each probed base are shown in grey. b, Discrete Fourier transform of averagePARS score across the coding region, 39 UTR and 59 UTR. Inset shows PARSscore obtained for each of the three positions of every codon, averaged across allcodons. c, PARS score across the 59 UTR, the coding region and the 39 UTR,averaged across all transcripts used in our analysis. Transcripts were aligned bytheir translational start and stop sites for the left and right panel, respectively;start and stop codons are indicated by grey bars; horizontal bars denote theaverage PARS score per region.

–0.5

–0.4

–0.3

–0.2

–0.1

0.0

0.1

0.2

–60 –40 –20 0 20 40 60 80 100 120

Rela

tive P

AR

S s

co

re

Position (window centre)

Random setsNon secretory

Secretory

a c

P < 4 × 10–4

P < 10–11

n = 2,501 n = 499

n = 254

n = 251

n = 311

n = 315

b 1

2

3

4

10–5

10–4

10–3

10–2

10–1

100

–40 –20 0 20 40 60 80 100

Anti-c

orr

ela

tio

n P

valu

e

Position (bp)

–1

0

1

–40 –20 0 20 40

–40 –20 0 20 40

–40 –20 0 20 40

–40 –20 0 20 40

–1

0

1

–1

0

1

–1

0

1

0

0.2

0.4

0.6

0.8

1.0

0 1 2 3 4 5

Cum

ula

tive

Ribosome density

Rela

tive

PA

RS

sco

re

–5

+5

Rela

tive P

AR

S s

co

re

Figure 4 | Structure around start codons correlates with low translationalefficiency. a, Sliding window analysis of local PARS score and ribosomedensity16. The significance (P value) of the anti-correlation between averagePARS score along a 40-bp-wide window and the reported ribosome density isshown. b, Left, k-means clustering of PARS scores across the 640-bpwindow surrounding the translation start site of all transcripts for whichenough coverage was obtained. The average structural profile and number ofmember genes is shown to the right of each cluster. Right, Cumulativedistribution plot of ribosome occupancy for each cluster and the associatedKolmogorov–Smirnoff test P value between the distribution of clusters 1 and

4. c, Tendency for less RNA structure in the first 30 bases of open readingframes encoding predicted secretory proteins. Although structure typicallybuilds up immediately upon entry to the coding sequence, genes predicted tocode for secretory proteins retain low structure in approximately the first 30bases of the coding sequence, consistent with a dual function of SSCRs inboth protein coding and targeting of the mRNA20. The figure shows averagerelative PARS scores (Methods) across a 30-bp sliding window for the 499genes coding for secretory proteins (blue), the remaining 2,501 genes (green)and the mean and standard deviation obtained from 1,000 shuffleexperiments in which sets of 499 genes were randomly selected (grey).

LETTERS NATURE | Vol 467 | 2 September 2010

106Macmillan Publishers Limited. All rights reserved©2010

Page 5: Genome-wide measurement of RNA secondary structure in yeast

peptide transcripts have a lower PARS signal, indicating increasedsingle-stranded propensity compared with other transcripts(P , 10211; Fig. 4c). Because RNA sequences encoding the signalsequence typically reside in the beginning of the coding region, theseresults suggest that specific secondary RNA structure around genestarts may assist in the cytotopic localization of mRNAs and theirresulting proteins. More generally, we suggest that PARS can be usedboth to generate and test hypotheses of signals of secondary structurethat may characterize and have functional importance for classes ofmRNAs.

In summary, we introduced PARS, the first high-throughputapproach for genome-wide experimental measurement of RNA struc-tural properties, and showed that it recovers structural profiles withhigh accuracy and at nucleotide resolution. Like most existing methods,one limitation of PARS is that it maps RNA structures in vitro, and itsreported structures may thus differ significantly from the in vivo con-formations. This may be addressed in the future by using reagents thatcan probe RNA structure in living cells7, but it will require new methodsto adapt to deep sequencing. Overall, PARS transforms the field of RNAstructure probing into the realm of high-throughput, genome-wideanalysis and should prove useful both in determining the structure ofentire transcriptomes of other organisms and in systematically mea-suring the effects of diverse conditions on RNA structure. Probing RNAstructure in the presence of different ligands, proteins or in differentphysical or chemical conditions may provide further insights into howRNA structures control gene activity.

METHODS SUMMARYSample preparation. Total RNA was extracted from yeast grown at 30 uC to

exponential phase in yeast peptone dextrose (YPD) medium by using hot acid

phenol. Poly(A)1 RNA was obtained by purifying it twice using the Poly(A)

Purist Kit. Supplementary Fig. 2 shows the PARS protocol.

Sequencing library construction. RNA was folded and probed for structure

using 0.01 U RNase V1 (Ambion), or 1,000 U of S1 nuclease (Fermentas), in a

100-ml reaction volume. A modified version (see Supplementary Methods) of the

SOLiD Small RNA Expression Kit was used to convert fragments into a sequencing

library.

SOLiD sequencing and mapping. cDNA libraries were amplified onto beads

and subjected to emulsion PCR, according to the standard protocol described in

the SOLiD Library Preparation Guide. Obtained sequences were truncated to

35 bp, and required to map uniquely to either the yeast genome or transcrip-

tome, allowing up to one mismatch and no insertions or deletions.

Computing the PARS score. The PARS score is defined as the log2 of the ratio

between the number of times the nucleotide immediately downstream of theinspected nucleotide was observed as the first base when treated with RNase V1

and the number of times it was observed in the S1 nuclease treated sample. To

account for differences in overall sequencing depth between the V1- and S1-

treated samples, the number of reads for each nucleotide is normalized before the

computation of the ratio.

Periodicity. Periodicity was analysed by applying a discrete Fourier transform to

the average PARS score collected from the following genomic features: the last

100 bases of the 59 UTR, the first 200 bases of the coding sequence and the 100

first bases of the 39 UTR.

Online resources. Nucleotide-resolution raw reads and PARS scores for the

3,000 genes included in our analysis can be visualized and downloaded at

http://genie.weizmann.ac.il/pubs/PARS10 or using the PARS iPhone App.

Received 15 March; accepted 28 June 2010.

1. Arava, Y. et al. Genome-wide analysis of mRNA translation profiles inSaccharomyces cerevisiae. Proc Natl Acad Sci USA 100, 3889–3894 (2003).

2. Wang, Y. et al. Precision and functional specificity in mRNA decay. Proc. Natl Acad.Sci. USA 99, 5860–5865 (2002).

3. Takizawa, P. A., DeRisi, J. L., Wilhelm, J. E. & Vale, R. D. Plasma membranecompartmentalization in yeast by messenger RNA transport and a septindiffusion barrier. Science 290, 341–344 (2000).

4. Shepard, K. A. et al. Widespread cytoplasmic mRNA transport in yeast:identification of 22 bud-localized transcripts using DNA microarray analysis. Proc.Natl Acad. Sci. USA 100, 11429–11434 (2003).

5. Tucker, B. J. & Breaker, R. R. Riboswitches as versatile gene control elements. Curr.Opin. Struct. Biol. 15, 342–348 (2005).

6. Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of siteaccessibility in microRNA target recognition. Nature Genet. 39, 1278–1284(2007).

7. Ziehler, W. A. & Engelke, D. R. in Current Protocols in Nucleic Acid Chemistry Ch. 6,Unit 6.1 (John Wiley, 2001).

8. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains inhuman HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).

9. Guo, F., Gooding, A. R. & Cech, T. R. Structure of the Tetrahymena ribozyme: basetriple sandwich and metal ion at the active site. Mol. Cell 16, 351–362 (2004).

10. Chartrand, P., Meng, X. H., Huttelmaier, S., Donato, D. & Singer, R. H. Asymmetricsorting of ash1p in yeast results from inhibition of translation by localizationelements in the mRNA. Mol. Cell 10, 1319–1330 (2002).

11. Reineke, L. C., Komar, A. A., Caprara, M. G. & Merrick, W. C. A small stem loopelement directs internal initiation of the URE2 internal ribosome entry site inSaccharomyces cerevisiae. J. Biol. Chem. 283, 19011–19025 (2008).

12. Hofacker, I. L., Fekete, M. & Stadler, P. F. Secondary structure prediction foraligned RNA sequences. J. Mol. Biol. 319, 1059–1066 (2002).

13. Watts, J. M. et al. Architecture and secondary structure of an entire HIV-1 RNAgenome. Nature 460, 711–716 (2009).

14. Mathews, D. H. et al. Incorporating chemical modification constraints into adynamic programming algorithm for prediction of RNA secondary structure. Proc.Natl Acad. Sci. USA 101, 7287–7292 (2004).

15. Shabalina, S. A., Ogurtsov, A. Y. & Spiridonov, N. A. A periodic pattern of mRNAsecondary structure created by the genetic code. Nucleic Acids Res. 34,2428–2437 (2006).

16. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosomeprofiling. Science 324, 218–223 (2009).

17. Kozak, M. Regulation of translation via mRNA structure in prokaryotes andeukaryotes. Gene 361, 13–37 (2005).

18. Kudla, G., Murray, A. W., Tollervey, D. & Plotkin, J. B. Coding-sequencedeterminants of gene expression in Escherichia coli. Science 324, 255–258 (2009).

19. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The GeneOntology Consortium. Nature Genet. 25, 25–29 (2000).

20. Palazzo, A. F. et al. The signal sequence coding region promotes nuclear export ofmRNA. PLoS Biol. 5, e322 (2007).

Supplementary Information is linked to the online version of the paper atwww.nature.com/nature.

Acknowledgements We thank D. Herschlag’s group, A. Adler, A. Fire, M. Kay, theLife Technologies SOLiD team, M. Rabani, G. Sherlock and A. Weinberger forassistance and critiques. This work was supported by a National Institutes ofHealth grant (RO1HG004361). Y.W. is funded by the Agency of Science,Technology and Research of Singapore. H.Y.C. is an Early Career Scientist of theHoward Hughes Medical Institute. E.S. is the incumbent of the Soretta and HenryShapiro career development chair.

Author Contributions M.K., J.L.R., H.Y.C. and E.S. conceived the project; Y.W. andH.Y.C. developed the protocol and designed the experiments; Y.W. performed allexperiments; M.K., E.M. and E.S. planned and conducted the data analysis; J.L.R.and R.C.N. helped with sequencing; M.K., Y.W., E.M., H.Y.C. and E.S. wrote thepaper with contributions from all authors.

Author Information Sequencing data are deposited in the Gene ExpressionOmnibus under accession number GSE22393. Reprints and permissionsinformation is available at www.nature.com/reprints. The authors declare nocompeting financial interests. Readers are welcome to comment on the onlineversion of this article at www.nature.com/nature. Correspondence and requestsfor materials should be addressed to H.Y.C. ([email protected]) or E.S.([email protected]).

NATURE | Vol 467 | 2 September 2010 LETTERS

107Macmillan Publishers Limited. All rights reserved©2010