the effect of base modification on rna polymerase #707 and

1
Jennifer L. Ong, Vladimir Potapov, Xiaoqing Fu, Nan Dai, Ivan R. Corrêa Jr., Nathan Tanner New England Biolabs, Ipswich, MA 01938, USA The Effect of Base Modification on RNA Polymerase and Reverse Transcriptase Fidelity Introduction Introduction Workflow Measuring combined transcription and reverse transcription fidelity with PacBio ® sequencing. (A) Workflow. DNA templates are transcribed by T7 RNA polymerase with standard and modified NTPs to produce RNA. RNA is replicated by a reverse transcriptase to produce cDNA, then the first strand is replicated by the same reverse transcriptase to produce double- stranded DNA, which is then prepared for PacBio sequencing by ligating SMRTbell™ adaptors. (B) Identical first strand errors can arise by misincorporation from either the RNA polymerase or the reverse transcriptase (error type 1 and 2 in the figure, respectively). Only first strand errors confirmed in the second strand are counted. Second strand errors produce a mismatch between the first and second strand and represent misincorporation by the reverse transcriptase on DNA templates (error type 3 in the figure). (C) Substitution errors arising from misincorporation. www.NEBNext.com Methods/Results Combined T7 RNA Polymerase and RT cDNA Errors First strand (cDNA) synthesis error rates and error spectrum for standard and modified RNA. The RNA template is synthesized by T7 RNA polymerase, and then reverse transcribed by the reverse transcriptases shown in the workflow. For comparison, also shown are the first strand error rates of Bst 2.0 and 3.0 DNA polymerases, DNA polymerases which can be used to reverse transcribe RNA. Polymerase substitution errors are written as the equivalent RNA polymerase substitution (top substitution) or reverse transcriptase substitution (bottom substitution). Base Modifications Can Alter Polymerase Fidelity First strand error rates of modified RNA normalized to regular RNA (Protoscript ® II reverse transcriptase). Relative substitution rates of each error type for each modification were normalized to unmodified RNA, for ProtoScript II reverse transcriptase (with T7 RNA polymerase). On the y-axis, E was calculated for each substitution type as (S - M) / S, where S is the substitution rate on unmodified RNA, and M is the substitution rate on RNA containing modified bases. An E value of 0 represents no change in fidelity compared to unmodified RNA, whereas the numerical values represent the fold-change relative to unmodified RNA. For each non-reference error identified during cDNA synthesis, the equivalent RNA polymerase error (top pair) and reverse transcriptase error (bottom pair) that could generate the corresponding first strand error are identified. Sequence Context Analysis of Pseudouridine-induced Transcription Errors Sequence context analysis of first strand errors. Sequence logos represent the identity of the bases surrounding each type of misincorporation, with respect to the reference RNA, for pseudouridine- containing and unmodified RNA. In each logo, bases are ordered most frequently (top) to least frequently (bottom) observed. In this example, T7 RNA polymerase was used to generate the RNA template, and ProtoScript II reverse transcriptase was used for reverse transcription. Reverse Transcriptase Fidelity Second strand error rates, representing the error rate for reverse transcriptases or Bst DNA polymerases replicating DNA templates. (A) Total error rates and distribution of substitution, deletion and insertion errors. (B) Normalized mutational spectrum of second strand error rates. Polymerase substitutions are written as (expected base) → (observed base). Summary Ribonucleic acid (RNA) is capable of hosting a variety of chemically diverse modifications. Post-transcriptional mRNA modifications can alter gene expression or mRNA stability, and can be conserved, regulated, and implicated in various cellular, developmental and disease processes. However, few studies have addressed how base modifications affect RNA polymerase and reverse transcriptase activity and fidelity, and hence, RNA sequencing data. Here, we describe the fidelity of RNA polymerization and reverse transcription of modified ribonucleotides using a fidelity assay based on Pacific Biosciences ® ' Single-Molecule Real-Time (SMRT ® ) sequencing. Several modified bases, including methylated (m 6 A, m 5 C and m 5 U), hydroxymethylated (hm 5 U) and isomeric bases (pseudouridine (Ψ)) were examined. Table 1. Total error rates for cDNA strand synthesis of canonical and noncanonical RNA Template Total error rate Percentage of total errors errors/base ×10 -6 Substitution % Deletion % Insertion % M-MuLV Reverse Transcriptase and T7 RNA Polymerase RNA 63 ± 12 78 11 11 m 6 A 149 ± 21 86 9 5 Ψ 114 ± 23 89 6 6 m 5 C 81 ± 18 86 9 5 m 5 U 65 ± 12 87 9 4 hm 5 U 185 ± 23 90 6 4 ProtoScript II Reverse Transcriptase and T7 RNA Polymerase RNA 56 ± 8 71 19 10 m 6 A 152 ± 8 80 11 8 Ψ 101 ± 21 90 7 3 m 5 C 70 ± 4 82 12 6 m 5 U 54 ± 2 81 15 4 hm 5 U 188 ± 24 87 9 5 AMV Reverse Transcriptase and T7 RNA Polymerase RNA 75 ± 11 87 5 8 m 6 A 164 ± 11 89 5 6 Ψ 116 ± 22 94 4 3 m 5 C 81 ± 2 92 3 5 m 5 U 73 ± 5 91 5 3 hm 5 U 192 ± 8 91 5 4 Bst 2.0 Reverse Transcriptase and T7 RNA Polymerase RNA 179 ± 105 78 16 6 Bst 3.0 Reverse Transcriptase and T7 RNA Polymerase RNA 181 ± 102 82 15 4 We developed an assay to measure the fidelity of cDNA synthesis on unmodified and modified RNA. cDNA (first strand) error rates are the combined error rates of T7 RNA polymerase and the reverse transcriptase. However, by normalizing base-specific error rates of the modified base to the equivalent standard RNA base, we were able to determine which modifications had an effect on either RNA polymerase or reverse transcriptase fidelity. 5-hydroxylmethyluracil and N6-methyladenosine both increased first strand error rates compared to the equivalent unmodified RNA base, whereas 5-methylcytosine and 5-methyluracil did not significantly affect first strand error rates. Pseudouridine (an isomer of uracil) was misincorporated across thymidine by T7 RNA polymerase at a greater frequency than uracil, and sequence context analysis revealed that a dT:dΨ mispair was also preferentially preceded by dG:rCTP incorporation. Reverse transcriptase-specific error rates were identified by an analysis of second strand errors, in which errors only arising during DNA templated DNA synthesis were counted. 5...CTACTATACATGACTAACAGTCGT...3Template dsDNA 3...GATGATATGTACTGATTGTCAGCA...5AAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA 5...CUGCUAUACAUGACUAACAGUCGU...3RNA AAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA 3’...GACGACATGTACTGATTGTCAGCA...5First strand DNA AAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA 5’...CTGCTGTACATGACTAATAGTCGT...3Second strand DNA Template dsDNA RNA cDNA (first strand) Circular amplicon T7 RNA Polymerase RNA synthesis RT first strand DNA synthesis RT second strand DNA synthesis PacBio library preparation PacBio SMRT sequencing Bioinformatics analysis dsDNA product B A 1 2 3 First strand error T7 RNA Polymerase error 1 RT error 2 Second strand error RT error 3 Correct incorporation Misincorporation Box indicates sequenced product C rA→rG dT→dC dC→dT 1 2 3 Substitution errors (exp.) (obs.) (exp.) (obs.) (exp.) (obs.) ATG _|_ UAC ATG _|_ UGC UAU _|_ ATA UAU _|_ ACA TGT _|_ ACA TGT _|_ ATA -2 2 6 10 14 18 22 NH O O N R HO hm 5 U HN NH O O R -2 0 2 4 6 8 10 12 14 Ψ -2 0 2 4 6 8 10 N N N N HN R m 6 A -2 0 2 4 6 8 10 NH O O N R m 5 U -2 0 2 4 6 8 10 N NH 2 O N R m 5 C Substitution type Relative fold change rArC dTdG rArU dTdA rArG dTdC rUrA dAdT rUrC dAdG rUrG dAdC rCrA dGdT rCrU dGdA rCrG dGdC rGrA dCdT rGrC dCdG rGrU dCdA 0 20 40 60 80 100 120 140 160 180 standard m6A Ψ m5C m5U hm5U standard m6A Ψ m5C m5U hm5U standard m6A Ψ m5C m5U hm5U standard standard M-MuLV ProtoScript II AMV Bst 2.0 Bst 3.0 (substitution/base ×10 -6 ) (RNAP) (RT) rArC dTdG rArU dTdA rArG dTdC rUrA dAdT rUrC dAdG rUrG dAdC rCrA dGdT rCrU dGdA rCrG dGdC rGrA dCdT rGrC dCdG rGrU dCdA DNA Polymerase Total error rate (errors/base ×10 -6 ) Substitution Deletion Insertion M-MuLV RT 84 ± 19 92% 6% 3% ProtoScript II RT 62 ± 9 91% 6% 3% AMV RT 52 ± 4 93% 5% 2% Bst 2.0 62 ± 5 92% 7% 1% Bst 3.0 70 ± 23 89% 8% 3% Second strand error rates A #707 A G U C U G C A U U A G C U G C A U (dN $1 ) $ dT $ (dN +1 ) (dN $1 ) $ dT $ (dN +1 ) unmodified RNA Ψ RNA

Upload: others

Post on 16-Mar-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Jennifer L. Ong, Vladimir Potapov, Xiaoqing Fu, Nan Dai, Ivan R. Corrêa Jr., Nathan TannerNew England Biolabs, Ipswich, MA 01938, USA

The Effect of Base Modification on RNA Polymerase and Reverse Transcriptase Fidelity

Introduction

Introduction

Workflow

Measuring combined transcription and reverse transcription fidelity with PacBio®sequencing. (A) Workflow. DNA templates are transcribed by T7 RNA polymerase with standardand modified NTPs to produce RNA. RNA is replicated by a reverse transcriptase to producecDNA, then the first strand is replicated by the same reverse transcriptase to produce double-stranded DNA, which is then prepared for PacBio sequencing by ligating SMRTbell™ adaptors. (B)Identical first strand errors can arise by misincorporation from either the RNA polymerase or thereverse transcriptase (error type 1 and 2 in the figure, respectively). Only first strand errorsconfirmed in the second strand are counted. Second strand errors produce a mismatch betweenthe first and second strand and represent misincorporation by the reverse transcriptase on DNAtemplates (error type 3 in the figure). (C) Substitution errors arising from misincorporation.

www.NEBNext.com

Methods/ResultsCombined T7 RNA Polymerase and RT cDNA Errors

First strand (cDNA) synthesis error rates and error spectrum for standard andmodified RNA. The RNA template is synthesized by T7 RNA polymerase, and thenreverse transcribed by the reverse transcriptases shown in the workflow. For comparison,also shown are the first strand error rates of Bst 2.0 and 3.0 DNA polymerases, DNApolymerases which can be used to reverse transcribe RNA. Polymerase substitutionerrors are written as the equivalent RNA polymerase substitution (top substitution) orreverse transcriptase substitution (bottom substitution).

Base Modifications Can Alter Polymerase Fidelity

First strand error rates of modified RNA normalized to regular RNA (Protoscript® II reverse transcriptase). Relative substitution rates of each error typefor each modification were normalized to unmodified RNA, for ProtoScript II reverse transcriptase (with T7 RNA polymerase). On the y-axis, E was calculated foreach substitution type as (S - M) / S, where S is the substitution rate on unmodified RNA, and M is the substitution rate on RNA containing modified bases. An Evalue of 0 represents no change in fidelity compared to unmodified RNA, whereas the numerical values represent the fold-change relative to unmodified RNA.For each non-reference error identified during cDNA synthesis, the equivalent RNA polymerase error (top pair) and reverse transcriptase error (bottom pair) thatcould generate the corresponding first strand error are identified.

Sequence Context Analysis of Pseudouridine-induced Transcription Errors

Sequence context analysis of first strand errors. Sequence logosrepresent the identity of the bases surrounding each type ofmisincorporation, with respect to the reference RNA, for pseudouridine-containing and unmodified RNA. In each logo, bases are ordered mostfrequently (top) to least frequently (bottom) observed. In this example,T7 RNA polymerase was used to generate the RNA template, andProtoScript II reverse transcriptase was used for reverse transcription.

Reverse Transcriptase Fidelity

Second strand error rates, representing the error rate for reverse transcriptases orBst DNA polymerases replicating DNA templates. (A) Total error rates and distributionof substitution, deletion and insertion errors. (B) Normalized mutational spectrum ofsecond strand error rates. Polymerase substitutions are written as (expected base) →(observed base).

Summary

Ribonucleic acid (RNA) is capable of hosting a variety of chemically diverse modifications. Post-transcriptional mRNA modifications can alter gene expression or mRNA stability, and can be conserved, regulated, and implicated in various cellular, developmental and disease processes. However, few studies have addressed how base modifications affect RNA polymerase and reverse transcriptase activity and fidelity, and hence, RNA sequencing data. Here, we describe the fidelity of RNA polymerization and reverse transcription of modified ribonucleotides using a fidelity assay based on Pacific Biosciences®' Single-Molecule Real-Time (SMRT®) sequencing. Several modified bases, including methylated (m6A, m5C and m5U), hydroxymethylated (hm5U) and isomeric bases (pseudouridine (Ψ)) were examined.

Table 1. Total error rates for cDNA strand synthesis of canonical and noncanonical RNA

Template Total error rate Percentage of total errors

errors/base ×10-6

Substitution %

Deletion %

Insertion %

M-MuLV Reverse Transcriptase and T7 RNA Polymerase RNA 63 ± 12 78 11 11 m6A 149 ± 21 86 9 5 Ψ 114 ± 23 89 6 6

m5C 81 ± 18 86 9 5 m5U 65 ± 12 87 9 4

hm5U 185 ± 23 90 6 4 ProtoScript II Reverse Transcriptase and T7 RNA Polymerase

RNA 56 ± 8 71 19 10 m6A 152 ± 8 80 11 8 Ψ 101 ± 21 90 7 3

m5C 70 ± 4 82 12 6 m5U 54 ± 2 81 15 4

hm5U 188 ± 24 87 9 5 AMV Reverse Transcriptase and T7 RNA Polymerase

RNA 75 ± 11 87 5 8 m6A 164 ± 11 89 5 6 Ψ 116 ± 22 94 4 3

m5C 81 ± 2 92 3 5 m5U 73 ± 5 91 5 3

hm5U 192 ± 8 91 5 4

Bst 2.0 Reverse Transcriptase and T7 RNA Polymerase RNA 179 ± 105 78 16 6

Bst 3.0 Reverse Transcriptase and T7 RNA Polymerase RNA 181 ± 102 82 15 4

We developed an assay to measure the fidelity of cDNA synthesis on unmodified and modified RNA. cDNA (first strand) error rates are the combined error rates of T7 RNA polymerase and the reverse transcriptase. However, by normalizing base-specific error rates of the modified base to the equivalent standard RNA base, we were able to determine which modifications had an effect on either RNA polymerase or reverse transcriptase fidelity. 5-hydroxylmethyluracil and N6-methyladenosine both increased first strand error rates compared to the equivalent unmodified RNA base, whereas 5-methylcytosine and 5-methyluracil did not significantly affect first strand error rates. Pseudouridine (an isomer of uracil) was misincorporated across thymidine by T7 RNA polymerase at a greater frequency than uracil, and sequence context analysis revealed that a dT:dΨ mispair was also preferentially preceded by dG:rCTP incorporation. Reverse transcriptase-specific error rates were identified by an analysis of second strand errors, in which errors only arising during DNA templated DNA synthesis were counted.

→→

5′...CTACTATACATGACTAACAGTCGT...3′ Template dsDNA3′...GATGATATGTACTGATTGTCAGCA...5′AAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA5′...CUGCUAUACAUGACUAACAGUCGU...3′ RNAAAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA3’...GACGACATGTACTGATTGTCAGCA...5′ First strand DNAAAAAAAA|AA|AAAAAAAAAAA|AAAAAAAAAAA5’...CTGCTGTACATGACTAATAGTCGT...3′ Second strand DNA

TemplatedsDNA

RNA

cDNA(first strand)

Circularamplicon

T7 RNA PolymeraseRNA synthesis

RT first strandDNA synthesis

RT second strandDNA synthesis

PacBio librarypreparation

PacBio SMRTsequencing

Bioinformatics analysis

dsDNAproduct

BA1

2

3

First strand errorT7 RNA Polymerase error1

RT error2

Second strand errorRT error3

Correct incorporation Misincorporation Box indicates sequenced product

CrA→rG dT→dC dC→dT1 2 3

Substitution errors

(exp.) (obs.) (exp.) (obs.) (exp.) (obs.)

ATG_|_UAC

ATG_|_UGC

UAU_|_ATA

UAU_|_ACA

TGT_|_ACA

TGT_|_ATA

-226

10141822

NH

O

ON

R

HOhm5U HN NH

O

O

R

-202468

101214

Ψ

-202468

10

N

NN

NHN

R

m6A

-202468

10

NH

O

ON

R

m5U

-202468

10

N

NH2

ON

R

m5C

Substitution type

Rel

ativ

e fo

ld c

hang

e

rA→rCdT→dG

rA→rUdT→dA

rA→rGdT→dC

rU→rAdA→dT

rU→rCdA→dG

rU→rGdA→dC

rC→rAdG→dT

rC→rUdG→dA

rC→rGdG→dC

rG→rAdC→dT

rG→rCdC→dG

rG→rUdC→dA

0 20 40 60 80 100 120 140 160 180

standard

m6A

Ψ

m5C

m5U

hm5U

standard

m6A

Ψ

m5C

m5U

hm5U

standard

m6A

Ψ

m5C

m5U

hm5U

standard

standard

M-M

uLV

Pro

toS

crip

tII

AM

V

Bst 2.0

Bst 3.0

(substitution/base ×10-6)

(RNAP)(RT)

rA→rCdT→dG

rA→rUdT→dA

rA→rGdT→dC

rU→rAdA→dT

rU→rCdA→dG

rU→rGdA→dC

rC→rAdG→dT

rC→rUdG→dA

rC→rGdG→dC

rG→rAdC→dT

rG→rCdC→dG

rG→rUdC→dA

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

M-MuLV RT

ProtoScript II RT

AMV RT

Bst 2.0

Bst 3.0

DNA Polymerase Total error rate (errors/base ×10-6)

Substitution Deletion Insertion

M-MuLV RT 84 ± 19 92% 6% 3% ProtoScript II RT 62 ± 9 91% 6% 3% AMV RT 52 ± 4 93% 5% 2% Bst 2.0 62 ± 5 92% 7% 1% Bst 3.0 70 ± 23 89% 8% 3%

Second strand error rates

Mutational spectrum

A

B

dA→dC dA→dT dA→dG dT→dA dT→dC dT→dGdT→dA dT→dT dT→dG dT→dA dT→dC dT→dT

#707

AGUC

dT:rT

UGCAU

U

AG

CrA:dA

UGCAU

(dN$1)'$ dT $ (dN+1)'

RNA 5’...(rN-1)-(rΨ)-(rN+1)...3′AAAAAAAAAAAAAAAAAAADNA 3′...(dN-1)-(dT)-(dN+1)...5′

(dN$1)'$ dT $ (dN+1)

A

B unmodified'RNAΨ RNA

mismatch