

LETTER

When DNA gets in the way: A cautionary note for DNA contamination in extracellular RNA-seq studies

Jasper Verwilt^a,b,1, Wim Trypsteen^a,b, Ruben Van Paemel^a,b, Katleen De Preter^a,b, Maria D. Giraldez^c,d, Pieter Mestdagh^a,b, and Jo Vandesompele^a,b

With great interest, we read the paper by Zhou et al. (1) describing a methodology that enables extracellular RNA sequencing (exRNA-seq) from extremely low input (Small Input Liquid Volume Extracellular RNA Sequencing [SILVER-seq]). We were intrigued by the high number of detected genes compared to our previous studies (2, 3) and noticed low reproducibility. We hypothesized that these observations could originate from substantial DNA contamination. Therefore, we reanalyzed the SILVER-seq data (4) to determine the extent of DNA signal in the sequencing reads.

First, we analyzed the fraction of reads mapping to the different genomic regions. We noticed that these fractions closely resembled the distributions in the genome (Fig. 1A). Specifically, fewer than 5% of the reads mapped to exonic regions, while our own exRNA-seq data (3) showed an average of 35% exonic reads. Secondly, we analyzed reads mapping to spliced sequences, expecting them to be relatively abundant in RNA. However, we found that reads mapping to spliced sequences made up only 0.22% of the total uniquely mapped reads, whereas, in our own RNA-seq data, they represented 17.8%, about 81-fold higher (Fig. 1B). Thirdly, we generated copy number profiles for a female patient with breast cancer (SRR9094442) and a healthy male control (SRR9094547). The cancer patient’s profile showed a pattern with clear copy number changes (e.g., chromosomes 5, 11, and 20), a result typically found using cell-free DNA data (Fig. 2A). The male control displayed an almost flat copy number profile, with chromosomes X and Y showing half the copy number levels of the autosomes (Fig. 2B), in line with the expectations for a normal control’s cell-free DNA. Finally, strandedness assessment of the SILVER-seq reads could not unambiguously confirm that the data come from RNA (Fig. 1C). This means either that the library preparation method does not preserve strand orientation of the fragments (which is not specified in the paper) or that the data predominantly come from DNA. In an attempt to use only reads that must originate from RNA, we looked at exRNA genes with reads mapping over splice junctions and with transcripts per million higher than 5, as recommended by the authors (1). A median of only 560 genes per sample remained after filtering, 44 times fewer than reported.
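The final filtering step can be sketched as follows. This is a minimal illustration in Python, not the authors' code (which is in the GitHub repository named in the Data Availability Statement). The record fields `tpm` and `junction_reads`, the gene names, and the requirement of at least one junction-spanning read are assumptions made for the example; only the transcripts per million threshold of 5 comes from the text.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class GeneRecord:
    """Per-gene summary for one sample (hypothetical field names)."""
    gene: str
    tpm: float            # transcripts per million
    junction_reads: int   # reads mapping over a splice junction


def rna_supported_genes(records, min_tpm=5.0, min_junction_reads=1):
    """Keep genes with TPM above min_tpm and splice-junction support."""
    return [
        r.gene
        for r in records
        if r.tpm > min_tpm and r.junction_reads >= min_junction_reads
    ]


def median_genes_per_sample(samples):
    """Median number of retained genes across samples."""
    return median(len(rna_supported_genes(recs)) for recs in samples.values())


# Toy input, invented for illustration only.
samples = {
    "sample_1": [
        GeneRecord("GENE_A", tpm=12.0, junction_reads=4),   # kept
        GeneRecord("GENE_B", tpm=30.0, junction_reads=0),   # no junction reads
        GeneRecord("GENE_C", tpm=2.5, junction_reads=7),    # TPM too low
    ],
    "sample_2": [
        GeneRecord("GENE_A", tpm=8.0, junction_reads=1),    # kept
        GeneRecord("GENE_D", tpm=5.0, junction_reads=3),    # TPM not above 5
    ],
}

print(rna_supported_genes(samples["sample_1"]))
print(median_genes_per_sample(samples))
```

Requiring junction-spanning reads is what ties a gene call to RNA: a read that crosses an exon-exon junction cannot come from unspliced genomic DNA, whereas a read inside a single exon can.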

Our reanalyses present evidence that the majority of the SILVER-seq data are derived from DNA, rather than exRNA. Although the authors performed a DNase treatment aimed at preventing this issue (1), no quality control was performed to verify its efficacy. We hypothesize that the amount of cell-free DNA was too high or that inhibitors present in serum precluded efficient enzymatic DNA removal. Moreover, the authors did not perform any data analysis evaluating the presence of DNA signal in their sequencing data, such as the ones reported here. Importantly, we emphasize that our observations do not undermine the potential utility of SILVER-seq. Our letter aims to serve as a reminder of the current limitations of RNA-seq workflows on biofluids and as a plea for extensive quality control of RNA-seq data in general.

Data Availability Statement. The code used for data analysis is available on GitHub at https://github.com/jasperverwilt/SILVER-Seq_comment (5).

^a Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium; ^b OncoRNALab, Cancer Research Institute Ghent, 9000 Ghent, Belgium; ^c Digestive Diseases Unit, Virgen del Rocio University Hospital, 41013 Seville, Spain; and ^d OncoDigest Group, Institute of Biomedicine of Seville (IBiS), 41013 Seville, Spain

Author contributions: J. Verwilt, W.T., R.V.P., P.M., and J. Vandesompele designed research; J. Verwilt and R.V.P. performed research; J. Verwilt and R.V.P. analyzed data; and J. Verwilt, W.T., R.V.P., K.D.P., M.D.G., P.M., and J. Vandesompele wrote the paper.

The authors declare no competing interest.

Published under the PNAS license.

^1 To whom correspondence may be addressed. Email: [email protected].

18934–18936 | PNAS | August 11, 2020 | vol. 117 | no. 32 www.pnas.org/cgi/doi/10.1073/pnas.2001675117


[Fig. 1, panels A to C: bar charts of mapping ratio (y axis) comparing plasma RNA-seq data from (3) with SILVER-seq serum. (A) Exonic, intronic, and intergenic fractions; the average exonic fraction is 0.3541 for plasma and 0.0471 for SILVER-seq. (B) Nonsplice and splice fractions: 0.8220 and 0.1780 for plasma; 0.9978 and 0.0022 for SILVER-seq. (C) Failed, same-strand, and different-strand fractions.]

Fig. 1. Regional coverage, splice read fractions, and strandedness of the data. (A) Fractions of reads mapping to exonic, intronic, and intergenic regions. The average fractions and average number of reads are printed. The bottom and top dashed blue lines indicate the fraction of base pairs classified as exonic (0.0427) and intronic/intergenic (0.479) in the genome. These numbers represent the fraction of reads that would map to exonic and intronic/intergenic regions, respectively, if the reads originated from random locations in the genome. (B) Fraction of reads mapping to splice and nonsplice regions. The average fractions are printed. (C) Strandedness of the data. The “same strand” fraction is expected to be 1 for stranded data and 0.5 for unstranded data. The average fractions are printed.
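The comparisons summarized in this caption reduce to simple arithmetic. The sketch below, in Python, divides each average exonic fraction by the genome-wide exonic fraction from the caption (0.0427), recomputes the splice-read fold difference from the average fractions printed in the figure, and labels a "same strand" fraction as stranded or unstranded. The function names, the tolerance of 0.1, and the two example strandedness inputs are assumptions for illustration and are not part of the authors' analysis.

```python
GENOME_EXONIC_FRACTION = 0.0427  # fraction of base pairs classified as exonic (caption)


def enrichment_over_random(observed_fraction, genome_fraction):
    """Ratio of the observed read fraction to that expected for random genomic reads."""
    return observed_fraction / genome_fraction


def strandedness_call(same_strand_fraction, tolerance=0.1):
    """Label a 'same strand' fraction: near 1 is stranded, near 0.5 is unstranded.

    The tolerance is an arbitrary illustrative choice.
    """
    if abs(same_strand_fraction - 1.0) <= tolerance:
        return "stranded"
    if abs(same_strand_fraction - 0.5) <= tolerance:
        return "unstranded"
    return "ambiguous"


# Average exonic fractions printed in Fig. 1A.
plasma_exonic = 0.3541
silver_exonic = 0.0471

print(round(enrichment_over_random(plasma_exonic, GENOME_EXONIC_FRACTION), 2))
print(round(enrichment_over_random(silver_exonic, GENOME_EXONIC_FRACTION), 2))

# Average splice-read fractions printed in Fig. 1B (plasma vs. SILVER-seq).
splice_fold = 0.1780 / 0.0022
print(round(splice_fold))

print(strandedness_call(0.95))  # hypothetical stranded RNA library
print(strandedness_call(0.50))  # hypothetical unstranded library or DNA-derived reads
```

With these inputs, the plasma data show roughly eightfold exonic enrichment over the random expectation, while the SILVER-seq value is close to 1, which is what reads drawn uniformly from the genome would give.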


1 Z. Zhou et al., Extracellular RNA in a single droplet of human serum reflects physiologic and disease states. Proc. Natl. Acad. Sci. U.S.A. 116, 19200–19208 (2019).
2 E. Hulstaert et al., Charting extracellular transcriptomes in The Human Biofluid RNA Atlas. bioRxiv, 10.1101/823369 (5 November 2019).
3 C. Everaert et al., Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles. Sci. Rep. 9, 17574 (2019).
4 Z. Zhou, S. Zhong, Data from “Extracellular RNA in a single droplet of human serum reflects physiologic and disease states”. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131512. Accessed 8 January 2020.
5 J. Verwilt, R. Van Paemel, Code for “When DNA gets in the way: A cautionary note for DNA contamination in extracellular RNA-seq studies”. GitHub. https://github.com/jasperverwilt/SILVER-Seq_comment. Deposited 28 January 2020.

Fig. 2. Copy number profiles generated from SILVER-seq data. Yellow segments indicate a lower copy number, and green segments indicate a higher copy number. (A) Copy number profile of a female breast cancer patient. (B) Copy number profile of a healthy male.
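The expectation for the male control in panel B can be made concrete with a small sketch. For DNA-derived reads from a male sample, the single copies of chromosomes X and Y should give about half the autosomal level, that is, a log2 ratio near -1. The Python below computes per-chromosome ratios relative to the median autosomal read density. The input densities and the function name are invented for illustration, and this is not the copy number method used by the authors.

```python
import math
from statistics import median


def relative_copy_number(density_by_chrom, autosomes):
    """Divide each chromosome's read density by the median autosomal density."""
    baseline = median(density_by_chrom[c] for c in autosomes)
    return {c: d / baseline for c, d in density_by_chrom.items()}


# Hypothetical normalized read densities (reads per mappable megabase).
density = {"chr1": 100.0, "chr2": 102.0, "chr3": 98.0, "chrX": 50.0, "chrY": 49.0}
autosomes = ["chr1", "chr2", "chr3"]

ratios = relative_copy_number(density, autosomes)
for chrom, ratio in ratios.items():
    # Print the ratio and its log2 value for each chromosome.
    print(chrom, round(ratio, 2), round(math.log2(ratio), 2))
```

RNA-derived reads would not be expected to follow chromosome dosage this cleanly, because read counts would then depend on gene expression rather than on the amount of template DNA.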
