
CORRECTION NOTICE

Nat. Methods; doi:10.1038/nmeth.1315

mRNA-Seq whole-transcriptome analysis of a single cell

Fuchou Tang, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, John Bodeau, Brian B Tuch, Asim Siddiqui, Kaiqin Lao & M Azim Surani

In the version of this supplementary file originally posted online, Supplementary Figure 5a was a duplicate of Supplementary Figure 5b. The error has been corrected in this file as of 19 April 2009.

Nature Methods: doi:10.1038/nmeth.1315

nature | methods

mRNA-Seq whole-transcriptome analysis of a single cell

Fuchou Tang, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, John Bodeau, Brian B Tuch, Asim Siddiqui, Kaiqin Lao & M Azim Surani

Supplementary figures and text:

Supplementary Figure 1 The cDNA products amplified from single wild-type and Dicer1–/– oocytes.

Supplementary Figure 2 The pie charts of the number of the mRNA-Seq reads for single cells analyzed.

Supplementary Figure 3 The estimated instrument error rate of SOLiD system.

Supplementary Figure 4 Workflow of matching analysis for 50-base reads.

Supplementary Figure 5 The reproducibility of SOLiD library preparation.

Supplementary Figure 6 The correlation plots of the fold changes that are determined by mRNA-Seq reads and TaqMan real-time PCR.

Supplementary Figure 7 Comparison of genes reported as upregulated in Dicer1–/– oocytes in Hannon’s paper with our mRNA-Seq data.

Supplementary Figure 8 Alignment score for mRNA-Seq reads.

Supplementary Figure 9 Base coverage of the single cell mRNA-Seq assay in a single mature oocyte.

Supplementary Figure 10 21,436 known transcripts are compared between two different wild-type oocytes using three normalization methods.

Supplementary Figure 11 Visualization (UCSC genome browser) of chromosome 9 using the wiggle output file (showing details for three genes: Omt2a, Omt2b, and Ooep).

Supplementary Figure 12 Seven patterns that can be used for matching two sequences of size seven with at most two mismatches.

Note: Supplementary Tables 1–9 are available on the Nature Methods website.


[Gel image. Lanes: wild-type oocytes, Dicer1–/– oocytes, buffer. Size markers: 0.5 kb, 1.0 kb, 3.0 kb. Annotations: cDNAs (0.5–3.0 kb); primer dimers.]

Supplementary Figure 1. The cDNA products amplified from single wild-type and Dicer1–/– oocytes.

The buffer lane is the negative control, in which no cell was picked and only carryover buffer was added to the reverse transcription reaction.


[Pie charts: fraction of reads per category for each single cell analyzed.]
a, Blastomere: junction reads 3%, known RefSeq 19%, genome 46%, unmatched 32%
b, Ago2–/–: junction reads 2%, known RefSeq 24%, genome 40%, unmatched 34%
c, Wt-oocyte #1: junction reads 3%, known RefSeq 30%, genome 48%, unmatched 19%
d, Wt-oocyte #2: junction reads 2%, known RefSeq 30%, genome 45%, unmatched 23%
e, Dicer1–/– #1: junction reads 2%, known RefSeq 23%, genome 45%, unmatched 30%
f, Dicer1–/– #2: junction reads 2%, known RefSeq 26%, genome 42%, unmatched 30%

Supplementary Figure 2. Pie charts of the number of mRNA-Seq reads for the single cells analyzed.

Reads are mapped to the mouse genome (mm9, NCBI Build 37) as described in the Alignment and Algorithm section. We use the UCSC annotation database (mm9) to determine whether the matching locations of individual reads correspond to exon regions or to exon-exon junctions of known transcripts. The pie charts show the number of these reads as a fraction of the total number of reads produced by each run. We generated 350 million reads in total (Table 1). About 66–81% of reads mapped uniquely to RefSeq, known junctions, and the genome. About 2–3% of reads mapped uniquely to known exon junctions.


[Plot: error rate (%) on the y-axis, 0 to 10, versus position in the read on the x-axis, 1 to 43. Legend: Color Corrected.]

Supplementary Figure 3. The estimated instrument error rate of the SOLiD system.

Reads that aligned contiguously to the genome (as described in the Alignment and Algorithm section), over their full length and at a unique location, are used to estimate the instrument error rate as a function of position within the read (see the Error detection section). For a given position, the y-axis shows the number of times the read call differs from the corresponding matching location, calculated as a fraction of all reads considered. Data from all runs are aggregated and represented as average error rates, and 95% confidence intervals of these estimates are inferred. The averaged error rate is about 0.5% for the first 30 positions. For 50-mer reads, we plotted only the first 44 bases, because the extension step reaches the full length of a read only when the read has few errors in its last positions; the error rates of the last positions are therefore underestimated for the subset of reads we used.
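The per-position estimate described in this legend can be illustrated with a minimal sketch. It assumes that reads and their matching reference strings are already available as equal-length strings, and it uses a normal-approximation (Wald) interval for the 95% confidence bounds. The legend does not state how the intervals were computed, so the interval method, the function name `per_position_error_rates` and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math


def per_position_error_rates(reads, references, z=1.96):
    """Return one (rate, lower, upper) tuple per read position.

    reads and references are parallel lists of equal-length strings.
    A position counts as an error when the read call differs from the
    reference at that position. The interval is a normal approximation
    (an assumption made for this sketch), clipped to [0, 1].
    """
    if not reads:
        return []
    n = len(reads)
    length = len(reads[0])
    results = []
    for pos in range(length):
        errors = sum(1 for read, ref in zip(reads, references)
                     if read[pos] != ref[pos])
        rate = errors / n
        half_width = z * math.sqrt(rate * (1.0 - rate) / n)
        results.append((rate,
                        max(0.0, rate - half_width),
                        min(1.0, rate + half_width)))
    return results


# Toy example: four reads in color space, one miscall at the third position.
toy_reads = ["0123", "0123", "0113", "0123"]
toy_refs = ["0123"] * 4
for position, (rate, low, high) in enumerate(
        per_position_error_rates(toy_reads, toy_refs), start=1):
    print(position, round(rate, 3), round(low, 3), round(high, 3))
```

With only full-length, uniquely aligned reads as input, such an estimate is biased downward at the last positions, which is the reason given above for plotting only the first 44 bases.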


[Workflow diagram boxes: 50-base reads file (csfasta, produced by SOLiD); reference genome file (multi-FASTA, provided by user); map first N colors of read and extend; map last N colors of read and extend; merge and sort results; Max file (FASTA format); counts file of annotated features; visualization of coverage (WIG file); read mapping report (GFF file, one line per read).]

Supplementary Figure 4. Workflow of matching analysis for 50-base reads.


[Scatter plots, log-scale axes from 1 to 1,000,000.
a, Wt-oocyte library #1 (x-axis) versus Wt-oocyte library #2 (y-axis), R = 0.996.
b, Dicer1–/– library #1 (x-axis) versus Dicer1–/– library #2 (y-axis), R = 0.997.]

Supplementary Figure 5. The reproducibility of SOLiD library preparation.

Two independent libraries were prepared from the same cDNA sample of (a) a single wild-type oocyte and (b) a single Dicer1–/– oocyte. The high correlation between libraries confirms the accuracy of the sampling of the single-cell cDNAs and the reproducibility of the library construction.



[Scatter plots.
a, FC (Dicer1–/–/Wt, SOLiD) on the x-axis versus FC (Dicer1–/–/Wt, TaqMan) on the y-axis, log-scale axes from 0.01 to 100, R = 0.926.
b, FC (Ago2–/–/Wt, SOLiD) on the x-axis versus FC (Ago2–/–/Wt, TaqMan) on the y-axis, log-scale axes from 0.1 to 10, R = 0.948.
c, Dicer1–/– #1 (counts) versus Dicer1–/– #2 (counts), log-scale axes from 1 to 1,000,000, SOLiD, R = 0.998.
d, Dicer1–/– #1 (Ct) versus Dicer1–/– #2 (Ct), axes from 15 to 35, TaqMan, R = 0.973.
e, Wt-oocyte #1 (counts) versus Wt-oocyte #2 (counts), log-scale axes from 1 to 1,000,000, SOLiD, R = 0.997.
f, Wt-oocyte #1 (Ct) versus Wt-oocyte #2 (Ct), axes from 15 to 35, TaqMan, R = 0.892.]

Supplementary Figure 6. Correlation plots of the fold changes determined by mRNA-Seq reads and by TaqMan real-time PCR.

(a) Dicer1–/–/Wt-oocyte and (b) Ago2–/–/Wt-oocyte. The 100 most abundant genes, ranked by Ct value in the wild-type oocyte, are plotted. All concordance Pearson coefficients are > 0.99 for the positive and negative hits of wild-type, Dicer1–/–, and Ago2–/– oocytes (Supplementary Table 1 online). The Ct values were normalized to the Hprt1 gene, which has been verified to have a similar expression level in 20 single wild-type, Dicer1–/–, and Ago2–/– mouse mature oocytes. The Pearson correlation decreases significantly for genes with Ct values > 33. The main reason is sampling error in the TaqMan assays, where 200 ng of cDNA was diluted 800-fold into each reaction well to run 384 assays in duplicate, whereas mRNA-Seq used all 200 ng of cDNA to make the libraries, which results in better detection sensitivity. The number of reads that match each gene annotated in the UCSC database (mm9) is used to estimate the fold change between samples. For the TaqMan measurements, the “delta Ct” method was used to estimate the (log2) fold change between samples. The mRNA-Seq reads and Ct values are also highly correlated for Dicer1–/– #1 versus Dicer1–/– #2 (c, d) and for Wt-oocyte #1 versus Wt-oocyte #2 mature oocytes (e, f).
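The two fold-change estimates named in this legend can be sketched as follows. The Ct-based function follows the standard delta Ct calculation with a reference gene (Hprt1 in the text) and assumes perfect doubling per PCR cycle. The read-based function scales counts by the total number of reads in each run and adds a pseudocount to avoid division by zero; both of those choices, and all function names and example values, are assumptions of this sketch rather than details given in the legend.

```python
import math


def log2_fc_from_ct(ct_gene_ko, ct_ref_ko, ct_gene_wt, ct_ref_wt):
    """Log2 fold change (knockout over wild type) by the delta Ct method.

    Each Ct is first normalized to the reference gene in the same sample.
    A lower normalized Ct means higher expression, hence the sign flip.
    Assumes 100% amplification efficiency.
    """
    delta_ko = ct_gene_ko - ct_ref_ko
    delta_wt = ct_gene_wt - ct_ref_wt
    return -(delta_ko - delta_wt)


def log2_fc_from_reads(reads_ko, reads_wt, total_ko, total_wt, pseudocount=1.0):
    """Log2 fold change from read counts matched to one gene.

    Scaling by total reads and the pseudocount are illustrative
    assumptions, not steps stated in the source.
    """
    ko = (reads_ko + pseudocount) / total_ko
    wt = (reads_wt + pseudocount) / total_wt
    return math.log2(ko / wt)


# Hypothetical gene: Ct drops by two cycles in the knockout, reads rise fourfold.
print(log2_fc_from_ct(24.0, 26.0, 26.0, 26.0))
print(log2_fc_from_reads(399, 99, 1_000_000, 1_000_000))
```

Comparing the two log2 values gene by gene is what the scatter plots in panels a and b summarize with a Pearson coefficient.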



[Bar charts of fold change, FC (Dicer1–/–/Wt-oocyte), per gene.
a, linear y-axis from 0 to 16. Genes: Hsp90ab1, Hdac1, Oog4, Optn, Lcp1, Kifc1, Rnf168, Rangap1, Exosc9, Slc25a26, Kctd10, Prkd1, Dynll1, Ppp2r2b, 9030611O19Rik, Traip, Kif2c, Kif4, Ppp4r1, Fbxo34, Prc1, 1110067D22Rik.
b, log-scale y-axis from 1 to 1,000; series: Array, SOLiD. Genes: Defcr-rs1, AY761184, 0610012H03Rik, Cyp2c29, B4galnt2, Ccdc67, Mt1, Car3, 4930500O09Rik, Zfp473, C130026I21Rik, Mcf2, Vav3, 2810055F11Rik, Reps2, Tmem55a, Lmnb2, Npal1, Plch1, Pla1a, Rangap1, Galnt13, Prom1, Mrvi1, 2410004L22Rik, Optn, Hdac1, Hps1, Ankrd36, Ccl6, Glipr1, Adamts12, Nrg3, Cyld, Uchl4.]

Supplementary Figure 7. Comparison of genes reported as upregulated in Dicer1–/– oocytes in Hannon’s paper with our mRNA-Seq data.

(a) Fold change measured by our mRNA-Seq assay for the 22 upregulated genes controlled by endogenous siRNA in Dicer1–/– oocytes, as determined by Affymetrix mouse microarray (ref. 26); 20 of them (91%) were confirmed by our mRNA-Seq assay. The 8 genes on the left were also shown to be upregulated by real-time PCR in Hannon’s paper (ref. 26). (b) Fold change measured by Hannon’s Affymetrix microarray and by our mRNA-Seq assay (ref. 23); 33 of 36 genes (92%) were confirmed by our mRNA-Seq assay.



[Plot: relative frequency on the y-axis, 0 to 1, versus alignment score on the x-axis, 0 to 50. Series: all read alignments, best read alignments, unique read alignments.]

Supplementary Figure 8. Alignment score for mRNA-Seq reads.

Each read alignment is assigned a score obtained by adding one for each color match and subtracting one for each mismatch. This scoring function is used in the extension step (as described in the Alignment and Algorithm section), and the contiguous alignment producing the highest score is reported. The cumulative distribution of aligned reads from an Ago2–/– oocyte run is shown. Low-scoring alignments are expected to come from reads that span splice junctions or from low-quality reads, while high-scoring alignments are expected to represent exonic regions.
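The scoring rule in this legend (plus one per color match, minus one per mismatch, report the highest-scoring contiguous alignment) can be sketched in a few lines. The sketch below is a simplification: it extends from the first position of the read instead of from a matched seed of N colors, and the function name `best_extension` and the example strings are hypothetical.

```python
def best_extension(read, reference):
    """Return (length, score) of the best-scoring contiguous prefix alignment.

    The running score gains 1 for each color match and loses 1 for each
    mismatch. The prefix with the highest running score is reported; ties
    keep the shorter prefix.
    """
    best_length, best_score = 0, 0
    score = 0
    for length, (read_color, ref_color) in enumerate(zip(read, reference), start=1):
        score += 1 if read_color == ref_color else -1
        if score > best_score:
            best_length, best_score = length, score
    return best_length, best_score


# The last two colors disagree, so the best alignment stops after 8 colors.
print(best_extension("0123012301", "0123012333"))  # (8, 8)
# A single internal mismatch is absorbed when later matches outweigh it.
print(best_extension("01230", "01330"))            # (5, 3)
```

Under this rule a 50-color read that crosses a splice junction stops gaining score at the junction, which is consistent with the expectation above that junction-spanning reads produce low-scoring alignments.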



[Plot: relative frequency on the y-axis, 0 to 0.12, versus distance to the 3’ end of the mRNA on the x-axis, 0 to 5000. Series, one per transcript size bin: size <1kb, <2kb, <3kb, <4kb, <5kb, <6kb, <7kb, <8kb, <9kb, <10kb.]

Supplementary Figure 9. Base coverage of the single cell mRNA-Seq assay in a single mature oocyte.

To obtain the coverage length distributions of our cDNAs, we binned all 21,436 transcripts by size, with bin n containing all transcripts shorter than n kb. Base coverage is generated for each bin of transcripts and scaled to the total number of aligned reads. The resulting distribution is shown as a function of the distance to the 3’ end of the transcripts. Very few reads fall in regions more than 3 kb from the 3’ end, which agrees with our gel results (Supplementary Fig. 1).
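The binning described in this legend can be illustrated with a simplified sketch. It counts read hits rather than individual covered bases, groups distances into 100-base buckets, and takes as input a hypothetical list of (transcript id, distance to 3’ end) pairs; these simplifications and all names are assumptions made for illustration, not the procedure used to draw the figure.

```python
from collections import Counter


def three_prime_profile(transcript_lengths, read_hits, max_kb, bucket=100):
    """Distribution of read hits by distance to the 3' end for one size bin.

    transcript_lengths maps transcript id to length in bases.
    read_hits is a list of (transcript_id, distance_to_3prime_end) pairs.
    The bin for max_kb contains every transcript shorter than max_kb kb,
    so bins are cumulative. Counts are scaled by the total number of hits.
    """
    hits = list(read_hits)
    total = len(hits)
    counts = Counter()
    for transcript_id, distance in hits:
        if transcript_lengths[transcript_id] < max_kb * 1000:
            counts[(distance // bucket) * bucket] += 1
    return {start: count / total for start, count in sorted(counts.items())}


# Two hypothetical transcripts of 0.8 kb and 2.5 kb with four aligned reads.
lengths = {"A": 800, "B": 2500}
hits = [("A", 50), ("A", 120), ("B", 60), ("B", 2100)]
print(three_prime_profile(lengths, hits, max_kb=1))
print(three_prime_profile(lengths, hits, max_kb=3))
```

Because each bin includes all shorter transcripts, the curve for bin n can only add signal relative to bin n-1, and any signal beyond 3 kb has to come from the longer transcripts.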



[Scatter plots of Wt-oocyte #1 (x-axis) versus Wt-oocyte #2 (y-axis), log-scale axes.
a, reads, R = 0.958.
b, RPKM, R = 0.97.
c, RPKB, R = 0.97.
d, quantile, R = 0.986.]

Supplementary Figure 10. 21,436 known transcripts are compared between two different wild-type oocytes using three normalization methods.

(a) Number of reads (without normalization), (b) number of reads per kilobase per million reads (RPKM; ref. 6), (c) number of reads per kilobase (RPKB), and (d) quantile-normalized counts. The three normalization methods give similar Pearson correlation coefficients (0.97, 0.97 and 0.986, respectively), with quantile normalization giving the highest correlation for our mRNA-Seq reads.
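The three normalizations compared in this legend can be written out in a short sketch. RPKM and reads per kilobase follow their usual definitions. The quantile normalization shown is the common rank-mean form applied across samples, with ties broken by input order, which is a simplifying assumption; the function names and toy counts are hypothetical.

```python
def rpkm(count, length_bp, total_reads):
    """Reads per kilobase of transcript per million reads in the run."""
    return count * 1e9 / (length_bp * total_reads)


def reads_per_kb(count, length_bp):
    """Reads per kilobase of transcript (labelled RPKB in the legend)."""
    return count * 1e3 / length_bp


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    samples is a list of equal-length lists of counts, one per sample.
    The value at each rank is replaced by the mean, across samples, of
    the values holding that rank. Ties are broken by input order.
    """
    size = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [sum(s[rank] for s in sorted_samples) / len(samples)
                  for rank in range(size)]
    normalized = []
    for sample in samples:
        order = sorted(range(size), key=lambda index: sample[index])
        values = [0.0] * size
        for rank, index in enumerate(order):
            values[index] = rank_means[rank]
        normalized.append(values)
    return normalized


print(rpkm(100, 2000, 1_000_000))                   # 50.0
print(reads_per_kb(100, 2000))                      # 50.0
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))   # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Quantile normalization forces the two samples onto an identical set of values, which removes distribution-level differences between runs and is one plausible reason it yields the highest correlation here.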



[Genome browser image: chromosome 9 with positive-strand and negative-strand coverage tracks, plus detailed views of Omt2a and Omt2b and of Ooep. Scale bars: 200 kb and 10 kb.]

Supplementary Figure 11. Visualization (UCSC genome browser) of chromosome 9 using the wiggle output file (showing details for three genes: Omt2a, Omt2b, and Ooep).


Supplementary Figure 12. Seven patterns that can be used for matching two sequences of size seven with at most two mismatches.

These patterns are designed so that for every pair of distinct positions (i, j), with 1 ≤ i, j ≤ 7, there is a pattern that has zeros at positions i and j.
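The covering property stated in this legend can be checked programmatically. The sketch below builds one set of seven patterns that satisfies it, taken from the seven lines of the Fano plane, where every pair of points lies on exactly one line. This is an illustrative construction and is not claimed to be the set drawn in the figure. In each pattern a 1 marks a position that must match and a 0 marks a position that is ignored; all names are hypothetical.

```python
from itertools import combinations

# Lines of the Fano plane on points 1..7; each pair of points lies on one line.
FANO_LINES = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
              (2, 5, 7), (3, 4, 7), (3, 5, 6)]


def make_patterns(lines=FANO_LINES, size=7):
    """One pattern per line: '0' on the line's positions, '1' elsewhere."""
    return ["".join("0" if position in line else "1"
                    for position in range(1, size + 1))
            for line in lines]


def covers_all_pairs(patterns):
    """True if every pair of positions is zero in at least one pattern."""
    size = len(patterns[0])
    return all(any(p[i] == "0" and p[j] == "0" for p in patterns)
               for i, j in combinations(range(size), 2))


def masked_key(sequence, pattern):
    """Keep only the characters at positions where the pattern has '1'."""
    return "".join(char for char, bit in zip(sequence, pattern) if bit == "1")


def shares_key(a, b, patterns):
    """True if a and b agree on all '1' positions of at least one pattern."""
    return any(masked_key(a, p) == masked_key(b, p) for p in patterns)


patterns = make_patterns()
print(patterns[0])                                  # 0001111
print(covers_all_pairs(patterns))                   # True
print(shares_key("0123012", "0133013", patterns))   # True: mismatches at 3 and 7
print(shares_key("0123012", "1133013", patterns))   # False: mismatches at 1, 3, 7
```

Any two sequences of length seven that differ in at most two positions therefore share a masked key under some pattern, so indexing by masked keys cannot miss them. The converse does not hold: three mismatches that fall on a single line (for example positions 1, 2 and 3) also share a key, so candidates found this way still need a full comparison.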

