CORRECTION NOTICE
Nat. Methods; doi:10.1038/nmeth.1315
mRNA-Seq whole-transcriptome analysis of a single cell
Fuchou Tang, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, John Bodeau, Brian B Tuch, Asim Siddiqui, Kaiqin Lao & M Azim Surani
In the version of this supplementary file originally posted online, Supplementary Figure 5a was a duplicate of Supplementary Figure 5b. The error has been corrected in this file as of 19 April 2009.
Nature Methods: doi:10.1038/nmeth.1315
mRNA-Seq whole-transcriptome analysis of a single cell
Fuchou Tang, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu,
Xiaohui Wang, John Bodeau, Brian B Tuch, Asim Siddiqui, Kaiqin Lao & M Azim Surani
Supplementary figures and text:
Supplementary Figure 1 The cDNA products amplified from single wild-type and Dicer1–/– oocytes.
Supplementary Figure 2 Pie charts of the number of mRNA-Seq reads for the single cells analyzed.
Supplementary Figure 3 The estimated instrument error rate of the SOLiD system.
Supplementary Figure 4 Workflow of matching analysis for 50-base reads.
Supplementary Figure 5 The reproducibility of SOLiD library preparation.
Supplementary Figure 6 Correlation plots of the fold changes determined by mRNA-Seq reads and TaqMan real-time PCR.
Supplementary Figure 7 Comparison of the genes reported as upregulated in Dicer1–/– oocytes in Hannon’s paper with our mRNA-Seq data.
Supplementary Figure 8 Alignment score for mRNA-Seq reads.
Supplementary Figure 9 Base coverage of the single-cell mRNA-Seq assay in a single mature oocyte.
Supplementary Figure 10 Comparison of 21,436 known transcripts between two different wild-type oocytes using three normalization methods.
Supplementary Figure 11 Visualization (UCSC genome browser) of chromosome 9 using the wiggle output file (showing details for three genes: Omt2a, Omt2b and Ooep).
Supplementary Figure 12 Seven patterns that can be used for matching two sequences of size seven with at most two mismatches.
Note: Supplementary Tables 1–9 are available on the Nature Methods website.
[Figure: gel image with lanes for wild-type oocytes, Dicer1–/– oocytes and a buffer control; size labels at 0.5, 1.0 and 3.0 kb; the 0.5–3.0 kb cDNA products and primer dimers are indicated.]
Supplementary Figure 1. The cDNA products amplified from single wild-type and Dicer1–/– oocytes.
The buffer lane is a negative control in which no cell was picked; only the carryover buffer was used in the reverse transcription reaction.
[Figure: pie charts, panels a–f; panel titles include Blastomere, Ago2–/–, Wt-oocyte #1, Wt-oocyte #2, Dicer1–/– #1 and Dicer1–/– #2. Read categories and their ranges across the charts: Known RefSeq (19–30%), Junction Reads (2–3%), Genome (40–48%), Unmatched (19–34%).]
Supplementary Figure 2. Pie charts of the number of mRNA-Seq reads for the single cells analyzed.
Reads were mapped to the mouse genome (mm9, NCBI Build 37) as described in the Alignment and Algorithm section. We used the UCSC annotation database (mm9) to determine whether the matching location of each read corresponds to an exon or to an exon-exon junction of a known transcript. Each pie chart shows these read counts as fractions of the total number of reads produced by the run. In total, we generated 350 million reads (Table 1). About 66–81% of reads mapped uniquely to RefSeq exons, known junctions or elsewhere in the genome, and about 2–3% mapped uniquely to known exon-exon junctions.
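The classification of matching locations into the pie-chart categories can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical input format in which each read's unique hit is `None` (unmatched), a `("junction", id)` tuple for a match to a known exon-exon junction sequence, or a `("genome", chrom, start, end)` tuple with half-open coordinates, and the names `classify_hit` and `category_fractions` are illustrative only.

```python
from collections import Counter

CATEGORIES = ("Known RefSeq", "Junction Reads", "Genome", "Unmatched")

def classify_hit(hit, exons):
    """Assign one read's unique hit to a pie-chart category.

    hit: None, ("junction", junction_id) or ("genome", chrom, start, end)
    (hypothetical format, half-open coordinates).
    exons: {chrom: [(start, end), ...]} of known exon intervals.
    """
    if hit is None:
        return "Unmatched"
    if hit[0] == "junction":
        return "Junction Reads"
    _, chrom, start, end = hit
    # Assumption: a read is exonic only if it lies entirely inside one exon.
    for ex_start, ex_end in exons.get(chrom, []):
        if ex_start <= start and end <= ex_end:
            return "Known RefSeq"
    return "Genome"

def category_fractions(hits, exons):
    """Fraction of all reads in one run that falls into each category."""
    counts = Counter(classify_hit(h, exons) for h in hits)
    total = len(hits)
    return {c: counts[c] / total for c in CATEGORIES}

exons = {"chr9": [(100, 200)]}
hits = [("genome", "chr9", 120, 170), ("genome", "chr9", 300, 350),
        ("junction", "j1"), None]
fractions = category_fractions(hits, exons)
mapped = 1 - fractions["Unmatched"]  # uniquely mapped fraction
```

The uniquely mapped fraction quoted in the caption corresponds to `1 - fractions["Unmatched"]` in this sketch.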
[Figure: line plot of Error rate (%) (y-axis, 0–10) versus Position in the read (x-axis, 1–43); series: Color Corrected.]
Supplementary Figure 3. The estimated instrument error rate of the SOLiD system.
Reads that aligned contiguously to the genome over their full length and at a unique location (as described in the Alignment and Algorithm section) were used to estimate the instrument error rate as a function of position within the read (see the Error detection section). For each position, the y-axis shows the number of differences between the read call and the corresponding matching reference location, as a fraction of all reads considered. Data from all runs were aggregated into average error rates, and 95% confidence intervals of these estimates were inferred. The average error rate is about 0.5% for the first 30 positions. For 50-mer reads, we plotted only the first 44 bases: the extension step reaches the full length of a read only if the read has few errors in its last positions, so for the subset of reads used here the error rates of the last positions are underestimated.
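A minimal sketch of the per-position estimate, assuming each read has already been aligned contiguously and uniquely over its full length and that reads and their matching reference segments are given as equal-length strings of color calls. The function name `per_position_error` is hypothetical, and the 95% interval uses a normal (Wald) approximation; that choice is our assumption, since the text does not say how the intervals were inferred.

```python
import math

def per_position_error(reads, refs, z=1.96):
    """Per-position mismatch rate with an approximate 95% interval.

    reads, refs: equal-length lists of equal-length strings, where refs[k]
    is the reference segment matched by reads[k] (hypothetical inputs).
    Returns a list of (rate, lower, upper) tuples, one per read position.
    """
    n = len(reads)
    out = []
    for pos in range(len(reads[0])):
        mismatches = sum(1 for r, g in zip(reads, refs) if r[pos] != g[pos])
        p = mismatches / n
        # Assumption: Wald interval; the source does not name its method.
        half = z * math.sqrt(p * (1 - p) / n)
        out.append((p, max(0.0, p - half), min(1.0, p + half)))
    return out

rates = per_position_error(["0123", "0120"], ["0123", "0123"])
```

Pooling reads from all runs before calling the function mirrors the aggregation described in the caption.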
[Figure: workflow diagram]
Inputs: 50-base reads file (csfasta, produced by SOLiD); reference genome file (multi-FASTA, provided by user)
Mapping: map the first N colors of each read and extend; map the last N colors of each read and extend
Merge and sort results
Outputs: Max file (FASTA format); counts file of annotated features; visualization of coverage (WIG file); read mapping report (GFF file, one line per read)
Supplementary Figure 4. Workflow of matching analysis for 50-base reads.
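The seed-and-extend idea in this workflow can be sketched as below. This is a toy illustration under stated assumptions, not the authors' implementation: seeds are exact matches of the first or last N colors found through a simple substring index, and extension uses the +1 match / −1 mismatch score described in the Supplementary Figure 8 caption, keeping the best-scoring contiguous stretch. All function names are hypothetical, and the reference is treated as a single color-space string.

```python
from collections import defaultdict

def build_index(ref, n):
    """Index every length-n substring (seed) of the reference by start."""
    idx = defaultdict(list)
    for i in range(len(ref) - n + 1):
        idx[ref[i:i + n]].append(i)
    return idx

def extend(read, ref, start):
    """Walk read against ref from ref[start], scoring +1 per match and
    -1 per mismatch; return (best_score, length) of the best prefix."""
    score = best = best_len = 0
    for k, c in enumerate(read):
        if start + k >= len(ref):
            break
        score += 1 if c == ref[start + k] else -1
        if score > best:
            best, best_len = score, k + 1
    return best, best_len

def map_read(read, ref, idx, n):
    """Seed with the first n and the last n colors, extend each seed,
    then merge and sort. Hits are (score, ref_start, length, seed)."""
    hits = []
    for pos in idx.get(read[:n], []):
        score, length = extend(read, ref, pos)
        hits.append((score, pos, length, "first"))
    # Extend the tail seed leftwards by walking the reversed strings.
    rev_read, rev_ref = read[::-1], ref[::-1]
    for pos in idx.get(read[-n:], []):
        end = pos + n  # reference coordinate just past the tail seed
        score, length = extend(rev_read, rev_ref, len(ref) - end)
        hits.append((score, end - length, length, "last"))
    hits.sort(reverse=True)
    return hits

ref = "2222012331022222"
hits = map_read("01233102", ref, build_index(ref, 3), 3)
```

Seeding from both ends lets a read whose first colors contain errors still be placed through its last colors, and vice versa.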
[Figure: scatter plots (log scale, 1 to 1,000,000): (a) Wt-oocyte library #1 versus Wt-oocyte library #2, R = 0.996; (b) Dicer1–/– library #1 versus Dicer1–/– library #2, R = 0.997.]
Supplementary Figure 5. The reproducibility of SOLiD library preparation.
Two independent libraries were prepared from the same cDNA sample of (a) a single wild-type oocyte and (b) a single Dicer1–/– oocyte. The high correlation between the two libraries confirms the accuracy of the sampling of single-cell cDNA and the reproducibility of library construction.
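The R values in these plots are Pearson correlation coefficients. A minimal sketch of computing R between two replicate libraries follows; whether the published R was computed on raw or log-transformed counts is not stated, so the optional `log10(count + 1)` transform here is an assumption, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_correlation(counts1, counts2, log_scale=True):
    """R between per-gene counts of two libraries.

    Assumption: counts are optionally log10(count + 1) transformed,
    matching the log axes of the plots; the source does not specify.
    """
    if log_scale:
        counts1 = [math.log10(c + 1) for c in counts1]
        counts2 = [math.log10(c + 1) for c in counts2]
    return pearson(counts1, counts2)
```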
[Figure: scatter plots. (a) FC (Dicer1–/–/Wt, SOLiD) versus FC (Dicer1–/–/Wt, TaqMan), R = 0.926; (b) FC (Ago2–/–/Wt, SOLiD) versus FC (Ago2–/–/Wt, TaqMan), R = 0.948; (c) Dicer1–/– #1 versus Dicer1–/– #2 (counts), SOLiD, R = 0.998; (d) Dicer1–/– #1 versus Dicer1–/– #2 (Ct), TaqMan, R = 0.973; (e) Wt-oocyte #1 versus Wt-oocyte #2 (counts), SOLiD, R = 0.997; (f) Wt-oocyte #1 versus Wt-oocyte #2 (Ct), TaqMan, R = 0.892.]
Supplementary Figure 6. Correlation plots of the fold changes determined by mRNA-Seq reads and TaqMan real-time PCR.
(a) Dicer1–/–/Wt-oocyte and (b) Ago2–/–/Wt-oocyte. The top 100 most abundant genes, based on the Ct values of the Wt-oocyte, are plotted. All concordance Pearson coefficients are > 0.99 for the positive and negative hits of wild-type, Dicer1–/– and Ago2–/– oocytes (Supplementary Table 1 online). Ct values were normalized to the Hprt1 gene, which was verified to have a similar expression level in 20 single wild-type, Dicer1–/– and Ago2–/– mouse mature oocytes. The Pearson correlation decreases significantly for genes with Ct values > 33. This is mainly due to sampling error in the TaqMan assays, in which 200 ng of cDNA was diluted 800-fold in each reaction well for 384 assays run in duplicate, whereas mRNA-Seq used all 200 ng of cDNA to make the libraries, resulting in better detection sensitivity. The number of reads matching each gene annotated in the UCSC mm9 database was used to estimate the fold change between samples. For TaqMan measurements, the delta Ct method was used to estimate the (log2) fold change between samples. The mRNA-Seq read counts and Ct values are also highly correlated for Dicer1–/– #1 versus Dicer1–/– #2 (c, d) and for Wt-oocyte #1 versus Wt-oocyte #2 mature oocytes (e, f).
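The two fold-change estimates compared in panels a and b can be sketched as follows. For TaqMan, this assumes the common delta-delta Ct form (each gene's Ct normalized to Hprt1, then differenced between knockout and wild type) and roughly 100% PCR efficiency, so one cycle equals a twofold change. For mRNA-Seq, scaling gene read counts by the run's total read count and adding a pseudocount are our assumptions; the caption only says read counts per gene were used. Function names are hypothetical.

```python
import math

def log2_fc_delta_ct(ct_gene_ko, ct_hprt1_ko, ct_gene_wt, ct_hprt1_wt):
    """log2 fold change (KO/Wt) from Ct values normalized to Hprt1.

    Assumes ~100% PCR efficiency (one cycle = twofold); a lower
    normalized Ct means higher expression, hence the minus sign.
    """
    delta_ko = ct_gene_ko - ct_hprt1_ko
    delta_wt = ct_gene_wt - ct_hprt1_wt
    return -(delta_ko - delta_wt)

def log2_fc_reads(reads_ko, total_ko, reads_wt, total_wt, pseudocount=1.0):
    """log2 fold change (KO/Wt) from gene read counts scaled by run size.

    Assumption: scaling by total reads; the pseudocount avoids
    division by zero for genes with no reads.
    """
    rate_ko = (reads_ko + pseudocount) / total_ko
    rate_wt = (reads_wt + pseudocount) / total_wt
    return math.log2(rate_ko / rate_wt)
```

For example, a gene 2 cycles above Hprt1 in the knockout but 4 cycles above it in the wild type gives a log2 fold change of 2 (fourfold up).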
[Figure: (a) bar chart of FC (Dicer1–/–/Wt-oocyte) by mRNA-Seq (0–16) for Hsp90ab1, Hdac1, Oog4, Optn, Lcp1, Kifc1, Rnf168, Rangap1, Exosc9, Slc25a26, Kctd10, Prkd1, Dynll1, Ppp2r2b, 9030611O19Rik, Traip, Kif2c, Kif4, Ppp4r1, Fbxo34, Prc1 and 1110067D22Rik; (b) bar chart of FC (Dicer1–/–/Wt-oocyte) on a log scale (1–1,000), by Array and by SOLiD, for each gene.]
Supplementary Figure 7. Comparison of the genes reported as upregulated in Dicer1–/– oocytes in Hannon’s paper with our mRNA-Seq data.
(a) Fold change measured by our mRNA-Seq for the 22 genes, controlled by endogenous siRNAs, that were found to be upregulated in Dicer1–/– oocytes with an Affymetrix mouse microarray26; 20 of them (91%) were confirmed by our mRNA-Seq assay. The 8 genes on the left were also shown to be upregulated by real-time PCR in Hannon’s paper26. (b) Fold change measured by Hannon’s Affymetrix microarray and by our mRNA-Seq assay23; 33 of 36 genes (92%) were confirmed by our mRNA-Seq assay.
[Figure: Relative frequency (y-axis, 0–1) versus Alignment score (x-axis, 0–50); series: All read alignments, Best read alignments, Unique read alignments.]
Supplementary Figure 8. Alignment score for mRNA-Seq reads.
Each read alignment is scored by adding one for each color match and subtracting one for each mismatch. This scoring function is used in the extension step (described in the Alignment and Algorithm section), and the (contiguous) alignment with the highest score is reported. The cumulative distribution of aligned reads from an Ago2–/– oocyte run is shown. Low-scoring alignments are expected to come from reads spanning splice junctions or from low-quality reads, whereas high-scoring alignments are expected to represent exonic regions.
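The scoring rule and the cumulative distribution plotted here can be sketched in a few lines. This assumes each read and its aligned reference segment are equal-length color strings; the function names are hypothetical, and the cumulative fraction at threshold t is the share of alignments scoring at most t.

```python
from bisect import bisect_right

def alignment_score(read, ref_segment):
    """+1 for each color match, -1 for each mismatch (equal-length inputs)."""
    return sum(1 if a == b else -1 for a, b in zip(read, ref_segment))

def cumulative_fraction(scores, thresholds):
    """Fraction of alignments with score <= t, for each threshold t."""
    ordered = sorted(scores)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]
```

A fully matching 50-color read scores 50, and each mismatch lowers the score by 2, which is why reads crossing splice junctions fall toward the low end of the distribution.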
[Figure: Relative frequency (y-axis, 0–0.12) versus Distance to 3′ end of mRNA (x-axis, 0–5,000); one series per transcript-size bin: size <1 kb, <2 kb, <3 kb, … , <10 kb.]
Supplementary Figure 9. Base coverage of the single-cell mRNA-Seq assay in a single mature oocyte.
To obtain the coverage length distributions of our cDNAs, we binned all 21,436 transcripts by size, with bin n containing all transcripts shorter than n kb. Base coverage was computed for each bin of transcripts and scaled to the total number of aligned reads. The resulting distribution is plotted as a function of distance to the 3′ end of the transcripts. Very few reads fall in regions more than 3 kb from the 3′ end, which agrees with our gel results (Supplementary Fig. 1).
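The binning and coverage computation described above can be sketched for a single size bin as follows. The input format is hypothetical: transcript lengths as a dictionary, and reads as `(transcript_id, start, end)` in 0-based, half-open transcript coordinates. Dividing by the number of reads passed in is our reading of "scaled to the total number of aligned reads".

```python
def coverage_by_3prime_distance(lengths, reads, bin_kb, max_dist):
    """Base coverage versus distance to the 3' end for one size bin.

    lengths: {tx_id: length_nt}; reads: [(tx_id, start, end)], 0-based,
    half-open on the transcript (hypothetical format). The bin holds all
    transcripts shorter than bin_kb kb, so bins are nested, as in the text.
    """
    cov = [0.0] * max_dist
    in_bin = {t for t, n in lengths.items() if n < bin_kb * 1000}
    for tx, start, end in reads:
        if tx not in in_bin:
            continue
        for p in range(start, end):
            d = lengths[tx] - 1 - p  # 0 = last base of the transcript
            if 0 <= d < max_dist:
                cov[d] += 1
    total = len(reads)  # assumption: scale by all aligned reads supplied
    return [c / total for c in cov] if total else cov

cov = coverage_by_3prime_distance({"t1": 10, "t2": 5000},
                                  [("t1", 7, 10), ("t2", 0, 3)], 1, 5)
```

Calling the function with `bin_kb` from 1 to 10 gives one curve per transcript-size bin.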
[Figure: scatter plots of Wt-oocyte #1 versus Wt-oocyte #2 (log scale): (a) reads, R = 0.958; (b) RPKM, R = 0.97; (c) RPKB, R = 0.97; (d) quantile-normalized counts, R = 0.986.]
Supplementary Figure 10. Comparison of 21,436 known transcripts between two different wild-type oocytes using three normalization methods.
(a) Number of reads (without normalization), (b) number of reads per kilobase per million reads6 (RPKM), (c) number of reads per kilobase (RPKB) and (d) quantile-normalized counts. The three normalization methods give similar Pearson correlation coefficients (0.97, 0.97 and 0.986, respectively), with quantile normalization giving the highest correlation for our mRNA-Seq reads.
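The three normalizations can be sketched as below, assuming their usual definitions: RPKM divides counts by transcript length in kilobases and by the run's mapped reads in millions, RPKB divides by length in kilobases only, and quantile normalization replaces each sample's values with the mean of the sorted values across samples at the same rank. This simple quantile variant breaks ties by position; the authors' exact implementation is not described, and the function names are hypothetical.

```python
def rpkm(counts, lengths_nt, total_mapped):
    """Reads per kilobase of transcript per million mapped reads."""
    return [c * 1e9 / (n * total_mapped) for c, n in zip(counts, lengths_nt)]

def rpkb(counts, lengths_nt):
    """Reads per kilobase of transcript."""
    return [c * 1e3 / n for c, n in zip(counts, lengths_nt)]

def quantile_normalize(samples):
    """Give every sample the same value distribution.

    samples: list of equal-length lists (one per sample). Each value is
    replaced by the mean, across samples, of the values with the same
    rank; ties are broken by position (simple variant).
    """
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_cols) / len(samples)
                 for i in range(n)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        norm = [0.0] * n
        for rank, idx in enumerate(order):
            norm[idx] = reference[rank]
        out.append(norm)
    return out
```

Because quantile normalization forces both oocytes onto one shared distribution, it removes differences in overall count distribution between libraries, which is consistent with its giving the highest R here.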
[Figure: UCSC genome browser tracks for the negative and positive strands of chromosome 9, with detailed views of Omt2a, Omt2b and Ooep; scale bars of 200 kb and 10 kb.]
Supplementary Figure 11. Visualization (UCSC genome browser) of chromosome 9 using
the wiggle output file (showing details for three genes: Omt2a, Omt2b, and Ooep).
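A sketch of how strand-specific coverage could be written as wiggle tracks for the UCSC browser follows. It uses the standard UCSC `fixedStep` wiggle layout (1-based start, step of one base); the read format `(start, end, strand)` in 0-based, half-open chromosome coordinates and the function names are assumptions, since the authors' WIG writer is not described.

```python
def strand_coverage(reads, chrom_len):
    """Per-base coverage, split by strand.

    reads: [(start, end, strand)], 0-based half-open (hypothetical format).
    """
    plus, minus = [0] * chrom_len, [0] * chrom_len
    for start, end, strand in reads:
        track = plus if strand == "+" else minus
        for p in range(start, min(end, chrom_len)):
            track[p] += 1
    return plus, minus

def to_fixedstep_wig(chrom, coverage, name, start_1based=1):
    """One wiggle track in UCSC fixedStep format (one value per base)."""
    lines = [f'track type=wiggle_0 name="{name}"',
             f"fixedStep chrom={chrom} start={start_1based} step=1"]
    lines.extend(str(v) for v in coverage)
    return "\n".join(lines) + "\n"

plus, minus = strand_coverage([(0, 2, "+"), (1, 3, "-")], 4)
wig = to_fixedstep_wig("chr9", plus, "positive strand")
```

Writing one track per strand gives the separate positive-strand and negative-strand rows shown in the figure.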
Supplementary Figure 12. Seven patterns that can be used for matching two sequences of
size seven with at most two mismatches.
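These patterns are designed so that, for any pair of positions (i, j) with i, j ≤ 7, there is a pattern with zeros at both positions i and j. Consequently, if two sequences differ at no more than two positions, at least one pattern has zeros at every mismatched position, and the two sequences agree exactly at the positions that pattern keeps.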
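One way to build seven such patterns is shown below as an illustration; it is not necessarily the set drawn in the figure. The zero positions of each pattern are a cyclic shift of {0, 1, 3} modulo 7 (the lines of the Fano plane), so every pair of positions shares exactly one pattern. Two sequences are reported as a possible match if they agree exactly on the "1" positions of at least one pattern. All names are hypothetical, and positions are 0-based here.

```python
def cyclic_patterns(n=7, base=(0, 1, 3)):
    """Seven length-7 masks; '0' marks a position allowed to mismatch.

    Zero sets are {0, 1, 3} shifted cyclically mod 7; every pair of
    positions then lies in exactly one zero set.
    """
    patterns = []
    for shift in range(n):
        zeros = {(b + shift) % n for b in base}
        patterns.append("".join("0" if p in zeros else "1" for p in range(n)))
    return patterns

def covers_all_pairs(patterns):
    """True if every pair (i, j) has some pattern with zeros at i and j."""
    n = len(patterns[0])
    return all(any(p[i] == "0" and p[j] == "0" for p in patterns)
               for i in range(n) for j in range(n))

def key(seq, pattern):
    """Characters of seq at the positions the pattern keeps ('1')."""
    return "".join(c for c, m in zip(seq, pattern) if m == "1")

def may_match(a, b, patterns):
    """Candidate match: exact agreement under at least one pattern."""
    return any(key(a, p) == key(b, p) for p in patterns)

patterns = cyclic_patterns()
```

In an indexing setting, each pattern's key would be hashed so that candidate pairs with at most two mismatches are found by exact lookups, one lookup per pattern.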