Detecting and Ameliorating Systematic Variation from Large-scale RNA Sequencing

Sheng Li 1,2,*, Paweł P. Łabaj 3,*, Paul Zumbo 1,2,*, Peter Sykacek 3, Wei Shi 5, Leming Shi 6, John Phan 7, Leo Wu 7, May Wang 7, Charles Wang 8, Danielle Thierry-Mieg 9, Jean Thierry-Mieg 9, David P. Kreil 3,4,x, Christopher E. Mason 1,2,x

1 Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065 USA
2 The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10065 USA
3 Chair of Bioinformatics, Boku University, Vienna, Austria, Europe
4 University of Warwick, U.K.
5 Department of Bioinformatics, WEHI, Australia
6 Center for Pharmacogenomics, School of Pharmacy, Fudan University, Shanghai, China
7 Department of Biomedical Engineering, GeorgiaTech and Emory University, Atlanta, GA USA
8 Center for Genomics and Division of Microbiology & Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, CA 92350
9 National Center for Biotechnology Information (NCBI), Bethesda, MD, USA

1. Multi-pipeline agreement of site outliers.
2. Boxplots of the inter-site false positive DEGs.
3. Inter-site 3’UTR genes false positive DEGs count.
4. Intra-site and inter-site correlations.
5. Q-Q plots for gene expression inter-site repeatability (samples A & B).
6. Q-Q plots for gene expression inter-site repeatability (samples C & D).
7. Q-Q plots for DEG inter-site repeatability.
8. MDS plot of normalized gene expression.
9. Intra-site DEGs detected from 6 sites.
10. Evaluation on inter-site DEGs reproducibility.
11. Correlation of RNA-seq normalized gene expression with TaqMan assays.
12. Taqman genes highly expressed in the RNA-seq data.
13. Evaluation of the performance of intra-site DEGs using TaqMan data.
14. Illustration of measures about sample identification.
15. Spearman correlation of the adjusted p-value between inter-site DEGs and intra-site DEGs.
16. Inter-site/intra-site DEG validation.
17. Site-specific base content examination of an independent control library for assessing site-variance.
18. Intra- and inter-site variations of three additional quality metrics for the Illumina dataset.
19. Duplication rate per library.
20. GC content quality metric and latent variables from PGM data.
21. Examination of quality metrics and DEG detection for PGM data.
22. PGM Inter-site and intra-site DEGs analysis.

Nature Biotechnology: doi:10.1038/nbt.3000


[Figure: pairwise site comparisons ILM2-ILM3, ILM2-ILM5 and ILM3-ILM5, with the same three labels repeated across the panels.]

Supplementary Figure 1. Multi-pipeline analysis of false positive rates across sites. We examined the numbers of false positives (A vs. A) in DEG detection for three different pipelines and consistently observed a higher number of false positive DEGs for one of the sites (ILM3). (a) DEGs found for WHAM with Cufflinks. (b) DEGs found for MapSplice with HTSeq. (c) DEGs found for Novoalign with HTSeq. Only RefSeq genes were considered in this analysis. Higher stringencies (FC=3, FDR=0.005) reduced the number of false positives, but ILM3 still showed false positives in all cases.


[Figure: boxplots in a grid of panels, one column per cutoff pair (FC 1.5 or 2; FDR 0.05, 0.01 or 0.001) and one row per comparison (comp: A, B, C, D). x-axis: Analysis (Without SVA, With SVA); y-axis: False positive DEGs (ticks 0, 2500, 5000, 7500, 10000).]

Supplementary Figure 2. Boxplots of the number of inter-site false positive DEGs with and without SVA clean-up, using 3 different false discovery rate (FDR) cutoffs (0.05, 0.01, 0.001) and 2 different fold-change (FC) cutoffs (1.5, 2). All comparisons are for four replicate libraries of the same sample (A, B, C, D) at six different sites.
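The cutoff grid described in this caption can be sketched in code. The following is a minimal illustration, not the authors' pipeline: it assumes each gene already has a log2 fold change and an adjusted p-value (treated here as the FDR), that the FC cutoff applies symmetrically to up- and down-regulation, and that cutoff boundaries are inclusive. The names `call_degs` and `count_grid` and the toy values are hypothetical.

```python
import math

# Cutoffs named in the caption.
FC_CUTOFFS = (1.5, 2.0)
FDR_CUTOFFS = (0.05, 0.01, 0.001)


def call_degs(results, fc_cutoff, fdr_cutoff):
    """Return the set of gene IDs passing both cutoffs.

    results maps gene ID -> (log2 fold change, adjusted p-value).
    Assumption: inclusive boundaries, symmetric fold-change test.
    """
    min_abs_log2fc = math.log2(fc_cutoff)
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if abs(log2fc) >= min_abs_log2fc and padj <= fdr_cutoff
    }


def count_grid(results):
    """Count DEG calls for every (FC, FDR) cutoff combination."""
    return {
        (fc, fdr): len(call_degs(results, fc, fdr))
        for fc in FC_CUTOFFS
        for fdr in FDR_CUTOFFS
    }


# Toy input (made-up numbers, for illustration only).
toy = {
    "g1": (1.2, 0.0005),
    "g2": (0.7, 0.03),
    "g3": (-1.1, 0.2),
    "g4": (0.3, 0.0001),
}
counts = count_grid(toy)
```

In a same-sample comparison between two sites (for example A vs. A), every gene returned by such a filter is a false positive by construction, which is the quantity the boxplots summarize.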


[Figure: bar charts titled "3' UTR gene read count", in a grid of panels with one column per comparison (comp: A, B, C, D) and one row per cutoff pair (FC 1.5 or 2; FDR 0.05, 0.01 or 0.001). x-axis: Comparison (site pairs AGR-BGI, AGR-CNL, AGR-COH, AGR-MAY, AGR-NVS, BGI-CNL, BGI-COH, BGI-MAY, BGI-NVS, CNL-COH, CNL-MAY, CNL-NVS, COH-MAY, COH-NVS, MAY-NVS); y-axis: Number of DEGs (ticks 0, 1000, 2000, 3000, 4000); legend "type": up, down.]

Supplementary Figure 3. Inter-site false positive DEG counts for gene expression quantified from 3’ UTR reads. Reads were filtered by match to the 3’ UTR of AceView genes. See Supp. Figure 8 for definitions. Site #3 still shows a high false positive DEG rate.


[Figure: bar charts in two columns (intra-site, inter-site) and four rows (samples A, B, C, D). x-axis: Site (ILM1 to ILM6 for intra-site; the 15 site pairs ILM1-ILM2 through ILM5-ILM6 for inter-site); y-axis: Correlation coefficients (ticks 0.00, 0.25, 0.50, 0.75, 1.00). The printed bar labels lie between roughly 0.955 and 0.982.]

Supplementary Figure 4. Intra-site and inter-site correlations (Pearson correlation, y-axis) of normalized gene expression for samples A-D across all sites (x-axis).
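As a hedged illustration of the quantity plotted here, the sketch below computes Pearson correlations between every pair of libraries of one sample and groups them into intra-site (same site) and inter-site (different sites) sets. Averaging over replicate pairs is an assumption made for this example; the source does not state how replicate pairs were summarized. The function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def site_correlations(libraries):
    """Group pairwise correlations by site relationship.

    libraries: list of (site, expression_vector) for ONE sample,
    with all vectors indexed by the same genes.
    Returns (intra, inter): dicts of mean correlation per site and
    per site pair. Taking the mean over pairs is an assumption.
    """
    intra, inter = {}, {}
    for (site1, expr1), (site2, expr2) in combinations(libraries, 2):
        r = pearson(expr1, expr2)
        if site1 == site2:
            intra.setdefault(site1, []).append(r)
        else:
            key = "-".join(sorted((site1, site2)))
            inter.setdefault(key, []).append(r)

    def mean(values):
        return sum(values) / len(values)

    return (
        {k: mean(v) for k, v in intra.items()},
        {k: mean(v) for k, v in inter.items()},
    )


# Toy input (made-up numbers, for illustration only).
toy_libraries = [
    ("ILM1", [1.0, 2.0, 3.0, 4.0]),
    ("ILM1", [1.1, 2.0, 3.2, 3.9]),
    ("ILM2", [4.0, 3.0, 2.0, 1.0]),
]
toy_intra, toy_inter = site_correlations(toy_libraries)
```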


Supplementary Figure 5. Q-Q plot for gene expression inter-site repeatability for sample A (a) and B (b). For each sample, we compared the normalized gene expression between two test sites, among all 6 SEQC test sites.
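For readers unfamiliar with this kind of plot, a Q-Q comparison of two sites can be sketched as follows. This is a minimal illustration under the assumption that both sites report normalized expression for the same set of genes, so the two vectors have equal length and the empirical quantiles pair up after sorting. The function name `qq_points` is hypothetical and the source does not describe how its Q-Q plots were produced.

```python
def qq_points(site_x, site_y):
    """Pair the empirical quantiles of two equal-length samples.

    Sorting each vector independently and zipping the results gives
    the points of a two-sample Q-Q plot. Points close to the diagonal
    indicate similar expression distributions at the two sites.
    """
    if len(site_x) != len(site_y):
        raise ValueError("this sketch assumes equal-length vectors")
    return list(zip(sorted(site_x), sorted(site_y)))


# Toy input (made-up numbers, for illustration only).
points = qq_points([3.0, 1.0, 2.0], [10.0, 30.0, 20.0])
```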


Supplementary Figure 6. Q-Q plot for gene expression inter-site repeatability for sample C (a) and D (b). For each sample, we compared the normalized gene expression between two test sites, among all 6 SEQC test sites.


Supplementary Figure 7. Q-Q plots for DEG inter-site repeatability for sample A vs. B at ILM1 (a), A vs. C at ILM1 (b), A vs. A between ILM1 and ILM2 (c), and another A vs. A between ILM1 and ILM6 (d). For each sample, we compared the gene expression DEGs between two test sites, among all 6 SEQC test sites. Even the same sites show a drift in their p-value distribution, indicating false positives.


[Figure: MDS scatter plot. x-axis: Dimension 1 (ticks -2 to 2); y-axis: Dimension 2 (ticks -0.5 to 0.5). Points are drawn as the sample letters A, B, C and D; legend: ILM1, ILM2, ILM3, ILM4, ILM5, ILM6.]

Supplementary Figure 8. MDS plot of normalized gene expression. All replicates clustered by sample type (A-D). Site information is coded by color (see legend).


[Figure: three panel grids with x-axis Comparison (A-B, A-C, A-D, B-C, B-D, C-D), y-axis Percentage of intersecting DEGs (ticks 0.00 to 1.00), one panel per cutoff pair (FC 1.5 or 2; FDR 0.05, 0.01 or 0.001) and legend site.num 1 to 6; followed by two boxplot grids with x-axis Analysis (Without SVA, With SVA), y-axis Number of DEGs (ticks 10000, 20000, 30000), one column per cutoff pair and one row per comparison (comp: A-B, A-C, A-D, B-C, B-D, C-D). Panel labels: a, b, c, d.]

Supplementary Figure 9. Intra-site DEGs detected from 6 sites with and without SVA analysis, using 3 FDR cutoffs and 2 FC cutoffs. The FDR and FC values are labeled at the top of each panel.


[Figure: six panels (a-f). Schematic: SEQC samples A/B/C/D, each processed at ILM1 to ILM6 (Norm/DEGs). Plot with x-axis comparison (A-B, A-C, A-D, B-C, B-D, C-D) and y-axis Spearman correlation of inter-site common DEG adjusted p-values (ticks 0.92, 0.94, 0.96). Scatter plot with x-axis A vs. C (ILM4) adjusted p-value (-log10) and y-axis A vs. C (ILM6) adjusted p-value (-log10). Plot with x-axis sample (A-B to C-D) and y-axis Shared DEGs percentage between two sites (ticks 60%, 70%, 80%). Venn diagram for A vs. C, ILM4 and ILM6, with counts printed as "1159 12034424". Plot with x-axis comparison (A-B to C-D) and y-axis Matthews correlation coefficient (TaqMan validation) (ticks 0.82 to 0.90).]

Supplementary Figure 10. Reproducibility of RNA-seq data DEG analysis across 6 test sites. (a) Schematic of RNA-seq data normalization by sample, followed by DEG analysis, at each site independently. (b) Spearman correlation of adjusted p-values of the common DEGs among 6 test sites. (c) Scatter plot of –log10 adjusted p-values of common DEGs comparing sites ILM4 and ILM6, for A vs. C. (d) The percentage of DEGs shared between two sites. (e) Venn diagram of the number of DEGs shared between site ILM4 and site ILM6. (f) Matthews correlation coefficient (MCC) for RNA-seq DEG detection performance, as benchmarked by TaqMan. Thresholds for DEG calls: FDR: 0.05, FC: 2.
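The Matthews correlation coefficient in panel (f) has a standard definition in terms of the confusion matrix, $\mathrm{MCC} = (TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. The sketch below applies it to DEG calls, treating the TaqMan calls as the reference. It is an illustration only: how the authors derived reference calls from TaqMan data is not described here, and the function name, the handling of a zero denominator, and the toy gene sets are assumptions.

```python
import math


def mcc_against_reference(predicted_degs, reference_degs, assayed_genes):
    """Matthews correlation coefficient of DEG calls vs. a reference.

    predicted_degs: genes called differentially expressed by RNA-seq.
    reference_degs: genes called differentially expressed by the
        reference assay (TaqMan in the caption).
    assayed_genes: all genes measured by both methods.
    Returns 0.0 when the denominator is zero (a common convention,
    assumed here).
    """
    tp = fp = fn = tn = 0
    for gene in assayed_genes:
        predicted = gene in predicted_degs
        actual = gene in reference_degs
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy input (made-up gene sets, for illustration only).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
taqman_calls = {"g1", "g2", "g3"}
rnaseq_calls = {"g1", "g2", "g4"}
toy_mcc = mcc_against_reference(rnaseq_calls, taqman_calls, genes)
```

MCC uses all four cells of the confusion matrix, so it stays informative when DEGs and non-DEGs are unevenly represented among the assayed genes.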


Supplementary Figure 11. TaqMan genes highly expressed in the RNA-seq data. Most of the TaqMan genes are those expressed around the top 87% of genes, but some lowly expressed genes are also covered.


[Figure: scatter plots of RNA-seq (y-axis, ticks -10 to 10) against TaqMan (x-axis, ticks -15 to 5), with one column per site (AGR, BGI, CNL, COH, MAY, NVS) and one row per sample (A, B, C, D). Printed r2 labels, in extraction order: 0.744, 0.776, 0.708, 0.698, 0.734, 0.776, 0.703, 0.694, 0.732, 0.777, 0.7, 0.687, 0.74, 0.777, 0.708, 0.697, 0.74, 0.778, 0.703, 0.696, 0.743, 0.78, 0.709, 0.696. A second set of scatter panels with the same site columns carries the r2 labels 0.764, 0.798, 0.757, 0.756, 0.748, 0.789, 0.745, 0.743, 0.723, 0.783, 0.737, 0.729, 0.744, 0.78.]

●●●●●

●●●

●●

●●● ●●●

●●

● ●●

● ●

●●

●●●

● ●●●

● ●●

●●●

●●● ●

●●

●●

● ●

●●

●●

●●●

●●

●●●●

●●

● ●

●●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●● ●●●●

●●

●●●

●●

●●●●● ●●

● ●●

●●

●●

●●●

●●

●● ●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●●●

●●

●● ●

●●●●●

● ●●

●●

●●

●●●

●● ●

● r2 = 0.741

● ●

●●

●●

●●

●●●

●●●

● ●●

●●

●●

●●

●●

●●

●●●●●●●

●●●●●●

●●

●●

●●

●●

●●

●●

● ●●●

●●

●●●●

● ● ●●

● ● ●●

●●● ●●●●

●●

●●

●● ●

●●

●●●

●● ●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●● ●

●●●●●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

● ●

●●

●●

●● ●

●●

●●

●●

●●●

● ●●

●●

●●

●●●●

● ●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●● ●●

●●

●●●●●●

●●●

●●

●●●

●●

●●

●●●

●●●

●●

● ●●

●●

●●

● ●

●●●

●●●●●●

●●●●●

●●

●●●

●●●●

●●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

● ●

●● ●

●●●

●●●

●●●

● ●

●●

●●

●●●

●●●

●●

●●

●●

● ●●●

●● ●

●●

●●●

●●

●●●

●●

●●●

●●●●●

●●

●●●●

●●●●●●

●●

●●

●●

●● ●

●●

●●

●●● ●

●●●

●●

●● ●

●●

●●

●●●

●● ●●

● ●● ●

●●

●●●

●●

●●●●●

●●

●●●

●●●●●

●●●●

●●

●●

●●●

●●

●●●●●●

●● ●●

●●●

●●

● ●●

●●

●●

●●●

●● ●

● r2 = 0.739

●●

●●

●●

●●

● ●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●●●

●●

●●

●●

● ●

● ●●

●●

●● ●●●

●● ●

●●

●●● ●● ●

●●

●●

●●

●●

●●●●

● ●●

●●●

●●

●●●

●●

●●

● ●

●●

●●●

●●●●

●●●

●●●●

●●

●●

●●

● ●●

●●●

●●

●●●●

●● ●●

●●

●●●

●●

●●●

●●

●●

●●

● ●●

●●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●● ●

● ●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●●

●●●●

●●

●●

●●

●●

●●● ●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

● ●●

●●●

●●●

●●

●●

●●

●●

●● ●●

●●●●

●●●● ●

●●●

●●●●

●●

●●●

●●

●●

●●

●●

● ●●

●●

●●

●● ●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●●

● ●●

●●●

●●● ●

●●

● ●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

● ●●

●●

●●

●●●

●● ●

● r2 = 0.748

●●●

●●

●●●●

●●

● ●●●

●●

●●

●●

●●●

●●●●

●● ●●●

●●●●●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●● ●●

●● ●●

●●●●●

●●

●●

●●

●●

● ●●●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●●

●●

●● ●

● ●●

●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●

●●

●●●

●●

●●

●●

●●

●●●●

● ●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

● ●

●●

●●●●

●●

●●

●●●

●●

●●●

●●

● ●●

●●

●●

●● ●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

●● ●

●●

●●

●●●●

●●● ●

●●

●● ●

●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●

● ●

● ●

●●

●●●●

●●

●●

●●●

●●

●●●●

●●●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●● ●

●●●●●

●● ●

●●

●●

●●

●●● ●●

●●

●●

●●

●●●

●●

●●

●●

●●●●

●●●●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●●

● r2 = 0.788

●●

●●

● ●

● ●

●●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

● ●

●●

●●

● ●

●●●●

●●

●●●●

●●

● ●

● ●●

●●● ●●●

●●

●●

●● ●

●●

●●●

●● ●

●●

● ●

●●●

●●●

●●●●

●●

● ●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●

●●● ●●

●●

●●

●●●●

●● ●●

●●

●●

●●●

●●●

●● ●●

●●

●●

●●●●

●●

●●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●●

●●●●

●●●

●●

●● ●

●●●

●●

●●●

●●●

●● ●

●●

●●

●●

●●

●●● ●●●●

●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●●

●●●●●

●●

●●

●●

●●

●● ●●●

●●●

●●

●● ●

●●●

●●●●

●●

●●

●●●

●●

●●●

●●

●●

● ●●●

●●

●●

●●

● ●●

● ●

●●●●●

●●●

●●

●●● ●●●

●●

● ●

●●

●●●

● ●●●

●●●

●●●●●● ●

●●

●●

● ●

●●

●●

●●●

●●

●●●●

●●

● ●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●● ●●●●

●●

●●

●●

●●●●●

●●

● ●●

●●

●●

●●●

●●

●●● ●

●●

●●

●●●

●●●●

●●●

●●●

●●

●●●●●●

●●

●●

●●●●●

●●●

●●

●●

●●●

●● ●

● r2 = 0.75

●●

●●●

●●

● ●

●●

●●

●●●

● ●●●

●●

●●

●●

●●●●●●●●●

●●●●●●

●●

●●

●●

●●

●●●

● ●●●

●●

●●●●

● ● ●●

● ● ●●

●●● ●●

●●

●●

●●

●● ●

●●

●●●

●● ●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●

●●●

●● ●

●●●●●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

●●

●● ●

●●

●●

●●

●●●

● ●●

●●

●●

●●●●

● ●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●●●●●

●●●

●●

●●●

●●●

●●

●●●

●●●

●●

● ●●

●●

●●

● ●

●●●●●●●●●

●●●●●

● ●

●●

●●●

●●●●

●●●

● ●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●●●

●●

● ●

●● ●

●●

●●●

●●●

● ●

●●

●●

●●●

●●●

●●

● ●●●

●● ●

●●

●●●

●●

●●●

●●

●●●●

●●●●●

●●

●●●●

●●●●●●

●●

●●

●● ●

●●

●●

●●● ●

●●●●●

●● ●

●●

●●

●●

●● ●●

● ●● ●

●●

●●●

●●

●●●●●

●●

●●●

●●●●●

●●●●

●●

● ●

●●●

●●

●●●●●●

●●● ●●●

●●

●●

● ●●

●●

●●●

●●●

●● ●

● r2 = 0.746

●●

●●

●●●●

● ●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

●●

● ●

● ●●

●●

●● ●●●

●●

●● ●

●●

●●● ●● ●

●●

●●

●●

●●●●

● ●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●●●●●●

●●●

●●

●●

● ●●

●●●

●●●●

●● ●●

●●

●●●

●●

●●●

●●

●●

● ●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

● ●●

●●

●●●

●●

●●●

●●●

●●

●●●

●●●●

●●

●●

●●

●●

● ●

● ●

●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●●

●●●

●●

●●

●●

●●

●● ●●

●●●

●●

●● ●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●● ●●

● ●

●●●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●●●

● ●●

●●●

●●● ●

●●

● ●

● ●

●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●● ●●

●●●●●

● ●●

●●

●●

●●●

●● ●

● r2 = 0.753

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●●

●●●●

●● ●●●

●●●●

●●

●●●

●●

●●●

●●●

●●●

●●

●●

●●

●● ●●

●● ●●

● ●●●●

●●

●●

●●●

● ●●●●

●●

●●

●●

●●

●●

● ●●

●●

●● ●

●●

●●

●●

●●●

●●

●●●

●● ●●

●●

●● ●

● ●●

●●●

●●

● ●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●

●●

●●●

●●●

●●

●●

●●

●●●●

● ●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●●

●●

●●

●●●

●●

●●●

●●

● ●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●●●

●●● ●

●●

●● ●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●●●

●●

●●●

●●

●●●●

●●●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●● ●

●●●●●

●● ●

●●

●●

●●

● ●●

●●

● ●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●●●●

●●

●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●●

● r2 = 0.798

● ●

●●

● ●

● ●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

● ●

●●

●●

●●

●●●●

●●

●●●●

●●

● ●

●●

●●●

●●● ●●●

●●

●●

●● ●

●●

●●●

●● ●

●●

● ●

●●●

●●●●

●●

● ●●

●●

●●

● ●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●● ●●

●●●

●●

●●

●● ●●

●●

●●

●●●

●●●

●● ●●

●●

●●

●●●●

●●

●●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●●

●●●●

●●●

●●

●● ●

●●●

●●

●●●

●●●

●● ●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●● ●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●●●

●●●●●

●●

●●

●●

●●

●● ●●●

●●

●●

●●

●● ●

●●●

●●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●●●●

●●●

●●

●●● ●●●

●●

●●

● ●

●●

●●

●●●

● ●●●

●●●

●●●

●●● ●

●●

●●

● ●

●●

●●

●●●

●●●●

● ●

●●●

●●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●● ●●●●

●●

●●●

●●

●●

●●●●●

●●

● ●●

●●

● ●

●●●

●●

●● ●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●●●●

●● ●

●●●

●●

●●●

●●

●●

●●●

●● ●

● r2 = 0.752

● ●

●●●

●●

● ●

●●

●●

●●●

● ●●

●●

●●

●●

●●

●●●●●●●

●●●●●●

●●

●●

●●

●●

●●

●●

● ●●●

●●

●●●●

● ● ●●

● ● ●●

●●● ●●

●●

●●

●●

●● ●

●●

●●●

●● ●

●●

●●

●●

● ●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●●

●●●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●● ●●

●●

●●

●●●

● ●●

●●

●●

●●

●●●●

● ●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●● ●●

●●

●●●●●●

●●●

●●

●●●

●●

●●●

●●●

●●

●●●

●●

●●

● ●

●●●

●●●●●●

●●●●

●●

●●

●●●

●●●●

●●●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●●●

●●

● ●

●●

●● ●

●●●

●●●

●●●

● ●

●●

●●

●●●

●●●

●●

●●

● ●●●

●● ●

●●●●

●●

●●●

●●

●●●●

●●●●●

●●

●●●●

●●●●●●

●●

●●

●●

●●

●●

●●● ●

●●●●●

●● ●

●●

●●

●●

●● ●●

● ●● ●

●●

●●●

●●

●●●●●

●●

●●●

●●●●●

●●●●

●●

● ●

●●●

●●

●●●●●●

●● ●●

●●●

●●

●●

●●

●●

●●●

●●●

● r2 = 0.751

051015

051015

051015

051015

AB

CD

−15−10 −5 0 5 −15−10 −5 0 5 −15−10 −5 0 5 −15−10 −5 0 5 −15−10 −5 0 5 −15−10 −5 0 5TaqMan

RNA−seq

FC: 1.5FDR: 0.001

FC: 1.5FDR: 0.01

FC: 1.5FDR: 0.05

FC: 2FDR: 0.001

FC: 2FDR: 0.01

FC: 2FDR: 0.05

0

100

200

300

A−B

A−C

A−D

B−C

B−D

C−D A−

BA−

CA−

DB−

CB−

DC−D A−

BA−

CA−

DB−

CB−

DC−D A−

BA−

CA−

DB−

CB−

DC−D A−

BA−

CA−

DB−

CB−

DC−D A−

BA−

CA−

DB−

CB−

DC−D

Comparison

Num

ber o

f DEG

s

up down

a

b

c

ILM1 ILM2 ILM3 ILM4 ILM5 ILM6

ILM1 ILM2 ILM3 ILM4 ILM5 ILM6

Supplementary Figure 12. Correlation of RNA-seq normalized gene expression with TaqMan assays, and TaqMan DEG counts. Each column of the grid corresponds to one of the six sites, and each row to one of the four samples. In each scatter plot, the x-axis is −1 times the normalized TaqMan Ct value and the y-axis is the log2 of the RNA-seq normalized gene expression. The solid blue lines represent a linear regression fit, and the r2 value of each fit is printed at the top left of the plot. (a) RNA-seq annotation using the AceView gene model. (b) RNA-seq annotation using the TaqMan primer sequences. (c) DEG counts for the TaqMan genes. Three FDR cutoffs and two FC cutoffs were applied in the limma DEG detection pipeline, as above.
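
As a minimal sketch of how the r2 annotation on each scatter plot could be computed, the Python snippet below fits an ordinary least-squares line and reports its r2. The function name `linear_fit_r2` and the input values are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a one-predictor linear fit, r2 equals the squared Pearson correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r2


# Hypothetical values: -1 * normalized TaqMan Ct (x) and log2 RNA-seq expression (y).
neg_ct = [-12.0, -9.5, -7.0, -4.0, -1.5]
log2_expr = [1.0, 3.2, 6.1, 8.0, 11.5]
intercept, slope, r2 = linear_fit_r2(neg_ct, log2_expr)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f}")
```

An r2 close to 1 means that the RNA-seq estimates track the TaqMan measurements almost linearly for that site and sample.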

Nature Biotechnology: doi:10.1038/nbt.3000


[Supplementary Figure 13 image. Three panels (a, b, c) plotting MCC (axis range 0.4 to 0.9), TPR (0.75 to 0.95), and FPR (0.0 to 0.3) against site, faceted by comparison (A-B, A-C, A-D, B-C, B-D, C-D) and by cutoff combination (FC: 1.5 or 2; FDR: 0.001, 0.01, or 0.05). Legend: analysis without SVA versus with SVA.]

Supplementary Figure 13. Evaluation of the performance of intra-site DEGs from Illumina data using TaqMan data. (a) MCC: Matthews correlation coefficient. (b) TPR: true positive rate. (c) FPR: false positive rate. The vertical facets correspond to the six combinations of FC and FDR cutoffs; the horizontal facets correspond to the pairwise comparisons among the four samples.
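
The three metrics named in this caption follow from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The Python sketch below shows the standard formulas, assuming the TaqMan calls serve as the reference; the function names and example counts are hypothetical, not taken from the study.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def tpr(tp, fn):
    """True positive rate (sensitivity): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def fpr(fp, tn):
    """False positive rate: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else float("nan")


# Hypothetical counts for one site, one comparison, and one cutoff combination.
tp, fp, tn, fn = 40, 10, 45, 5
print(f"MCC={mcc(tp, fp, tn, fn):.3f} TPR={tpr(tp, fn):.3f} FPR={fpr(fp, tn):.3f}")
```

MCC uses all four cells of the confusion table, so it stays informative when DEGs and non-DEGs are unevenly represented among the TaqMan genes.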


[Supplementary Figure 14 image. The x-axis lists the six sites, ILM1 to ILM6.]

Supplementary Figure 14. Illustration of measures of sample identification. Each symbol represents the average value of a particular metric over all genes at one site. Magenta stars show a non-parametric estimate of the probability that the measurements meet the order constraint implied by the titration experiment; red stars show the mutual information between the measurements and samples A and B; green stars show the mutual information between the measurements and samples C and D; and cyan stars show the mutual information between the measurements and the sample titration.


[Supplementary Figure 15 image. Grid of panels, one per comparison (A-B, A-C, A-D, B-C, B-D, C-D) and site (ILM1 to ILM6). x-axis: type (original, EDASeq, cqn, RUV2, sva, PEER). y-axis: Spearman correlation of adjusted p-values between inter-site and intra-site DEG analysis (axis range 0.4 to 1.0).]


Supplementary Figure 15. Spearman correlation of the adjusted p-value between inter-site DEGs and intra-site DEGs.
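
For readers unfamiliar with the statistic, the Python sketch below computes a Spearman correlation as the Pearson correlation of ranks, with tied values given their average rank. The function names and the p-value vectors are hypothetical examples, not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ranks must vary in both inputs")
    return cov / (vx * vy) ** 0.5


# Hypothetical adjusted p-values for the same genes in two analyses.
inter_site = [0.001, 0.02, 0.3, 0.04]
intra_site = [0.002, 0.03, 0.5, 0.2]
print(spearman(inter_site, intra_site))
```

Because only the ordering of the p-values matters, the statistic is unaffected by monotone transformations such as taking −log10 of both vectors.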

[Supplementary Figure 16 image. Two panels (a, b), each with x-axis "method" (original, EDASeq, cqn, RUV2, sva, PEER). Facet labels as extracted: ILM1 to ILM6, then A-B, A-C, A-D, B-C, B-D, C-D. y-axis labels as extracted: "Inter-site DEGs MCC validated by intra-site DEGs" (axis range 0.5 to 0.8) and "Matthews correlation coefficient: intra-site DEGs validated by TaqMan" (axis range 0.80 to 0.95).]

Supplementary Figure 16. Inter-site/intra-site DEG validation. (a) Inter-site DEG validation by TaqMan, evaluated by MCC for all six comparisons (A-B, A-C, A-D, B-C, B-D, C-D). (b) Evaluation of intra-site DEG detection using TaqMan data, by MCC.


[Supplementary Figure 17 image. Panels a to i: per-cycle base composition plots labelled AGR-A-1, BGI-A-1, CNL-A-1, COH-A-1, MAY-A-1, NVS-A-1, BGI-A-5, CNL-A-5, and MAY-A-5 (also labelled ILM1.A.1 to ILM6.A.1 and ILM2.A.5, ILM3.A.5, ILM5.A.5). Panel j: nucleotide frequency (%) versus position in read (0 to 100), one facet per base (A, G, C, T), colored by site (ILM1 to ILM6) for samples 1 and 5.]

Supplementary Figure 17. Site-specific base content examination of an independent control library for assessing site variance. (a-f) We plotted the percentage of each base (y-axis) as a function of the cycle (x-axis) for each of the six sites. Each site (a, b, c, d, e, f) had a slightly different base composition plot. (g-i) We plotted the percentage of each base (y-axis) as a function of the cycle (x-axis) for the three sites that sequenced the vendor-prepared control library (#5). (j) Nucleotide frequency versus position for aligned reads. The percentage of each base (A, G, C, T) was plotted as a function of position in the read for two replicates (1, 5) across sites. Replicate 1 was prepared and sequenced independently at each site, whereas replicate 5 was prepared at a single site and then sequenced at a subset of the sites. Replicates 1-4 displayed site-dependent base composition frequencies, whereas replicate 5 showed similar base composition frequencies regardless of where it was sequenced, suggesting that base composition frequency is largely a result of library preparation.


[Supplementary Figure 18 image. Three heatmaps (panels a, b, c) titled "GC distribution" (GC bins from (0,2] to (98,100]; color scale 0 to 8), "Percent error vs. read length" (read positions 1 to 100; color scale 2 to 12), and "Coverage across genebody (%)" (5' to 3'; color scale 0.2 to 1.2). Each heatmap has one entry per library, named site.sample.replicate: sites ILM1 to ILM6, samples A to D, replicates 1 to 4, plus replicate 5 for ILM2, ILM3, and ILM5.]

Supplementary Figure 18. Intra- and inter-site variations of three additional quality metrics for the Illumina dataset. (a) Percentage of GC content over all reads. GC content (%) was binned in 2% increments and plotted on the y-axis. (b) Percent error rate (y-axis) per base for all reads. The error rate was calculated relative to the reference sequence from the STAR aligner results. (c) Coverage uniformity across gene bodies. All transcript lengths were scaled and reads were assigned to 100 bins along the sequence. The coverage (percentage) along each gene body from 5' to 3' is shown by color along the y-axis. Plots were made with pheatmap in R.
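
The GC binning described in panel (a) can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' pipeline: the function names are hypothetical, the bins are taken to be left-open and right-closed as in the extracted axis labels, and reads with 0% GC are assigned to the first bin as an assumption.

```python
def gc_bin_index(read, width=2):
    """Index of the (lower, upper] GC% bin for one read; None if no called bases."""
    read = read.upper()
    called = sum(read.count(base) for base in "ACGT")
    if called == 0:
        return None
    gc = read.count("G") + read.count("C")
    # Integer ceiling of 100 * gc / (width * called), avoiding float rounding.
    upper = -(-(100 * gc) // (width * called))
    # Assumption: reads with 0% GC fall into the first bin, (0, width].
    return max(upper - 1, 0)


def gc_histogram(reads, width=2):
    """Percentage of reads falling into each GC% bin of the given width."""
    if 100 % width != 0:
        raise ValueError("width must divide 100")
    counts = [0] * (100 // width)
    for read in reads:
        idx = gc_bin_index(read, width)
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Hypothetical reads; the all-N read is skipped.
reads = ["ACGT", "ACGT", "GGGG", "AAAA", "NNNN"]
hist = gc_histogram(reads)
print(hist[24], hist[49], hist[0])  # bins (48,50], (98,100], (0,2]
```

One such histogram per library gives one row or column of the heatmap, so a shift of the GC peak between sites shows up as a displaced band.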


[Supplementary Figure 19 image. y-axis: duplication rate per library (%), axis range 40 to 60. x-axis: site (ILM1 to ILM6). Legend: samples A, B, C, D.]

Supplementary Figure 19. Duplication rate per library. On average, the duplication rate is 45% per library. This quality metric is significantly associated with the latent variables from PEER and SVA.


[Supplementary Figure 20 image. Panel a: heatmap titled "GC distribution" for the PGM libraries of samples A and B (PGM1 replicates 1 to 4, PGM2 replicates 1 and 2, PGM3 replicates 1 and 2), with GC bins from (0,2] to (98,100] and a color scale of 0 to 12. Panel b: latent variables V1, V2, and V3 (axis range −0.6 to 0.6) by site (PGM1, PGM2, PGM3) for samples A and B, with replicates 1 to 4 marked.]

Supplementary Figure 20. GC content quality metric and latent variables from PGM data. Percentage of GC content over all reads in the Proton dataset. GC (%) was binned into 2% increments and plotted on the y-axis. GC content varies among sites for the same library (A, B).


[Supplementary Figure 21 image. Panels labelled a to g. Recoverable plot titles and axes: MDS plot (Dimension 1 versus Dimension 2) of samples A and B at sites PGM1, PGM2, and PGM3; maximum GC read (%) by site (axis range 4.5 to 6.0); average base error rate (%) by site (5 to 11); coefficient of variation of gene body coverage (%) by site (26 to 30); false positive DEGs for the site comparisons PGM1-PGM2, PGM1-PGM3, and PGM2-PGM3 in samples A and B (0 to 200); number of DEGs per site (0 to 12500); and MCC per site (0.0 to 0.4). The last three are shown without SVA versus with SVA.]

Supplementary Figure 21. Examination of quality metrics and DEG detection for PGM data. (a) Maximum percentage of reads with the corresponding GC content (0% to 100%) for each sequencing replicate (1 to 4). Replicates are indicated by circle, triangle, square, and plus sign; sites are shown in different colors. (b) Average error rate across all sequenced bases. (c) Coefficient of variation of the percentage of gene body coverage, a measure of the evenness of coverage across the gene body. (d) MDS plot of the three PGM sites for samples A and B. Samples A and B are clearly distinguished by dimension 1. For sample A, replicate 4 from the PGM1 site and replicate 1 from the PGM2 site were separated from the other A samples along dimension 2.
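
The coefficient of variation in panel (c) summarizes how evenly reads cover the gene body. The Python sketch below assumes the common definition (standard deviation divided by mean, expressed as a percentage) and uses the sample standard deviation; both choices, the function name `coverage_cv`, and the example values are assumptions made for illustration.

```python
from statistics import mean, stdev


def coverage_cv(bin_coverage):
    """Coefficient of variation (%) of coverage across gene body bins."""
    if len(bin_coverage) < 2:
        raise ValueError("need at least two bins")
    avg = mean(bin_coverage)
    if avg == 0:
        raise ValueError("mean coverage is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return 100.0 * stdev(bin_coverage) / avg


# Hypothetical coverage percentages for a few gene body bins, 5' to 3'.
even = [1.0, 1.0, 1.0, 1.0]
skewed = [0.4, 0.8, 1.2, 1.6]
print(coverage_cv(even), round(coverage_cv(skewed), 1))
```

A value of 0 corresponds to perfectly uniform coverage, and larger values indicate stronger 5' or 3' bias.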


[Supplementary Figure 22 image. Panels a to c: number of DEGs for the 15 pairwise comparisons among the PGM1, PGM2, and PGM3 libraries of samples A and B, titled "original" (axis range 0 to 8000), "EDASeq" (0 to 6000), and "PEER" (0 to 4000). Panel d: Spearman correlation coefficients of intra-site DEG adjusted p-values by type (original, EDASeq, PEER; 0.5 to 0.8). Panel e: MCC of inter-site DEGs validated by intra-site DEGs, by site (0.60 to 0.85). Panel f: MCC of intra-site DEGs validated by TaqMan, by site (0.0 to 0.6). Legend for panels e and f: original, EDASeq, PEER.]

Supplementary Figure 22. PGM inter-site and intra-site DEG analysis. (a-c) PGM inter-site and intra-site DEGs using the original method (a), EDASeq (b), and PEER (c). (d-e) Evaluation of inter-site DEG detection using intra-site DEGs. (d) Spearman correlation of the adjusted p-values between inter-site DEGs and intra-site DEGs. (e) Inter-site DEGs validated by intra-site DEGs, measured by the Matthews correlation coefficient (MCC). (f) Evaluation of intra-site DEG detection using TaqMan data, by MCC.
