charlotte soneson university of zurich brixen 2016 · •ewels et al.: multiqc: summarize analysis...
Experimental design
Charlotte Soneson, University of Zurich
Brixen 2016
“To call in the statistician after the experiment is done
may be no more than asking him to perform a postmortem examination:
he may be able to say what the experiment died of.”
Sir Ronald Fisher, Indian Statistical Congress, Sankhya, around 1938
What is experimental design?
• The organization of an experiment, to ensure that the right type of data, and enough of it, is available to answer the questions of interest as clearly and efficiently as possible.
http://www.stats.gla.ac.uk/steps/glossary/anova.html#expdes
Bad experimental design - examples
Treatment I: M M M M M M M M
Treatment II: F F F F F F F F
Confounding! All treatment I samples are M and all treatment II samples are F, so the treatment effect cannot be separated from the M/F difference.
Bad experimental design - examples
Analysis batch I / Study center I / Processing protocol I ...: Tr Tr Tr Tr Tr Tr Tr Tr
Analysis batch II / Study center II / Processing protocol II ...: Ctl Ctl Ctl Ctl Ctl Ctl Ctl Ctl
Confounding!
What can happen with bad experimental design?
• Example: gene expression study comparing 60 CEU and 82 ASN HapMap individuals
• 26% of the genes were found to be significantly differentially expressed (78% with less restrictive multiple testing correction)
• But: all CEU samples were processed (sometimes years) before all the ASN samples!
Confounding!
Akey et al., Nature Genetics 2007; Spielman et al., Nature Genetics 2007
What can happen with bad experimental design?
• Comparing CEU and ASN: 78% differentially expressed
• Comparing processing times: 96% differentially expressed
Akey et al., Nature Genetics 2007
p-values from test comparing CEU and ASN, after controlling for the processing year
“Batch effect correction” won’t work here
0% differentially expressed
Akey et al., Nature Genetics 2007
What could be a good experimental design?
• Process all samples at the same time (not always feasible)
• Minimize confounding as much as possible through
  • blocking
  • randomization
• The batch effect will still be there, but with an appropriate design we can account for it!
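A minimal Python sketch of blocking and randomization, not taken from the slides: samples are shuffled within each condition and then dealt evenly across processing batches, so that no batch contains only one condition. The function name `assign_batches`, the sample ids, and the choice of two batches are illustrative assumptions.

```python
import random
from collections import Counter

def assign_batches(groups, n_batches, seed=0):
    """Randomly assign samples to processing batches, stratified by condition.

    groups: dict mapping sample id -> condition label (hypothetical input format).
    Returns a dict mapping sample id -> batch index (0-based).
    """
    rng = random.Random(seed)
    by_condition = {}
    for sample, label in groups.items():
        by_condition.setdefault(label, []).append(sample)
    assignment = {}
    for label in sorted(by_condition):
        members = sorted(by_condition[label])
        rng.shuffle(members)                      # randomization within each condition
        for i, sample in enumerate(members):
            assignment[sample] = i % n_batches    # blocking: deal evenly across batches
    return assignment

# Hypothetical example: 8 treated and 8 untreated samples, 2 batches.
groups = {f"T{i}": "treated" for i in range(8)}
groups.update({f"C{i}": "untreated" for i in range(8)})
batches = assign_batches(groups, n_batches=2)

table = Counter((batches[s], groups[s]) for s in groups)
for (batch, label), n in sorted(table.items()):
    print(f"batch {batch}: {n} {label}")
```

Each batch receives four treated and four untreated samples, so treatment and batch are not confounded and a batch effect can later be accounted for.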
Non-confounded design
[Figure: gene expression of Treated and Untreated samples, shown first per batch (Batch A, Batch B) and then by treatment group alone]
Accounting for batch effects
• In statistical modeling, batch effects can be included as covariates (additional predictors) in the model.
• For exploratory analysis, we often attempt to “eliminate” or “adjust for” such unwanted variation in advance, by subtracting the estimated effect from each variable.
• Even partial confounding between batch and signal of interest can lead to bias.
Nygaard et al., Biostatistics 2016
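A small Python sketch of the first bullet, under stated assumptions: one simulated gene in a balanced design, with batch entered as a covariate in a linear model. The effect sizes, the noise level, the sample layout and the hand-written helper `ols` are all illustrative assumptions; a real analysis would rely on an established modeling package. The last lines show why a fully confounded design cannot be rescued this way.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient (confounded design)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

rng = random.Random(1)
treated = [1, 1, 1, 1, 0, 0, 0, 0] * 2     # 4 treated + 4 untreated per batch
batch_b = [0] * 8 + [1] * 8                # first 8 samples in batch A, last 8 in batch B
# Assumed truth: baseline 5, treatment effect 1, batch effect 3, noise sd 0.5.
y = [5.0 + 1.0 * t + 3.0 * b + rng.gauss(0, 0.5) for t, b in zip(treated, batch_b)]

X = [[1.0, t, b] for t, b in zip(treated, batch_b)]   # intercept, treatment, batch
intercept, treatment_effect, batch_effect = ols(X, y)
print(f"treatment effect: {treatment_effect:.2f}, batch effect: {batch_effect:.2f}")

# Fully confounded design: the batch column is identical to the treatment column.
confounded = [[1.0, t, t] for t in treated]
try:
    ols(confounded, y)
except ValueError as err:
    print("confounded design:", err)
```

With a balanced design both effects can be estimated separately; with complete confounding the two columns are indistinguishable and no model can separate them.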
Accounting for batch effects in practice
• Public, processed RNA-seq data from 3 tissues and 4 studies show a strong association with study.
[Figure: color = tissue; symbol = study (batch)]
Danielsson et al., Briefings in Bioinformatics 2015
Accounting for batch effects in practice
• Accounting for the batch effect brings out the signal of interest.
[Figure: color = tissue; symbol = study (batch)]
Danielsson et al., Briefings in Bioinformatics 2015
Accounting for batch effects in practice
• 5-subtype breast cancer microarray data processed in six batches.
Larsen et al., BioMed Research International 2014
What if the batch variable is unknown?
• Manifests as systematic “unwanted variation” in data
• Identify using e.g.
  • control genes (“housekeeping” genes, spike-ins)
  • residuals after eliminating known signal
• Include estimated unwanted variation as covariate(s) in the statistical model
• RUV, sva packages commonly used in genomics
Leek & Storey (2007); Gagnon-Bartsch & Speed (2012); Leek (2014); Risso et al. (2014)
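A simplified Python sketch of the control-gene idea, not the algorithm of the RUV or sva packages: if control genes are unaffected by the condition of interest, the dominant pattern across samples in their centered expression values is an estimate of the unwanted variation. The simulated data, the hidden batch labels and the helper names `leading_factor` and `correlation` are assumptions made for illustration.

```python
import math
import random

def leading_factor(control, n_iter=200, seed=0):
    """Dominant pattern across samples in a genes x samples matrix of control genes.

    Uses power iteration on the row-centered matrix; returns one value per sample.
    """
    centered = []
    for row in control:
        m = sum(row) / len(row)
        centered.append([v - m for v in row])
    n_samples = len(centered[0])
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n_samples)]   # random start vector
    for _ in range(n_iter):
        # Project genes onto the current pattern, then update the pattern.
        scores = [sum(c * wj for c, wj in zip(row, w)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Simulated control genes: 50 genes, 8 samples, an unrecorded batch label per sample.
rng = random.Random(42)
hidden_batch = [0, 1, 0, 1, 0, 1, 0, 1]
control = []
for _ in range(50):
    loading = rng.uniform(1, 2)                       # gene-specific batch sensitivity
    control.append([10.0 + loading * b + rng.gauss(0, 0.3) for b in hidden_batch])

w = leading_factor(control)
print("estimated factor:", [round(v, 2) for v in w])
print("abs. correlation with hidden batch:", round(abs(correlation(w, hidden_batch)), 3))
```

The estimated factor `w` could then be added as a covariate in the statistical model, as described in the third bullet.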
Impact of batch effect on p-value histogram
• No batch effect, no differentially expressed genes
• No batch effect, some differentially expressed genes
• Batch effect (no confounding), some differentially expressed genes
• Batch effect (no confounding), some differentially expressed genes - after correction
[Figures: one histogram per scenario; x-axis: p-value (0.0 to 1.0); y-axis: frequency]
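The four scenarios can be imitated with a small simulation. This is a rough Python sketch under stated assumptions, not the code behind the slides: p-values come from a normal approximation to the two-sample test, the design is balanced (each group has the same number of samples in each batch), and "correction" simply subtracts the per-batch mean of each gene. All parameter values and function names are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def p_value(group1, group2):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    z = (mean(group1) - mean(group2)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulate(batch_sd, n_de, correct=False, n_genes=1000, n_per_batch=10, seed=0):
    """Simulate one p-value per gene in a balanced two-group, two-batch design."""
    rng = random.Random(seed)
    n = n_per_batch
    pvals = []
    for gene in range(n_genes):
        effect = 1.5 if gene < n_de else 0.0
        shift = rng.gauss(0, batch_sd)      # gene-specific offset of batch B
        # In each group, the first n samples are batch A and the last n are batch B.
        g1 = [rng.gauss(0, 1) + (shift if i >= n else 0.0) for i in range(2 * n)]
        g2 = [rng.gauss(effect, 1) + (shift if i >= n else 0.0) for i in range(2 * n)]
        if correct:
            for lo in (0, n):
                m = mean(g1[lo:lo + n] + g2[lo:lo + n])   # batch mean over both groups
                g1[lo:lo + n] = [v - m for v in g1[lo:lo + n]]
                g2[lo:lo + n] = [v - m for v in g2[lo:lo + n]]
        pvals.append(p_value(g1, g2))
    return pvals

def histogram(pvals, bins=10):
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

scenarios = {
    "no batch effect, no DE genes": simulate(0.0, 0),
    "no batch effect, some DE genes": simulate(0.0, 100),
    "batch effect, some DE genes": simulate(3.0, 100),
    "batch effect, some DE genes, corrected": simulate(3.0, 100, correct=True),
}
for title, pvals in scenarios.items():
    print(f"{title}: {histogram(pvals)}")
```

In this setup, ignoring the batch inflates the variance estimate, so the uncorrected histogram should show fewer small p-values and more mass toward 1, while removing the per-batch means should restore the peak near 0.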
Batch effect adjustment vs normalization
• Batch effect adjustment goes beyond the “global” between-sample normalization methods.
Leek et al., Nature Reviews Genetics 2010
Other design issues: replication
• Replicates are necessary to estimate within-condition variability.
• Variability estimates are, in turn, vital for statistical testing.
[Figure: gene expression (y-axis) for Group 1 and Group 2]
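A short Python illustration of why replicates matter, using made-up expression values: the same difference in group means gives a very different test statistic depending on the within-group variability, and without replicates the variability cannot be estimated at all. The function name `t_statistic` and all numbers are illustrative assumptions.

```python
import math
from statistics import StatisticsError, mean, variance

def t_statistic(group1, group2):
    """Welch-type t-statistic; needs at least two replicates per group."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

# Same mean difference (2 units), different within-group variability.
t_low_var = t_statistic([5.0, 5.1, 4.9], [7.0, 7.1, 6.9])
t_high_var = t_statistic([3.0, 5.0, 7.0], [5.0, 7.0, 9.0])
print(f"low variability:  t = {t_low_var:.2f}")
print(f"high variability: t = {t_high_var:.2f}")

# One sample per group: the variance, and hence the test, is undefined.
try:
    t_statistic([5.0], [7.0])
except StatisticsError as err:
    print("no replicates:", err)
```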
Other design issues: sample size
• As always, it depends…
  • on what we want to do (differential gene expression, variant detection, GWAS, …)
  • on the variability between samples (cell lines, inbred animals, patients, …)
  • on the magnitude of the expected effect
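How variability and effect size drive the required sample size can be sketched with the textbook normal-approximation formula for comparing two group means, n = 2 (z_{1-α/2} + z_{power})² σ² / δ² per group. This Python snippet is a rough guide for a single test only; it ignores multiple testing and the count nature of sequencing data, and the function name and the example values are assumptions.

```python
import math
from statistics import NormalDist

def samples_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Larger variability or smaller expected effects require more replicates.
for sd in (0.5, 1.0):
    for effect in (0.5, 1.0, 2.0):
        n = samples_per_group(effect, sd)
        print(f"sd = {sd}, effect = {effect}: about {n} samples per group")
```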
[Figures: false discovery rate (FDR) with 2, 5 and 10 replicates/condition]
[Figures: similarity between sets of differentially expressed genes (DEGs) with 2, 5 and 10 replicates/condition]
Schurch et al., RNA 2016
And now for something completely different…
http://www.publicdomainpictures.net/view-image.php?image=56137&picture=-
No matter how carefully you design your experiment, data can still be compromised…
• Contamination
• Sequencing failures
• Remaining adapters
• PCR duplicates
• …
QC software for NGS data - example
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FastQC - raw read QC
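To give a flavor of what raw read QC involves, here is a tiny Python sketch that computes the mean base quality at each read position from FASTQ text. It is not FastQC's code; it assumes 4-line FASTQ records with Phred+33 quality encoding, and the function name and the two example reads are made up.

```python
def mean_quality_per_position(fastq_text):
    """Mean Phred quality at each read position (4-line records, Phred+33 assumed)."""
    lines = fastq_text.strip().splitlines()
    totals, counts = [], []
    for qual in lines[3::4]:                 # every 4th line holds the quality string
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - 33        # Phred+33: quality = ASCII code - 33
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Two hypothetical reads; quality drops toward the end of the second read.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGA\n+\nII5#\n"
print(mean_quality_per_position(example))
```

A drop in mean quality toward the 3' end of the reads is one of the patterns such per-position summaries are meant to reveal.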
QC software for NGS data - example
http://multiqc.info/docs/#
MultiQC - summarize results from many analyses
References
• Ewels et al.: MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics (2016)
• Akey et al.: On the design and analysis of gene expression studies in human populations. Nature Genetics 39(7):807-808 (2007)
• Nygaard et al.: Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses. Biostatistics 17(1):29-39 (2016)
• Danielsson et al.: Assessing the consistency of public human tissue RNA-seq data sets. Briefings in Bioinformatics 16(6):941-949 (2015)
• Larsen et al.: Microarray-based RNA profiling of breast cancer: batch effect removal improves cross-platform consistency. BioMed Research International vol. 2014, article ID 651751 (2014)
• Leek et al.: Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics 11(10):733-739 (2010)
• Leek & Storey: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics 3(9):e161 (2007)
• Leek: svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Research 42(21):e161 (2014)
• Risso et al.: Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology 32(9):896-902 (2014)
• Gagnon-Bartsch & Speed: Using control genes to correct for unwanted variation in microarray data. Biostatistics 13(3):539-552 (2012)
• Schurch et al.: How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA (2016)