rna-seq:filtering,quality control and...
TRANSCRIPT
![Page 1: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/1.jpg)
RNA-seq: filtering, quality control and visualisation
COMBINE RNA-seq Workshop
![Page 2: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/2.jpg)
QC and visualisation (part 1)
![Page 3: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/3.jpg)
RNA-seq of Mouse mammary gland
Basal cells
Luminal cells
Virgin
Pregnant
Lactating
Virgin
Pregnant
Lactating
n=2
n=2
n=2
n=2
n=2
n=2
Fu et al. (2015) ‘EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival’ Nat Cell Biol
Slide taken from COMBINE RNAseq workshop on 23/09/2016
![Page 4: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/4.jpg)
(some) questions we can ask• Which genes are differentially expressed
between basal and luminal cells?• … between basal and luminal in virgin mice?• … between pregnant and lactating mice?• … between pregnant and lactating mice in
basal cells?
Slide taken from COMBINE RNAseq workshop on 23/09/2016
![Page 5: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/5.jpg)
• Reading in the data – counts data and sample information
• Formatting the data – clean it up so we can look at it easily
![Page 6: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/6.jpg)
Filtering out lowly expressed genes
• Genes with very low counts in all samples provide little evidence for differential expression
• Often samples have many genes with zero or very low counts
−10 −5 0 5 10 15
0.00
0.05
0.10
0.15
0.20
Density
A. Raw data
Log−cpm
10_6_5_119_6_5_11purep53JMS8−2JMS8−3JMS8−4JMS8−5JMS9−P7cJMS9−P8c
−5 0 5 10 15
0.00
0.05
0.10
0.15
0.20
Density
B. Filtered data
Log−cpm
10_6_5_119_6_5_11purep53JMS8−2JMS8−3JMS8−4JMS8−5JMS9−P7cJMS9−P8c
![Page 7: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/7.jpg)
Filtering out lowly expressed genes
• Testing for differential expression for many genes simultaneously adds to the multiple testing burden, reducing the power to detect DE genes.
• IT IS VERY IMPORTANT to filter out genes that have all zero counts or very low counts.
• We filter using CPM values rather than counts because they account for differences in sequencing depth between samples.
![Page 8: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/8.jpg)
Filtering out lowly expressed genes
• CPM = counts per million, or how many counts would I get for a gene if the sample had a library size of 1M.
For a given gene:Library size Count CPM
1M 1 1
10M 10 1
20M 10 0.5
![Page 9: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/9.jpg)
Filtering out lowly expressed genes • Use a CPM threshold to define “expressed” and
“unexpressed”• As a general rule, a good threshold can be chosen for a
CPM value that corresponds to a count of 10.• In our dataset, the samples have library sizes of 20 to 20
something million.
Library size Count CPM
1M 1 1
10M 10 1
20M 10 0.5
![Page 10: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/10.jpg)
Filtering out lowly expressed genes
• Use a CPM threshold to define “expressed” and “unexpressed”
• As a general rule, a good threshold can be chosen for a CPM value that corresponds to a count of 10.
• In our dataset, the samples have library sizes of 20 to 20 something million.
Library size Count CPM
1M 1 1
10M 10 1
20M 10 0.5
We use a CPM threshold of 0.5!
![Page 11: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/11.jpg)
Filtering out lowly expressed genes
• Use a CPM threshold to define “expressed” and “unexpressed”
• As a general rule, a good threshold can be chosen for a CPM value that corresponds to a count of 10.
• In our dataset, the samples have library sizes of 20 to 20 something million.
Library size Count CPM
1M 1 1
10M 10 1
20M 10 0.5
We use a CPM threshold of 0.5!
But if this is too hard to work out, a CPM
threshold of 1 works well in most cases.
![Page 12: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/12.jpg)
Filtering out lowly expressed genes
Basal cells
Luminal cells
Virgin
Pregnant
Lactating
Virgin
Pregnant
Lactating
• We keep any gene that is (roughly) expressed in at least one group.
• 12 samples, 6 groups, 2 replicates in each group.
Keep if CPM > 0.5 in at least 2 out of 12 samples
![Page 13: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/13.jpg)
Filtering out lowly expressed genes
Basal cells
Luminal cells
Virgin expressed
Pregnant expressed
Lactating expressed
Virgin unexpressed
Pregnant unexpressed
Lactating unexpressed
• We keep any gene that is (roughly) expressed in at least one group.
• 12 samples, 6 groups, 2 replicates in each group.
Keep if CPM > 0.5 in at least 2 out of 12 samples
![Page 14: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/14.jpg)
Filtering out lowly expressed genes
Basal cells
Luminal cells
Virgin unexpressed
Pregnant expressed
Lactating unexpressed
Virgin unexpressed
Pregnant unexpressed
Lactating unexpressed
• We keep any gene that is (roughly) expressed in at least one group.
• 12 samples, 6 groups, 2 replicates in each group.
Keep if CPM > 0.5 in at least 2 out of 12 samples
![Page 15: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/15.jpg)
Filtering out lowly expressed genes • We keep any gene that is (roughly) expressed in at least one
group.• 12 samples, 6 groups, 2 replicates in each group.
Keep gene if CPM > 0.5 in at least 2 or more samples
Basal cells
Luminal cells
Virgin unexpressed
Pregnant expressed
Lactating unexpressed
Virgin unexpressed
Pregnant unexpressed
Lactating unexpressed
![Page 16: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/16.jpg)
QC and visualisation (part 2)
![Page 17: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/17.jpg)
MDS Plots
• A visualisation of a principle components analysis
which looks at where the greatest sources of
variation in the data come from.
• Distances represents the typical log2-FC observed
between each pair of samples
– e.g. 6 units apart = 2^6 = 64-fold difference
• Unsupervised – separation based on data, no
prior knowledge of experimental design.
– Useful for an overview of the data. Do samples
separate by experimental groups?
– Quality control
– Outliers?
![Page 18: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/18.jpg)
![Page 19: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/19.jpg)
![Page 20: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/20.jpg)
QC and visualisation (part 3)
![Page 21: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/21.jpg)
Normalisation for composition bias
●
●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●●●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●●●
●
●●
●●●●
●
●
●●●
●●●●
●●
●
●●
●
●
●
●●●
●
●
●●●●●●
●
●
●●●●●●
●
●●●●●●●●●●
●
●●
●
●
●●
●●●●●●
●
●
●●●
●
●●●●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●●
●●●
●●
●●
●
●
●●●●●
●
●
●●●●
●●
●
●
●●●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●●
●●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●●●●●●●
●
●
●
●●
●●
●●
●●●●
●●
●
●●
●●●●●●●●●
●●●●
●●
●
●●●●
●●●●●●
●
●
●●●●
●●
●●
●
●●
●●●●●
●
●●●●●
●
●●●●
●●●●
●●
●●
●●●●
●●
●●
●●●
●
●
●
●
●●●●
●
●●●
●
●
●
●●●●●●
●
●
●●●●
●●
●●
●●
●●
●
●●●●
●
●●●●●●●●
●
●
●
●●
●
●●
●●●
●●●
●
●
●●●●
●●●●
●
●●●●
●●
●
●
●●
●
●
●●
●
●
●●●
●●●
●●
●●
●
●●
●
●●
●
●●●●●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●●●●
●
●●
●
●●●
●
●
●
●●
●
●●
●●●
●
●
●
●●●●●
●●●●●●
●
●●●
●●
●
●●
●
●●
●
●
●●
●
●●●●●●
●●●●●
●
●
●
●
●
●●
●
●
●●●●●●
●
●●●
●●●●
●
●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●
●●●
●
●●●●
●
●
●●
●●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●●
●●●●
●
●●
●●
●
●
●●●●
●
●●
●
●●●●
●●●●
●
●●●
●
● ●
●
●
●
●●
●
●●●●
●
●
●●●
●
●●●●
●●
●
●
●●●
●●●●
●●●
●
●●
●●
●●
●●●
●●●●●●●●●●
●
●●●
●●
●
●
●
●
●
●●●●
●●
●
●●●●●●
●●●●●●●●●●●●
●
●
●
●
●
●●●●●
●
●●●●
●●
●
●●
●
●
●●●
●●●●●●●●●
●●
●●
●●●
●
●●●
●
●●
●
●●●●●●
●
●
●●●●●
●
●
●
●●
●●
●●●
●●●●●
●●
●
●
●
●
●●
●●●●
●
●●●●●●●●●●●●
●
●●●●●●
●
●
●
●
●●●
●●●
●
●●
●
●●●●
●
●●
●
●●●●●
●
●
●●
●●
●●●●
●
●
●●
●●●●
●
●
●
●
●
●●●●●●●●●●
●●
●
●
●●
●●●
●
●●●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●●●●●●●●●
●
●
●
●
●
●
●
●
●●
●
●●
●●●●●
●
●●●
●●●
●
●
●
●●●
●
●
●
●●●
●
●●●
●●●●●●●
●
●●●
●●
●●●●●
●
●●
●
●●●●●●
●
●
●●●
●
●
●●●●●●
●
●
●
●●●●●
●
●●●●
●●
●
●
●●
●
●
●
●●●
●
●
●●
●●
●●
●
●●●●
●
● ●
●●
●●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●●
●●●●●
●
●
●●
●
●●●●
●●
●
●
●
●●
●
●
●
●
●●●●●
●
●●
●
●
●
●●●●●
●
●
●●
●
●●
●●●●●●●●●
●●●●●
●
●●
●●
●
●●●
●●●●
●●
●
●
●
●
●
●
●
●
●●●●
●
●●●●
●
●●
●●●●●●●●●●●●●
●●
●●●
●
●●●●●●●●
●
●●●
●●
●
●●
●
●●
● ●
●
●
●
●
●●
●
●●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●●●●
●●
●●
●
●●●
●
●
●●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●●
●●●●
●
●
●
●●
●●●●
●
●
●●
●
●
●●●
●
●●●
10_6_5_11
9_6_5_11
purep53
JMS8−2
JMS8−3
JMS8−4
JMS8−5
JMS9−P7c
JMS9−P8c
−5
0
5
10
15
A. Example: Unnormalised data
Log−cpm
●
●
●
●●●●
●
●
●
●●●●●●●●●●
●
●
●
●●●●●●●●●●●●
●
●●●●
●
●●
●
●●
●
●
●
●
●
●●●
●
●●
●●
●●●●●●●
●
●
●
●
●
●●●●●
●●
●
●
●●●●●●●
●●
●
●●●
●
●
●
●
●
●
●●
●●●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●●●
●
●●
●●●●
●
●
●●●
●●●●
●●
●
●●
●
●
●
●●●
●
●
●●●●●●
●
●
●●●●●●
●
●●●●●●●●●●
●
●●
●
●
●●
●●●●●●
●
●
●●●
●
●●●●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●●
●●●
●●
●●
●
●
●●●●●
●
●
●●●●
●●
●
●
●●●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●●
●●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●●●●●●●
●
●
●
●●
●●
●●
●●●●
●●
●
●●
●●●●●●●●●
●●●●
●●
●
●●●
●
●●●●●●
●
●
●
●●●
●●
●●
●
●●
●●●●●
●
●●●●●
●
●●
●●
●●●●
●●
●●
●●●●
●●
●●
●●●
●
●
●
●
●●●●
●
●●●
●
●
●
●●●●●●
●
●
●●●●
●●
●●
●●
●●
●
●●●●
●
●●●●●●●●
●
●
●
●●
●
●●
●●●
●●●
●
●
●●●●
●●●●
●
●●●●
●●
●
●
●●
●
●
●●
●
●
●●●
●●●
●●
●●
●
●●
●
●●
●
●●●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●●●●
●
●●
●
●●●
●
●
●
●●
●
●●
●●●
●
●
●
●●●●●
●●●●●●
●
●●●
●●
●
●●
●
●●
●
●
●●
●
●●●●●●
●●●●●
●
●
●
●
●
●●
●
●
●●●●●●
●
●●●
●●●●
●
●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●
●●●
●
●●●●
●
●
●●
●●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●●
●●●●
●
●●
●●
●
●
●●●●
●
●●
●
●●●●
●●●●
●
●●●
●
● ●
●
●
●
●●
●
●●●●
●
●
●●●
●
●●
●●
●●
●
●
●
●●
●●●●
●●●
●
●●
●●
●●
●●●
●●●●●●●●●●
●
●●●
●●
●
●
●
●
●
●●●●
●●
●
●●●●
●
●
●●●●●●●●●●●●
●
●
●
●
●
●●●●●
●
●●●●
●●
●
●●
●
●
●●●
●●●●●●
●●●
●●
●●
●●●
●
●●●
●
●●
●
●●●●●
●
●
●
●●●●●
●
●
●
●●
●●
●●●
●●●●●
●●
●
●
●
●
●●
●●●●
●
●●●●●●●●●●●●
●
●●●●●●
●
●
●
●
●●●
●●●
●
●●
●
●●●●
●
●●
●
●●●●●
●
●
●●
●●
●●●●
●
●
●●
●●●●
●
●
●
●
●
●●●●●●●●●●
●●
●
●
●●
●●●
●
●●●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●●●●●●●●●
●
●
●
●
●
●
●
●
●●
●
●●
●●●●●
●
●●●
●●●
●
●
●
●●●
●
●
●
●●●
●
●●●
●●●●●●●
●
●●●
●●
●●●●●
●
●●
●
●●●●●●
●
●
●●●
●
●
●●●●●●
●
●
●
●●●●●
●
●●●●
●●
●
●
●●
●
●
●
●●●
●
●
●●
●●
●●
●
●●●●
●
● ●
●●
●●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●●
●●●●●
●
●
●●
●
●●●●
●●
●
●
●
●●
●
●
●
●
●●●●●
●
●●
●
●
●
●●●●●
●
●
●●
●
●●
●●●●●●●●●
●●●●●
●
●●
●●
●
●●●
●●●●
●●
●
●
●
●
●
●
●
●
●●●●
●
●●●●
●
●●
●●●●●●●●●●●●●
●●
●●●
●
●●●●●●●●
●
●●●
●●
●
●●
●
●●
● ●
●
●
●
●
●●
●
●●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●●●●
●●
●●
●
●●●
●
●
●●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●●
●●●●
●
●
●
●●
●●●●
●
●
●●
●
●
●●●
●
●●●
10_6_5_11
9_6_5_11
purep53
JMS8−2
JMS8−3
JMS8−4
JMS8−5
JMS9−P7c
JMS9−P8c
−5
0
5
10
15
B. Example: Normalised data
Log−cpm
If we ran a DE analysis on Sample 1 and Sample 3, almost
all genes will be down-regulated in Sample 1!!
![Page 22: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/22.jpg)
Normalisation for composition bias
●
●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●●●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●●●
●
●●
●●●●
●
●
●●●
●●●●
●●
●
●●
●
●
●
●●●
●
●
●●●●●●
●
●
●●●●●●
●
●●●●●●●●●●
●
●●
●
●
●●
●●●●●●
●
●
●●●
●
●●●●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●●
●●●
●●
●●
●
●
●●●●●
●
●
●●●●
●●
●
●
●●●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●●
●●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●●●●●●●
●
●
●
●●
●●
●●
●●●●
●●
●
●●
●●●●●●●●●
●●●●
●●
●
●●●●
●●●●●●
●
●
●●●●
●●
●●
●
●●
●●●●●
●
●●●●●
●
●●●●
●●●●
●●
●●
●●●●
●●
●●
●●●
●
●
●
●
●●●●
●
●●●
●
●
●
●●●●●●
●
●
●●●●
●●
●●
●●
●●
●
●●●●
●
●●●●●●●●
●
●
●
●●
●
●●
●●●
●●●
●
●
●●●●
●●●●
●
●●●●
●●
●
●
●●
●
●
●●
●
●
●●●
●●●
●●
●●
●
●●
●
●●
●
●●●●●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●●●●
●
●●
●
●●●
●
●
●
●●
●
●●
●●●
●
●
●
●●●●●
●●●●●●
●
●●●
●●
●
●●
●
●●
●
●
●●
●
●●●●●●
●●●●●
●
●
●
●
●
●●
●
●
●●●●●●
●
●●●
●●●●
●
●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●
●●●
●
●●●●
●
●
●●
●●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●●
●●●●
●
●●
●●
●
●
●●●●
●
●●
●
●●●●
●●●●
●
●●●
●
● ●
●
●
●
●●
●
●●●●
●
●
●●●
●
●●●●
●●
●
●
●●●
●●●●
●●●
●
●●
●●
●●
●●●
●●●●●●●●●●
●
●●●
●●
●
●
●
●
●
●●●●
●●
●
●●●●●●
●●●●●●●●●●●●
●
●
●
●
●
●●●●●
●
●●●●
●●
●
●●
●
●
●●●
●●●●●●●●●
●●
●●
●●●
●
●●●
●
●●
●
●●●●●●
●
●
●●●●●
●
●
●
●●
●●
●●●
●●●●●
●●
●
●
●
●
●●
●●●●
●
●●●●●●●●●●●●
●
●●●●●●
●
●
●
●
●●●
●●●
●
●●
●
●●●●
●
●●
●
●●●●●
●
●
●●
●●
●●●●
●
●
●●
●●●●
●
●
●
●
●
●●●●●●●●●●
●●
●
●
●●
●●●
●
●●●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●●●●●●●●●
●
●
●
●
●
●
●
●
●●
●
●●
●●●●●
●
●●●
●●●
●
●
●
●●●
●
●
●
●●●
●
●●●
●●●●●●●
●
●●●
●●
●●●●●
●
●●
●
●●●●●●
●
●
●●●
●
●
●●●●●●
●
●
●
●●●●●
●
●●●●
●●
●
●
●●
●
●
●
●●●
●
●
●●
●●
●●
●
●●●●
●
● ●
●●
●●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●●
●●●●●
●
●
●●
●
●●●●
●●
●
●
●
●●
●
●
●
●
●●●●●
●
●●
●
●
●
●●●●●
●
●
●●
●
●●
●●●●●●●●●
●●●●●
●
●●
●●
●
●●●
●●●●
●●
●
●
●
●
●
●
●
●
●●●●
●
●●●●
●
●●
●●●●●●●●●●●●●
●●
●●●
●
●●●●●●●●
●
●●●
●●
●
●●
●
●●
● ●
●
●
●
●
●●
●
●●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●●●●
●●
●●
●
●●●
●
●
●●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●●
●●●●
●
●
●
●●
●●●●
●
●
●●
●
●
●●●
●
●●●
10_6_5_11
9_6_5_11
purep53
JMS8−2
JMS8−3
JMS8−4
JMS8−5
JMS9−P7c
JMS9−P8c
−5
0
5
10
15
A. Example: Unnormalised data
Log−cpm
●
●
●
●●●●
●
●
●
●●●●●●●●●●
●
●
●
●●●●●●●●●●●●
●
●●●●
●
●●
●
●●
●
●
●
●
●
●●●
●
●●
●●
●●●●●●●
●
●
●
●
●
●●●●●
●●
●
●
●●●●●●●
●●
●
●●●
●
●
●
●
●
●
●●
●●●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●●●
●
●●
●●●●
●
●
●●●
●●●●
●●
●
●●
●
●
●
●●●
●
●
●●●●●●
●
●
●●●●●●
●
●●●●●●●●●●
●
●●
●
●
●●
●●●●●●
●
●
●●●
●
●●●●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●●
●●●
●●
●●
●
●
●●●●●
●
●
●●●●
●●
●
●
●●●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●●
●●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●●●●●●●
●
●
●
●●
●●
●●
●●●●
●●
●
●●
●●●●●●●●●
●●●●
●●
●
●●●
●
●●●●●●
●
●
●
●●●
●●
●●
●
●●
●●●●●
●
●●●●●
●
●●
●●
●●●●
●●
●●
●●●●
●●
●●
●●●
●
●
●
●
●●●●
●
●●●
●
●
●
●●●●●●
●
●
●●●●
●●
●●
●●
●●
●
●●●●
●
●●●●●●●●
●
●
●
●●
●
●●
●●●
●●●
●
●
●●●●
●●●●
●
●●●●
●●
●
●
●●
●
●
●●
●
●
●●●
●●●
●●
●●
●
●●
●
●●
●
●●●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●●●●
●
●●
●
●●●
●
●
●
●●
●
●●
●●●
●
●
●
●●●●●
●●●●●●
●
●●●
●●
●
●●
●
●●
●
●
●●
●
●●●●●●
●●●●●
●
●
●
●
●
●●
●
●
●●●●●●
●
●●●
●●●●
●
●●●●●●●●●●●
●
●●●
●
●
●
●
●
●●
●
●●●
●
●●●●
●
●
●●
●●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●●●
●●●●
●
●●
●●
●
●
●●●●
●
●●
●
●●●●
●●●●
●
●●●
●
● ●
●
●
●
●●
●
●●●●
●
●
●●●
●
●●
●●
●●
●
●
●
●●
●●●●
●●●
●
●●
●●
●●
●●●
●●●●●●●●●●
●
●●●
●●
●
●
●
●
●
●●●●
●●
●
●●●●
●
●
●●●●●●●●●●●●
●
●
●
●
●
●●●●●
●
●●●●
●●
●
●●
●
●
●●●
●●●●●●
●●●
●●
●●
●●●
●
●●●
●
●●
●
●●●●●
●
●
●
●●●●●
●
●
●
●●
●●
●●●
●●●●●
●●
●
●
●
●
●●
●●●●
●
●●●●●●●●●●●●
●
●●●●●●
●
●
●
●
●●●
●●●
●
●●
●
●●●●
●
●●
●
●●●●●
●
●
●●
●●
●●●●
●
●
●●
●●●●
●
●
●
●
●
●●●●●●●●●●
●●
●
●
●●
●●●
●
●●●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●●●●●●●●●●
●
●
●
●
●
●
●
●
●●
●
●●
●●●●●
●
●●●
●●●
●
●
●
●●●
●
●
●
●●●
●
●●●
●●●●●●●
●
●●●
●●
●●●●●
●
●●
●
●●●●●●
●
●
●●●
●
●
●●●●●●
●
●
●
●●●●●
●
●●●●
●●
●
●
●●
●
●
●
●●●
●
●
●●
●●
●●
●
●●●●
●
● ●
●●
●●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●●
●●●●●
●
●
●●
●
●●●●
●●
●
●
●
●●
●
●
●
●
●●●●●
●
●●
●
●
●
●●●●●
●
●
●●
●
●●
●●●●●●●●●
●●●●●
●
●●
●●
●
●●●
●●●●
●●
●
●
●
●
●
●
●
●
●●●●
●
●●●●
●
●●
●●●●●●●●●●●●●
●●
●●●
●
●●●●●●●●
●
●●●
●●
●
●●
●
●●
● ●
●
●
●
●
●●
●
●●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●●●●●
●●
●●
●
●●●
●
●
●●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●●
●●●●
●
●
●
●●
●●●●
●
●
●●
●
●
●●●
●
●●●
10_6_5_11
9_6_5_11
purep53
JMS8−2
JMS8−3
JMS8−4
JMS8−5
JMS9−P7c
JMS9−P8c
−5
0
5
10
15
B. Example: Normalised data
Log−cpm
![Page 23: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/23.jpg)
Normalisation for composition bias
• TMM normalisation (Robinson and Oshlack, 2010)• How do we make the expression of all the genes go UP in
the one sample? – Scaling factors
• E.g. scale library size by 0.1 so effective library size is 1M.
• Scaling factor <1 makes the CPM larger.
Library size Count CPM
10M 10 1
1M 10 10
![Page 24: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/24.jpg)
Voom
![Page 25: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/25.jpg)
Variance of log-cpm depends on mean of log-cpm
SEQC(tech var only)
Mouse(low bio var)
Simulations(mod bio var)
Nigerian(high bio var)
D melanogaster(systematically dif)
Average log2(count + 0.5) 25
![Page 26: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/26.jpg)
Normal dist. assumes that the data is• continuous• has constant variance
RNA-seq data is• discrete• has non-constant mean-variance trend
✔
Voom• Transform to log-counts per million• Remove mean-var dependence through
the use of precision weights
![Page 27: RNA-seq:filtering,quality control and visualisationcombine-australia.github.io/RNAseq-R/slides/RNASeq_filtering_qc.pdfCPM value that corresponds to a count of 10. • In our dataset,](https://reader034.vdocuments.us/reader034/viewer/2022042212/5eb557e6158f881ba74c94e6/html5/thumbnails/27.jpg)
After: constant variance
Mean log-cpm
Log2
-var
ianc
e
Mean log-count
Qtr-
root
varia
nce
Before: trended variance
Variance weights• Obtain variance estimates for each observation using
mean-var trend.• Assign inverse variance weights to each observation.• Weights remove mean-variance trend from the data.