RNA-seq - DGIST · 2017. 7. 14.
RNA-seq: Differential analysis
DESeq2
DESeq2
http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
Input data
Why un-normalized counts?
• As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the $i$-th row and the $j$-th column of the matrix tells how many reads can be assigned to gene $i$ in sample $j$.
• The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq).
• It is important to provide count matrices as input for DESeq2’s statistical model (Love, Huber, and Anders 2014) to hold, as only the count values allow assessing the measurement precision correctly.
Sample annotation
htseq-count input
• If you counted reads with htseq-count from the HTSeq Python package, you can build the data set with the function DESeqDataSetFromHTSeqCount.
htseq-count input
Pre-filtering
• There are two reasons which make pre-filtering useful: by removing rows in which there are no reads or nearly no reads, we reduce the memory size of the dds data object and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to remove rows that have only 0 or 1 read.
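The filtering rule can be illustrated outside of R. The following is a minimal Python sketch, not DESeq2 code: the gene names, the toy counts, and the helper `prefilter` are all hypothetical, and it assumes "only 0 or 1 read" means the row total summed over all samples.

```python
# Hypothetical toy count matrix: gene name -> raw read counts per sample.
counts = {
    "geneA": [0, 0, 0, 0],
    "geneB": [0, 1, 0, 0],
    "geneC": [12, 30, 7, 19],
    "geneD": [1, 1, 0, 0],
}


def prefilter(count_rows, min_total=2):
    """Keep only rows whose counts, summed over all samples, reach min_total."""
    return {gene: row for gene, row in count_rows.items() if sum(row) >= min_total}


kept = prefilter(counts)
print(sorted(kept))  # genes that survive the minimal pre-filter
```

Rows with a total of 0 or 1 are dropped, so only geneC and geneD remain in this toy example.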
Factor levels
• A factor is a type of vector for which the elements are categorical values.
Factor levels
Factor levels: reordering
Factor levels: relabeling
Factor levels
• By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels.
• Setting the factor levels can be done in two ways, either using factor, or using relevel, just specifying the reference level.
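To make the reference-level issue concrete, here is a small Python sketch that only mimics the behavior described above; it is not R and not DESeq2. The labels "untreated" and "treated" and the helpers `default_levels` and `relevel` are hypothetical, and Python's plain string sorting is assumed to be a good enough stand-in for R's alphabetical ordering.

```python
def default_levels(values):
    """Mimic the default described above: distinct values in alphabetical order."""
    return sorted(set(values))


def relevel(levels, ref):
    """Move the chosen reference level to the front, keeping the rest in order."""
    if ref not in levels:
        raise ValueError(f"unknown level: {ref}")
    return [ref] + [lv for lv in levels if lv != ref]


# Hypothetical sample labels.
condition = ["untreated", "treated", "untreated", "treated"]

levels = default_levels(condition)
print(levels[0])  # alphabetical default: "treated" would be the reference

levels = relevel(levels, "untreated")
print(levels[0])  # after releveling, "untreated" is the reference
```

With these labels the alphabetical default would compare against the treated group, which is why the reference level is worth setting explicitly.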
Factor levels
Differential expression analysis
DESeq()
• The standard differential expression analysis steps are wrapped into a single function, DESeq.
results()
• Results tables are generated using the function results, which extracts a results table with log2 fold changes, p values and adjusted p values.
• With no additional arguments to results, the log2 fold change and Wald test p value will be for the last variable in the design formula, and if this is a factor, the comparison will be the last level of this variable over the first level.
• However, the order of the variables in the design does not matter so long as the user specifies the comparison using the name or contrast arguments of results.
results()
• Details about the comparison are printed to the console, above the results table. The text, condition ZIKA vs MOCK, tells you that the estimates are of the logarithmic fold change log2(ZIKA/MOCK).
Ordering results
• We can order our results table by the smallest adjusted p value:
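As an illustration of the ordering step, the Python sketch below sorts a toy results table by adjusted p value and places missing values last, which is also where R's `order` puts NA by default. The rows and the helper `order_by_padj` are hypothetical; `None` stands in for NA.

```python
# Hypothetical results rows: (gene, adjusted p value); None stands in for NA.
results = [("geneA", 0.20), ("geneB", None), ("geneC", 0.001), ("geneD", 0.04)]


def order_by_padj(rows):
    """Sort ascending by adjusted p value, placing missing values last."""
    return sorted(rows, key=lambda r: (r[1] is None, 0.0 if r[1] is None else r[1]))


ordered = order_by_padj(results)
for gene, padj in ordered:
    print(gene, padj)
```

The sort key is a pair: the first element pushes missing values to the end, and the second orders the remaining rows by their adjusted p value.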
More info on results
• Information about which variables and tests were used can be found by calling the function mcols on the results object.
Note on p-values set to NA
• Some values in the results table can be set to NA for one of the following reasons:
• If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
• If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance.
• If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
How many adjusted p-values?
• How many adjusted p-values were less than 0.1?
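The counting question can be sketched in Python. This is an illustration under assumptions, not the DESeq2 implementation: it assumes the adjusted p values come from the Benjamini-Hochberg procedure (the slides do not name the method), the p values are hypothetical, and `None` stands in for NA and is skipped when counting.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values; None entries (NA) are passed through."""
    indexed = sorted((p, i) for i, p in enumerate(pvalues) if p is not None)
    m = len(indexed)
    adjusted = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        p, i = indexed[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.02, None, 0.04, 0.30]  # hypothetical p values, one NA
padj = benjamini_hochberg(pvals)

# Count adjusted p values below 0.1, ignoring missing entries.
n_significant = sum(1 for q in padj if q is not None and q < 0.1)
print(n_significant)
```

Skipping the missing entries matters: comparing NA against a threshold does not give a usable true/false answer, so they have to be excluded before counting.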
summary()
• We can summarize some basic tallies using the summary function.
summary()
• Note that the results function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted p value below a given FDR cutoff, alpha.
• By default the argument alpha is set to 0.1. If the adjusted p value cutoff will be a value other than 0.1, alpha should be set to that value.
summary()
Exploring results
MA-plot
• In DESeq2, the function plotMA shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples.
• Points will be colored red if the adjusted p value is less than alpha. Points which fall out of the window are plotted as open triangles pointing either up or down.
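The drawing rules above can be written down as a small Python sketch. It does not call plotMA; the helper `ma_point`, the window limits, and the example numbers are hypothetical, and the default alpha of 0.1 is taken from the slides.

```python
def ma_point(mean_count, log2_fc, padj, alpha=0.1, ylim=(-2.0, 2.0)):
    """Describe how one gene would be drawn: position, marker, and color flag."""
    low, high = ylim
    if log2_fc > high:
        y, marker = high, "triangle-up"      # clipped at the top of the window
    elif log2_fc < low:
        y, marker = low, "triangle-down"     # clipped at the bottom of the window
    else:
        y, marker = log2_fc, "point"
    # A missing adjusted p value (None) is never flagged as significant.
    red = padj is not None and padj < alpha
    return {"x": mean_count, "y": y, "marker": marker, "red": red}


inside = ma_point(mean_count=250.0, log2_fc=0.8, padj=0.03)
outside = ma_point(mean_count=12.0, log2_fc=3.5, padj=None)
print(inside)
print(outside)
```

The x coordinate is the mean of normalized counts and the y coordinate is the log2 fold change, clipped to the plotting window when it falls outside.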
MA-plot
MA-plot
• It is also useful to visualize the MA-plot for the shrunken log2 fold changes, which remove the noise associated with log2 fold changes from low count genes without requiring arbitrary filtering thresholds.
• Here we provide the dds object and the number of the coefficient we want to moderate.
MA-plot
MA-plot
plotCounts()
• It can also be useful to examine the counts of reads for a single gene across the groups. A simple function for making this plot is plotCounts, which normalizes counts by sequencing depth and adds a pseudocount of 1/2 to allow for log scale plotting. The counts are grouped by the variables in intgroup, where more than one variable can be specified.
• Here we specify the gene which had the smallest p value from the results table created above. You can select the gene to plot by rowname or by numeric index.
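The arithmetic behind this kind of plot can be sketched in Python. This is not the plotCounts function: the counts, the per-sample size factors, and the helper `normalized_counts` are hypothetical, and dividing by a size factor is assumed here as the way of normalizing by sequencing depth.

```python
import math


def normalized_counts(raw_counts, size_factors, pseudocount=0.5):
    """Divide each raw count by its sample's size factor, then add a pseudocount."""
    return [c / s + pseudocount for c, s in zip(raw_counts, size_factors)]


raw = [0, 10, 40, 90]                  # hypothetical counts of one gene in four samples
size_factors = [0.5, 1.0, 1.0, 2.0]    # hypothetical sequencing-depth factors
groups = ["MOCK", "MOCK", "ZIKA", "ZIKA"]

values = normalized_counts(raw, size_factors)

# Group the plotted values by condition, as intgroup does for the plot.
by_group = {}
for group, value in zip(groups, values):
    by_group.setdefault(group, []).append(value)

# The pseudocount keeps every value positive, so a log scale is always defined.
log_values = [math.log10(v) for v in values]
print(by_group)
```

Without the pseudocount, the sample with zero reads could not be placed on a log-scale axis.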
plotCounts()
Exporting results to CSV files
• A plain-text file of the results can be exported using the base R function write.csv.
Design of experiments
Statistical design of experiments
The process of planning the experiment so that appropriate data will be collected and analyzed by statistical methods, resulting in valid and objective conclusions.
Explanatory and response variables
- $X$: Explanatory variables (factors)
- $Y$: Response variables
Factors
- $X$: Treatment factor or design factor
  - Levels: $X = x$
  - Treatment combination or treatment: a particular combination of factor levels (e.g. $x_1, x_2$ if there are two treatment factors)
- $Y$: Response variables
- $Z$: Noise factor or blocking factor
Three basic principles of experimental design
• Randomization
• Replication
• Blocking
Randomization
By randomization we mean that both the assignment of treatments to units and the order in which the individual runs of the experiments are to be performed are randomly determined.
A completely randomized design is an experimental design in which treatments are assigned to all units by randomization.
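A completely randomized design can be sketched in a few lines of Python. The pot names, treatment labels, and the helper `completely_randomized` are hypothetical; the sketch randomizes both the assignment of treatments to units and the run order, as the definition above requires.

```python
import random


def completely_randomized(units, treatments, replicates, seed=None):
    """Randomly assign treatments to units and randomize the run order."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(replicates)]
    if len(labels) != len(units):
        raise ValueError("need exactly one unit per treatment replicate")
    rng.shuffle(labels)                 # random assignment of treatments
    assignment = dict(zip(units, labels))
    run_order = list(units)
    rng.shuffle(run_order)              # random order of the experimental runs
    return assignment, run_order


pots = [f"pot{i}" for i in range(1, 9)]  # eight hypothetical experimental units
assignment, run_order = completely_randomized(pots, ["A", "B"], replicates=4, seed=1)
print(assignment)
print(run_order)
```

Fixing the seed only makes the sketch reproducible; in a real experiment the point is that neither the assignment nor the order is chosen by the experimenter.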
Replication
By replication we mean an independent repeat run of each treatment combination.
Experimental units
• An entity receiving an independent application of a treatment is called an experimental unit.
• An experimental run is the process of applying a particular treatment combination to an experimental unit and recording its response.
• A replicate is an independent run carried out on a different experimental unit under the same conditions.
Example: Two pots
Experimental unit: plant in the pot
No replication
Example: Randomized
Experimental unit: plant in the pot
4 replicates for each treatment
Blocking
Blocking is an experimental design strategy used to reduce or eliminate the variability transmitted from nuisance factors, which may influence the response variable but in which we are not directly interested.
Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units. The resulting design is called a randomized block design. This design enables more precise estimates of the treatment effects because comparisons between treatments are made among homogeneous experimental units in each block.
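The randomized block design described above can be sketched in Python as follows. The block names, unit names, treatment labels, and the helper `randomized_block` are hypothetical, and the sketch assumes each block holds exactly one unit per treatment.

```python
import random


def randomized_block(blocks, treatments, seed=None):
    """Within each block, assign every treatment once, in random order."""
    rng = random.Random(seed)
    design = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)              # randomization happens inside the block
        design[block] = dict(zip(units, order))
    return design


# Two hypothetical blocks with three similar experimental units each.
blocks = {"block1": ["u1", "u2", "u3"], "block2": ["u4", "u5", "u6"]}
design = randomized_block(blocks, ["control", "low", "high"], seed=7)
print(design)
```

Because every treatment appears once in every block, differences between blocks affect all treatments equally and can be separated from the treatment effect in the analysis.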
Blocking
[Diagram relating $X$, $Y$, and $Z$]
Blocking example
Blocking removes the variation in response among chambers, allowing more precise estimates and more powerful tests of the treatment effects.
Blinding
The process of concealing information from participants and researchers about which of them receive which treatments is called blinding.
Single-blind experiment: participants are unaware of the treatment they have been assigned. This prevents participants from responding differently according to their knowledge of their treatment.
Double-blind experiment: researchers administering the treatments and measuring the response are also unaware of which subjects are receiving which treatment.
Factorial design
Many experiments in biology investigate more than one treatment factor, because:
1. answering two questions from a single experiment rather than just one makes more efficient use of time, supplies, and other costs
2. the factors might interact.
Factorial design
An experiment having a factorial design investigates all treatment combinations of two or more treatment factors. A factorial design can measure interactions between factors.
An interaction between two (or more) explanatory variables means that the effect of one variable on the response depends on the state of the other variable.
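Both ideas, "all treatment combinations" and "interaction", can be shown with a small Python sketch. The factor names, levels, and cell means are hypothetical numbers chosen for illustration, not data from the slides.

```python
from itertools import product

# Two hypothetical treatment factors with two levels each.
factors = {"temperature": ["low", "high"], "nutrient": ["none", "added"]}

# A factorial design uses every combination of factor levels.
treatments = list(product(*factors.values()))
print(treatments)

# Hypothetical mean responses for each treatment combination.
cell_means = {
    ("low", "none"): 10,
    ("low", "added"): 12,
    ("high", "none"): 11,
    ("high", "added"): 18,
}

# Effect of adding nutrient at each temperature level.
effect_at_low = cell_means[("low", "added")] - cell_means[("low", "none")]
effect_at_high = cell_means[("high", "added")] - cell_means[("high", "none")]

# A nonzero difference of differences indicates an interaction.
interaction = effect_at_high - effect_at_low
print(effect_at_low, effect_at_high, interaction)
```

In this toy example the nutrient effect is larger at high temperature than at low temperature, so the effect of one factor depends on the level of the other.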
Factorial design
[Diagram relating $X_1$, $X_2$, and $Y$]
A unified model: general linear model
$$E[y] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$$
Basic linear models
| Model formula | Model | Design |
|---|---|---|
| $y \sim x$ | Linear regression | Dose-response |
| $y \sim t$ | One-way ANOVA | Completely randomized |
| $y \sim t + b$ | Two-way ANOVA | Randomized block |
| $y \sim t_1 + t_2 + t_1 t_2$ | Two-way, fixed-effect ANOVA | Factorial design |
| $y \sim t + x$ | ANCOVA | Observational study with one known noise factor |
| $y \sim x_1 + x_2 + x_1 x_2$ | Multiple linear regression | Dose-response |

$x$: numerical, $t$: categorical treatment factor, $b$: categorical blocking factor
Randomized complete block design
How does fish abundance affect the abundance and diversity of prey species?
Design
Treatments: Control, Low (30 fish), High (90 fish)
Size: 3 m × 3 m
Data: Zooplankton diversity in three fish abundance treatments
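| Treatment | Block 1 | Block 2 | Block 3 | Block 4 | Block 5 |
|---|---|---|---|---|---|
| Control | 4.1 | 3.2 | 3.0 | 2.3 | 2.5 |
| Low | 2.2 | 2.4 | 1.5 | 1.3 | 2.6 |
| High | 1.3 | 2.0 | 1.0 | 1.0 | 1.6 |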
Model: $y \sim t + b$
$$y_i = \beta_0 + \beta_1 t_i + \beta_2 b_i + \epsilon_i$$
H0: Mean zooplankton diversity is the same in every abundance treatment (reduced model: $y \sim b$)
H1: Mean zooplankton diversity is not the same in every abundance treatment (full model: $y \sim t + b$)
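Comparing the two models for the zooplankton data above amounts to an F test for the treatment term. The Python sketch below computes that F statistic by hand from the sums of squares of a balanced randomized block design. It assumes the five columns of the data table are the blocks; the helper `block_anova_f` is hypothetical, and no p value is computed because that would need an F distribution function outside the standard library.

```python
# Zooplankton diversity from the slide: one row per treatment, one column per block.
diversity = {
    "Control": [4.1, 3.2, 3.0, 2.3, 2.5],
    "Low": [2.2, 2.4, 1.5, 1.3, 2.6],
    "High": [1.3, 2.0, 1.0, 1.0, 1.6],
}


def block_anova_f(table):
    """F statistic for the treatment effect in a balanced randomized block design."""
    rows = list(table.values())
    n_treat, n_block = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n_treat * n_block)
    treat_means = [sum(r) / n_block for r in rows]
    block_means = [sum(r[j] for r in rows) / n_treat for j in range(n_block)]
    ss_treat = n_block * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((y - grand) ** 2 for r in rows for y in r)
    ss_error = ss_total - ss_treat - ss_block   # what neither t nor b explains
    df_treat = n_treat - 1
    df_error = (n_treat - 1) * (n_block - 1)
    return (ss_treat / df_treat) / (ss_error / df_error), df_treat, df_error


f_stat, df_treat, df_error = block_anova_f(diversity)
print(round(f_stat, 2), df_treat, df_error)
```

For these data the F statistic is about 16.4 on 2 and 8 degrees of freedom, a large value that points toward the full model $y \sim t + b$ over the reduced model $y \sim b$.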
Fitting the model to data
Adjusting for a known confounding factor
Adjusting for a known confounding factor
Mole rats are the only known mammals with distinct social castes.
- A single queen and a small number of males are the only reproducing individuals in a colony.
- Workers gather food, defend the colony, care for the young, and maintain the burrows.
- Two worker castes in the Damaraland mole rat:
  - “Frequent workers”: do almost all of the work in the colony
  - “Infrequent workers”: do little work except on rare occasions after rains
Adjusting for a known confounding factor
To assess the physiological differences between the two types of workers, researchers compared daily energy expenditures of wild mole rats during a dry season.
Known noise factor: Energy expenditure appears to vary with body mass in both groups, but infrequent workers are heavier than frequent workers.
Research question: How different is mean daily energy expenditure between the two groups when adjusted for differences in body mass?
Data
Data
Model: $y \sim t + x$
H0: Castes do not differ in energy expenditure (reduced model: $y \sim x$)
H1: Castes differ in energy expenditure (full model: $y \sim t + x$)
Fitting the model to data
Multi-factor designs
Multiple factors
• Experiments with more than one factor influencing the counts can be analyzed using design formulas that include the additional variables.
• In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible).
• By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, including the batch factor in the design can increase the sensitivity for finding differences due to condition. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors.
Including type
Accounting for type
• We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment.
• As condition is the variable of interest, we put it at the end of the formula.
• Thus the results function will by default pull the condition results unless contrast or name arguments are specified. Then we can re-run DESeq.
Accounting for type
Accounting for type
Accounting for type
Accounting for type
• It is also possible to retrieve the log2 fold changes, p values and adjusted p values of the type variable. The contrast argument of the function results takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator.
Accounting for type