rna-seq: experimental design · 1. experimental design technical replicates: illumina has low...
TRANSCRIPT
![Page 1: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/1.jpg)
RNA-seq: experimental design
http://upload.wikimedia.org/wikipedia/commons/0/01/RNA-Seq-alignment.png
![Page 2: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/2.jpg)
Transcriptomics (RNA-Seq)
• The process of sequencing the “transcriptome”
• Uses include –
o Differential Gene Expression
Quantitative evaluation and comparison of transcript levels
o Transcriptome assembly
Building the profile of transcribed regions of the genome, a qualitative evaluation.
o Can be used to help build better gene models, and verify them using the
assembly
o Metatranscriptomics or community transcriptome analysis
![Page 3: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/3.jpg)
Outline
o Library preparation and sequencing with Illumina
o Experimental and Practical Considerations
![Page 4: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/4.jpg)
RNA-Seq library prep Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682http://rnaseq.uoregon.edu/#rna-prep
![Page 5: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/5.jpg)
RNA-Seq library prep Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682http://rnaseq.uoregon.edu/#rna-prep
![Page 6: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/6.jpg)
Illumina: Sequencing by Synthesis
Library Preparation
DNA (0.1-5.0 µg)
1 2 3 7 8 94 5 6T G C T A C G A T …
C
C
CC
A
A
A
TT
GG
G
G
Sequencing
Single molecule array
Cluster Growth
Image Acquisition Base Calling
5’
5’3’
TG
TA
CG
AT
CA
CC
CG
AT
CG
AA
https://www.youtube.com/watch?v=fCd6B5HRaZ8&t=3s
![Page 7: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/7.jpg)
Illumina: Sequencing by Synthesis
Number of clusters ~= Number of reads
Number of sequencing cycles ~= Length of reads
![Page 8: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/8.jpg)
Illumina: Sequencing Platforms
https://www.illumina.com/systems/sequencing-platforms.html
![Page 9: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/9.jpg)
Other Sequencing Platforms
Oxford Nanopore (MinION): https://nanoporetech.com/
Pacific Biosciences: http://www.pacb.com/
![Page 10: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/10.jpg)
Outline
o Library preparation and sequencing with Illumina
o Experimental and Practical Considerations
![Page 11: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/11.jpg)
1. Experimental Design
2. Poly(A) enrichment or ribosomal RNA depletion?
3. Single-end or Paired-end data?
4. Stranded libraries?
5. How much sequencing data to collect?
6. Multiplexing
Experimental and Practical considerations
![Page 12: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/12.jpg)
1. Experimental design
✦ Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary.
✦ Biological replicates, are absolutely essential. Have at least 3!
✦ Batch effects are still a problem. Be consistent!
✦ For differential gene expression, pooling RNA from multiple biological replicates can be tricky; do so only if you have multiple pools from each experimental condition.
Experimental and Practical considerations
![Page 13: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/13.jpg)
2. Poly(A) enrichment or ribosomal RNA depletion?
Depends on which RNA entities you are interested in…
✦ For differential gene expression, it is best to enrich for Poly(A)+
• EXCEPTION – If you are aiming to obtain information about long non-coding RNAs, then do a ribosomal RNA depletion.
Experimental and Practical considerations
![Page 14: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/14.jpg)
3. Single-end or Paired-end data? Depends on your goals, paired-end reads are better for reads that map to multiple locations, for assemblies and for splice isoform differentiation.
Experimental and Practical considerations
![Page 15: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/15.jpg)
Options for sequencing
✓ SE - Single end dataset => Only Read1
✓ PE - Paired-end dataset => Read1 + Read2 • can be 2 separate FASTQ files or just one with interleaved pairs
Read1
Read2Insert
http://tucf-genomics.tufts.edu/home/faq
![Page 16: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/16.jpg)
Options for sequencing
✓ SE - Single end dataset => Only Read1
✓ PE - Paired-end dataset => Read1 + Read2 • can be 2 separate FASTQ files or just one with interleaved pairs
✓ Fragment length: ~300-500bp
✓ Read length: 50bp - 250bp, depends on the sequencer (HiSeq2500, MiSeq, NextSeq)
Insert
http://tucf-genomics.tufts.edu/home/faq
Read1
Read2
![Page 17: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/17.jpg)
3. Single-end or Paired-end data? Depends on your goals, paired-end reads are better for reads that map to multiple locations, for assemblies, and for splice isoform differentiation.
✦ For differential gene expression, which one you pick depends on-
• If you are specifically interested in isoform-level differences
• The abundance of paralogous genes in your system of interest
• Your budget, paired-end data is usually 2x more expensive
Experimental and Practical considerations
![Page 18: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/18.jpg)
4. Stranded libraries? Stranded libraries are now standard with Illumina’s TruSeq stranded RNA-Seq kits. This means that with a great amount of certainty you can identify which strand of DNA the RNA was transcribed from.
3 types of libraries –
✦ Reverse (firststrand)– reads resemble the complementary sequence (TruSeq) ✦ Unstranded ✦ Forward (secondstrand) – reads resemble the gene sequence
Experimental and Practical considerations
![Page 19: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/19.jpg)
5. How much sequencing data to collect? ✦ Only ~2% of the human genome transcribes protein-coding RNA
✦ Some mRNAs will be much more abundant than others
✦ Some genes are much longer than others
Recommendations:
✦ For human samples ~30-50 million reads/sample (ENCODE guidelines)
✦ Modify that number based on the size of your transcriptome (crude estimate)
✦ If working with a tight budget:
More replicates >> More reads (for standard differential expression analysis)
Experimental and Practical considerations
![Page 20: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/20.jpg)
Experimental and Practical considerations
6. Multiplexing (with barcodes and indices)
✦ Charges for sequencing are usually per lane of the flow cell
✦ Each lane generates ~150 million reads
✦ For RNA-Seq, the required data per sample is much lower than that
✦ Sequencing of multiple samples per lane possible with addition of indices (within the Illumina adapter) or special barcodes (outside the Illumina adapter).
![Page 21: RNA-seq: experimental design · 1. Experimental design Technical replicates: Illumina has low technical variation unlike microarrays, hence technical replicates are unnecessary. Biological](https://reader034.vdocuments.us/reader034/viewer/2022043000/5f77b025ea3685650b65fb48/html5/thumbnails/21.jpg)
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.