rnaseq ngs analysis - dtu bioinformatics€¦ · 2 dtu sytems biology, technical university of...
TRANSCRIPT
RNA-sequencingNext Generation sequencing analysis 2016
Anne-Mette BjerregaardCenter for biological sequence analysis (CBS)
17/04/2008Presentation name2 DTU Sytems Biology, Technical University of Denmark
Terms and definitions
TRANSCRIPTOME
The full set of RNA transcripts and their associated abundance in a sample
RNA-seq
High-throughput sequencing technology used for probing the transcriptome of a sample
EXPRESSION
17/04/2008Presentation name4 DTU Sytems Biology, Technical University of Denmark
Library preparationDiffers primarily from the standard sequencing protocol by an additional reverse transcription step (RNA->cDNA)
RNA-seq Protocol
Martin et al., Nature Genetics Review (2011)
Poly-A selection
17/04/2008Presentation name5 DTU Sytems Biology, Technical University of Denmark
RNA-seq: Strand-specific library preparationThe standard library preparation protocol do not preserve information about which strand was originally transcribed
Strand-specific sequencing (e.g. dUTP method)
Martin et al., Nature Genetics Review (2011)
Levin et al., Nature Methods (2010)
17/04/2008Presentation name6 DTU Sytems Biology, Technical University of Denmark
More terms
Gene
Transcripts
Wiki: https://en.wikipedia.org/wiki/Alternative_splicing
17/04/2008Presentation name8 DTU Sytems Biology, Technical University of Denmark
Data analysis
Martin et al., Nature Genetics Review (2011)
FastQC
CutAdapt
Trimming
17/04/2008Presentation name9 DTU Sytems Biology, Technical University of Denmark
Assemble into transcripts
Reads Transcripts
17/04/2008Presentation name10 DTU Sytems Biology, Technical University of Denmark
Transcriptome assembly strategies
• Reference-based
• De novo
• Combined
• Pseudoalignment
17/04/2008Presentation name11 DTU Sytems Biology, Technical University of Denmark
Reference-based assembly
Align reads
Traverse graph
Graph
Assembl
17/04/2008Presentation name12 DTU Sytems Biology, Technical University of Denmark
Alignment tools
N. Bray et al., Nature Biotechnology (2016)
• NGS common alignment program: – BWA – Bowtie (Bowtie2) – Novoalign
• Take into account splice-junction – Tophat / Cufflinks
17/04/2008Presentation name13 DTU Sytems Biology, Technical University of Denmark
De novo assembly
Kmers
De Bruijn graph
Collapse
Traverse graph
Assemble
17/04/2008Presentation name14 DTU Sytems Biology, Technical University of Denmark
De novo assembly tools
N. Bray et al., Nature Biotechnology (2016)
• Velvet – Genomic and transcriptomic
• Trinity – Transcriptomic
• Cufflinks – Transcriptominc, reassemble pre-aligned transcripts to find
alternative splicing based on differential expression
17/04/2008Presentation name16 DTU Sytems Biology, Technical University of Denmark
Pseudoalignment - Kallisto
N. Bray et al., Nature Biotechnology (2016)
17/04/2008Presentation name17 DTU Sytems Biology, Technical University of Denmark
Advantages and disadvantages
Novel transcripts
Existing reference Fast
Trans-splicedgenes
Reference-based ✗ ✓ ✓ ✗
De Novo ✓ ✗ ✗ ✓
Pseudo ✗ ✓ ✓ ✗
17/04/2008Presentation name18 DTU Sytems Biology, Technical University of Denmark
Expression analysis
17/04/2008Presentation name19 DTU Sytems Biology, Technical University of Denmark
Expression analysis
Normalized expressionTranscripts
17/04/2008Presentation name20 DTU Sytems Biology, Technical University of Denmark
Within sample normalizationCompare expression levels of different transcripts / genes within the same sample.
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Gene 1
Gene 2
Gene 3
17/04/2008Presentation name21 DTU Sytems Biology, Technical University of Denmark
Within sample normalization• RPKM (Reads Per Kilobase Million)
– Single end reads • FPKM (Fragments Per Kilobase Million)
– Paired end reads
• TPM (Transcripts Per Million)
1. Sequencing depth (million)
2. Transcript length (kilobase)
1. Transcript length
2. Sequencing depth
1015 8 10 1010
RPKM / FPKM TPM
17/04/2008Presentation name22 DTU Sytems Biology, Technical University of Denmark
Between sample normalization (BSN)Compare expression levels of the same transcript between samples
0
1
2
3
4
5
6
Patient 1 before
Patient 1 treatet
Patient 2 before
Patient 2 treatet
Gene A
Gene B
Gene C
17/04/2008Presentation name23 DTU Sytems Biology, Technical University of Denmark
HTSeq-count
Simply counts the number of reads that map to each feature
• Feature: Can be a gene or transcript annotation
• Overlapping features: Union or intersection
17/04/2008Presentation name24 DTU Sytems Biology, Technical University of Denmark
Statistics
http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm
Poisson and neg. binomial parameter
Additional negative binomial parameter. when overdispersion = 0 neg. binom = Poisson
17/04/2008Presentation name25 DTU Sytems Biology, Technical University of Denmark
DeSeq2Differential gene expression analysis based on the negative binomial distribution
• Input: Read count tables (HTSeq)• Output: Table containing statistics for whether a gene is differential
expressed between two conditions
17/04/2008Presentation name27 DTU Sytems Biology, Technical University of Denmark
Within sample normalization – more details?1. http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
2. https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/