RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Post on 04-Jan-2016

Page 1: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

RNA-seq workshop: COUNTING & HTSEQ

Erin Osborne Nishimura

Page 2: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Today’s simple analysis pipeline

[Figure: pipeline flowchart]

• Counting: .fastq file → trimmomatic/bbduk.sh → _trim.fastq file → TOPHAT2 → .bam/.sam file → HTseq → counts.txt file → DESeq2/R → Differentially abundant genes
• Visualization: .bam/.sam file → bedtools genomecov → .bg file → bedGraphToBigWig → .bw file → IGV/UCSC → Pretty browser shots
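As a rough illustration of how the steps after trimming chain together, the sketch below builds (but does not run) one command per tool. The file names, the `build_commands` helper, and the chosen options are assumptions for illustration only; the workshop tutorial may use different settings, and each tool's own documentation should be checked before running anything.

```python
from pathlib import Path
from typing import List


def build_commands(trimmed_fastq: str, bowtie_index: str, gtf: str,
                   chrom_sizes: str, out_dir: str) -> List[List[str]]:
    """Return one argument list per pipeline step (hypothetical helper).

    Nothing is executed here; the lists are only meant to show the order
    of the steps and which file feeds which tool.
    """
    out = Path(out_dir)
    bam = out / "accepted_hits.bam"   # TopHat2 writes its alignments under this name
    bedgraph = out / "coverage.bg"    # hypothetical name for the bedGraph file
    bigwig = out / "coverage.bw"      # hypothetical name for the bigWig file
    return [
        # Align trimmed reads to the genome.
        ["tophat2", "-o", str(out), bowtie_index, trimmed_fastq],
        # Count reads per gene; htseq-count prints to stdout, so its output
        # would be redirected to counts.txt when actually run.
        ["htseq-count", "-f", "bam", "-m", "union", str(bam), gtf],
        # Compute coverage; output also goes to stdout (redirect to the .bg file).
        ["bedtools", "genomecov", "-ibam", str(bam), "-bg", "-split"],
        # Convert the bedGraph to bigWig for IGV/UCSC.
        ["bedGraphToBigWig", str(bedgraph), chrom_sizes, str(bigwig)],
    ]


# Example with placeholder file names (all hypothetical).
commands = build_commands("sample_trim.fastq", "genome_index",
                          "genes.gtf", "chrom.sizes", "tophat_out")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commands as data makes it easy to print or log them before running, which also helps with record keeping.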

Page 3: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Quantification with htseq

Page 4: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Quantification with htseq

Page 5: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

The problem

Page 6: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Counting reads

• What will we count?
– Genes?
– Exons?
– Isoforms?

• What are some of the issues we need to account for when counting reads?
– Paralogs?
– Overlap?
– Isoforms?
– Errors?

• How to count?
– Raw counts
– RPKM: Reads per kilobase per million mapped reads
– FPKM: Fragments per kilobase per million mapped reads
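To make the RPKM definition concrete, here is a minimal sketch of the arithmetic. The function name `rpkm` and the example numbers are made up for illustration; FPKM uses the same formula with fragments (read pairs) in place of reads.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    RPKM = reads / (gene length in kb * mapped reads in millions)
         = reads * 1e9 / (gene length in bp * mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Two hypothetical genes with the same raw count but different lengths:
# the shorter gene gets the higher RPKM.
long_gene = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
short_gene = rpkm(read_count=500, gene_length_bp=1000, total_mapped_reads=10_000_000)
print(long_gene, short_gene)
```

Note that raw counts, not RPKM or FPKM values, are what the counting step hands on to DESeq2 in this pipeline.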

Page 8: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

The problem

Page 9: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

The three htseq-count modes
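The three modes are union, intersection-strict, and intersection-nonempty. The sketch below is a simplified re-implementation of the decision rule as described in the HTSeq documentation, not HTSeq's own code: features are plain half-open intervals on a single chromosome, strand and gapped alignments are ignored, and the names `features_at` and `assign_read` are hypothetical.

```python
from typing import Dict, Set, Tuple

Features = Dict[str, Tuple[int, int]]  # feature name -> (start, end), half-open


def features_at(pos: int, features: Features) -> Set[str]:
    """Set of features overlapping a single base position."""
    return {name for name, (start, end) in features.items() if start <= pos < end}


def assign_read(read_start: int, read_end: int, features: Features,
                mode: str = "union") -> str:
    """Assign a read covering [read_start, read_end) to a feature."""
    if read_end <= read_start:
        raise ValueError("read must cover at least one base")
    per_base = [features_at(p, features) for p in range(read_start, read_end)]
    if mode == "union":
        # Every feature touched by any base of the read.
        hits = set().union(*per_base)
    elif mode == "intersection-strict":
        # Only features that cover every base of the read.
        hits = set.intersection(*per_base)
    elif mode == "intersection-nonempty":
        # Like strict, but bases that overlap no feature are ignored.
        nonempty = [s for s in per_base if s]
        hits = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))


# Two hypothetical overlapping genes.
genes = {"geneA": (100, 200), "geneB": (150, 250)}
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    # A read hanging off the start of geneA, then a read spanning the
    # point where geneB begins inside geneA.
    print(mode, assign_read(90, 120, genes, mode), assign_read(140, 160, genes, mode))
```

The two example reads show where the modes disagree: a read that partly falls outside any feature, and a read that partly falls in a region shared by two features.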

Page 10: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Switch to hands-on tutorial

• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md

Page 11: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Assessing differential abundance

Page 12: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Assessing pairwise differential abundance, relatively simple

Anders and Huber, 2010

Page 13: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Identifying genes with shared patterns across multiple samples, complex

Page 14: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

For today…

Anders and Huber, 2010

Page 15: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Many publications report performance comparisons of the different packages

• Seyednasrollah et al., 2013
– http://bib.oxfordjournals.org/content/16/1/59.full.pdf+html

• Soneson et al., 2013
– http://www.biomedcentral.com/1471-2105/14/91

• Rapaport et al., 2013
– http://www.genomebiology.com/2013/14/9/r95

Page 16: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Why is this hard? Why is this different from other types of data?

• Your question

• The data
– Discreteness
– Small numbers of replicates
– Large dynamic range
– Outliers
– Data is overdispersed

• Variance does not scale linearly with mean

• Breaks the assumptions of some inference tests

Anders and Huber, 2010
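To illustrate what overdispersion means for count data, the sketch below compares the variance expected under a Poisson model (variance equal to the mean) with the variance under a negative binomial model, the distribution DESeq uses (Anders and Huber, 2010), where variance = mean + dispersion × mean². The dispersion value 0.5 and the function names are arbitrary choices for illustration.

```python
def poisson_variance(mean: float) -> float:
    """Under a Poisson model the variance equals the mean."""
    return mean


def negative_binomial_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Hypothetical dispersion, chosen only to make the gap visible.
dispersion = 0.5
for mean in (10, 100, 1000):
    print(mean, poisson_variance(mean), negative_binomial_variance(mean, dispersion))
```

The gap between the two columns grows with the mean, which is why tests that assume Poisson-like variance become overconfident for highly expressed genes.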

Page 17: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Why DESeq?

• Original paper
– http://www.genomebiology.com/content/11/10/R106

• DESeq2 paper
– http://www.genomebiology.com/2014/15/12/550

• Bioconductor
– http://bioconductor.org/packages/release/bioc/html/DESeq2.html

• Vignette
– https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf

Page 18: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

A final word about the fate of your data

• You will need to submit your raw and processed files in a repository PRIOR to submitting your paper for publication.

• Keep track of what you did!
– Module versions
– Conversion & transformation steps
– Settings/Options
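One lightweight way to keep such a record is to write it down in a machine-readable file next to the results. The sketch below is only one possible layout; the helper name `make_analysis_record`, the field names, and every value shown are placeholders, not actual workshop settings.

```python
import json
from typing import Dict, List


def make_analysis_record(module_versions: Dict[str, str],
                         steps: List[str],
                         settings: Dict[str, str]) -> str:
    """Bundle versions, processing steps, and options into a JSON string."""
    record = {
        "module_versions": module_versions,
        "steps": steps,
        "settings": settings,
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values only; fill in what was actually used.
record_text = make_analysis_record(
    module_versions={"tophat2": "VERSION_USED", "htseq": "VERSION_USED"},
    steps=["trim reads", "align", "count reads per gene"],
    settings={"htseq-count mode": "union"},
)
print(record_text)
```

Saving this text alongside counts.txt gives a starting point for the methods section and for the repository submission.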

Page 19: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Switch to hands-on tutorial

• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md

Page 20: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura

Key Quality Control Metrics

• Gene coverage
– CEAS

• Over-amplification
– FASTQC

• Complexity
– TOPHAT output

• Reproducibility