rna-seq workshop counting & htseq erin osborne nishimura

Post on 04-Jan-2016


RNA-seq workshop: COUNTING & HTSEQ

Erin Osborne Nishimura

Today’s simple analysis pipeline

.fastq file → trimmomatic/bbduk.sh → _trim.fastq file → TOPHAT2 → .bam/.sam file

.bam/.sam file → HTseq → counts.txt file → DESeq2/R → Differentially abundant genes

.bam/.sam file → bedtools genomecov → .bg file → bedGraphToBigWig → .bw file → IGV/UCSC → Pretty browser shots

Quantification with htseq


The problem

Counting reads

• What will we count?
– Genes?
– Exons?
– Isoforms?

• What are some of the issues we need to account for when counting reads?
– Paralogs?
– Overlap?
– Isoforms?
– Errors?

• How to count?
– Raw counts
– RPKM -- Reads per kilobase per million mapped reads
– FPKM -- Fragments per kilobase per million mapped reads
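To make the RPKM/FPKM definitions concrete, here is a minimal Python sketch of the arithmetic. The function names and the example numbers are illustrative assumptions, not part of the workshop materials; FPKM uses the same formula with fragments (read pairs) in place of reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = reads / (gene length in kb * mapped reads in millions)
         = reads * 1e9 / (gene length in bp * mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and mapped read total must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same arithmetic as RPKM, counting fragments instead of single reads."""
    return rpkm(fragment_count, gene_length_bp, total_mapped_fragments)


# Hypothetical gene: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)  # 25.0
```

Raw counts, by contrast, are left unnormalized; that is the form count-based tools such as DESeq2 expect as input.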

The problem

The three htseq-count modes
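The three modes (union, intersection-strict, intersection-nonempty) differ in how they combine the sets of features overlapping each position of a read. The sketch below mimics that decision logic in plain Python; it is a simplified illustration, not htseq-count's actual code. The function name `assign_read` and the gene names are hypothetical; `__no_feature` and `__ambiguous` are the labels htseq-count uses for its special counters.

```python
def assign_read(position_feature_sets, mode="union"):
    """Decide which feature a read is counted for.

    position_feature_sets: one set of feature names per read position,
    i.e. the features overlapping that position (may be empty).
    """
    sets = [set(s) for s in position_feature_sets]
    non_empty = [s for s in sets if s]

    if mode == "union":
        # Every feature touched anywhere along the read.
        candidates = set().union(*sets)
    elif mode == "intersection-strict":
        # Features covering every position of the read.
        candidates = set.intersection(*sets) if sets else set()
    elif mode == "intersection-nonempty":
        # Features covering every position that overlaps any feature at all.
        candidates = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not candidates:
        return "__no_feature"
    if len(candidates) > 1:
        return "__ambiguous"
    return next(iter(candidates))


# A read hanging off the end of gene_A: the last position overlaps no feature.
overhang = [{"gene_A"}, {"gene_A"}, set()]
# A read that starts in gene_A only and runs into a region shared with gene_B.
shared = [{"gene_A"}, {"gene_A", "gene_B"}]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(overhang, mode), assign_read(shared, mode))
```

In this toy case the overhanging read is counted for gene_A under union and intersection-nonempty but not under intersection-strict, while the read in the shared region is ambiguous only under union.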

Switch to hands-on tutorial

• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md

Assessing differential abundance

Assessing pairwise differential abundance, relatively simple

Anders and Huber, 2010

Identifying genes with shared patterns across multiple samples, complex

For today…

Anders and Huber, 2010

Many publications report performance comparisons of the different packages

• Seyednasrollah et al., 2013
– http://bib.oxfordjournals.org/content/16/1/59.full.pdf+html
• Soneson et al., 2013
– http://www.biomedcentral.com/1471-2105/14/91
• Rapaport et al., 2013
– http://www.genomebiology.com/2013/14/9/r95

Why is this hard? Why is this different from other types of data?

• Your question
• The data
– Discreteness
– Small numbers of replicates
– Large dynamic range
– Outliers
– Data is overdispersed
  • Variance does not scale linearly with mean
  • Breaks the assumptions of some inference tests
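A small numeric sketch of what overdispersion means. Under a Poisson model the variance equals the mean; a negative binomial model, the kind of count model DESeq is built on, adds a term that grows with the square of the mean. The dispersion value 0.25 and the mean values below are arbitrary assumptions chosen for illustration, not estimates from real data.

```python
def poisson_variance(mean):
    """Poisson counts: variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial counts: variance = mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Compare the two models for a lowly and a highly expressed gene (hypothetical means).
for mean in (10, 100, 1000):
    print(mean, poisson_variance(mean), negative_binomial_variance(mean, 0.25))
```

The gap between the two columns widens quickly as the mean grows, which is why tests that assume Poisson-like variance tend to call too many highly expressed genes significant.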

Anders and Huber, 2010

Why DESeq?

• Original paper
– http://www.genomebiology.com/content/11/10/R106
• DESeq2 paper
– http://www.genomebiology.com/2014/15/12/550
• Bioconductor
– http://bioconductor.org/packages/release/bioc/html/DESeq2.html
• Vignette
– https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf

A final word about the fate of your data

• You will need to submit your raw and processed files in a repository PRIOR to submitting your paper for publication.

• Keep track of what you did!
– Module Versions
– Conversion & transformation steps
– Settings/Options
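One lightweight way to keep track is to write a small machine-readable record next to each set of output files. The sketch below is only one possible layout: the function name `build_run_record`, the field names, and every value shown are hypothetical placeholders to be replaced with what you actually ran.

```python
import json
import platform
from datetime import date


def build_run_record(tool_versions, steps, settings):
    """Collect module versions, processing steps, and settings in one dict."""
    return {
        "date": date.today().isoformat(),
        "python_version": platform.python_version(),
        "tool_versions": dict(tool_versions),
        "steps": list(steps),
        "settings": dict(settings),
    }


# Placeholder values only; fill in the versions and options from your own run.
record = build_run_record(
    tool_versions={"htseq-count": "VERSION_YOU_RAN"},
    steps=["trim reads", "align reads", "count reads per gene"],
    settings={"htseq_mode": "union"},
)
print(json.dumps(record, indent=2))
```

Saving the printed JSON alongside counts.txt gives you the details a repository submission or a methods section will ask for.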

Switch to hands-on tutorial

• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md


Key Quality Control Metrics

• Gene coverage
– CEAS
• Over-amplification
– FASTQC
• Complexity
– TOPHAT output
• Reproducibility
