cccb cloud rna-seq dge analysis

CCCB RNA-Seq DGE Services on Cloud Platform Center for Cancer Computational Biology (SM822) Bioinformatics Team Homepage: https://cccb.dfci.harvard.edu/ Twitter: @CCCBseq

Upload: yaoyu-wang

Post on 21-Jan-2018


Page 1: CCCB Cloud RNA-Seq DGE analysis

CCCB RNA-Seq DGE Services on Cloud Platform Center for Cancer Computational Biology (SM822)

Bioinformatics Team

Homepage: https://cccb.dfci.harvard.edu/
Twitter: @CCCBseq

Page 2: CCCB Cloud RNA-Seq DGE analysis

Typical Problems with Data Analysis

Have sequencing data generated, but...
○ don’t know where to store it securely long term
○ uploading to GenePattern or Galaxy for analysis is taking forever
○ my bioinformaticians cannot process it today
○ want to make additional differential expression contrasts
○ alignment is taking forever to run
○ my exome data is taking forever to run
○ don’t know how to work with variant data
○ my thousand exomes are crushing my bioinformaticians’ HPC server
○ I am the bioinformatician and I don’t have the time to do all these analyses!

CCCB Cloud Computing Systems can help!

Page 3: CCCB Cloud RNA-Seq DGE analysis

Advantages of Using Cloud Systems

By integrating DFCI Google Virtual Private Cloud and Partners Dropbox Enterprise, the CCCB Cloud Systems offer convenient, fast, and secure methods to transfer, analyze, and store large sequence data.

Convenient
○ Experimentalists can upload and analyze data on their own at any time
○ Simplified upload and download of large data through a connection to Dropbox

Fast
○ RNA-Seq analysis can typically be done within hours, starting from fastq files
○ Scalable infrastructure with virtually no computing resource limitation
○ Minimal wait time to get data analyzed

Secure
○ Google Cloud Platform (GCP) is covered by the Google-DFCI BAA to ensure HIPAA-compliant security
○ All data can be encrypted with the SSL/TLS protocol during transfer
○ Partners’ Dropbox Business can be used as a storage solution for secure, long-term data archiving

Page 4: CCCB Cloud RNA-Seq DGE analysis

Important accounts and where to get them

DFCI G Suite Account (or just Google Account)
Google accounts linked with organization emails are preferred, although any Google account can be used. Members of the DFCI community should request a DFCI Google account ([email protected]) through the Research Computing website: http://rc.dfci.harvard.edu/contact-research-computing

Partners Dropbox
Any Dropbox account will work with our systems. Partners Health provides virtually unlimited encrypted storage on Dropbox Business, free of charge, to all Partners community members (anyone with a partners.org email). Information is available here: https://rc.partners.org/kb/collaboration/dropbox?article=2062

Agilent CrossLab (a.k.a. iLab Solutions)
Like most cores and centers at DFCI, we use iLab to track all of our projects. A free account can be requested at https://dfci.ilab.agilent.com/account/login

Page 5: CCCB Cloud RNA-Seq DGE analysis

DFCI Virtual Private Cloud and Partners Dropbox

[Diagram labels: Users; CCCB Bioinformatics; CCCB Sequencing]

Page 6: CCCB Cloud RNA-Seq DGE analysis

CCCB Data Analysis and Visualization Infrastructure

[Diagram components: Analysis Portal; Local Drive; Dropbox (unlimited space via Partners); Users; CCCB via DFCI GCP; GATK Analysis; RNASeq Analysis; Variant Viewer; WebMeV]

[Diagram legend: Upload; Download; Web Access; Direct data transfer; Under construction]

Page 7: CCCB Cloud RNA-Seq DGE analysis

RNA-Seq: What’s happening?

- Parallelized:
  - Alignment (STAR aligner) ---> BAM files
  - Sort, primary-alignment filtering, duplicate evaluation (Samtools, Picard)
  - Quantification (featureCounts)
- Merging:
  - Overall “raw” (not normalized) count matrix
  - Differential expression testing with DESeq2
  - Plots/figures

[Diagram labels: Master; Sample 1, Sample 2, ..., Sample N]
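The merging step above can be illustrated with a minimal sketch, assuming each parallel per-sample job has already produced a gene-to-count mapping. The function name `merge_counts` and the toy gene and sample names are hypothetical; this is not the CCCB implementation, only an illustration of how per-sample counts become one raw (not normalized) matrix.

```python
def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into a genes x samples raw count matrix.

    Genes missing from a sample are filled with 0. Returns (genes, samples,
    matrix), where matrix[i][j] is the count of genes[i] in samples[j].
    """
    samples = list(per_sample)  # keep the order the samples were given in
    genes = sorted({g for counts in per_sample.values() for g in counts})
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


# Toy per-sample quantification results (hypothetical values).
per_sample = {
    "Sample1": {"GENE_A": 10, "GENE_B": 0},
    "Sample2": {"GENE_A": 7, "GENE_B": 3, "GENE_C": 1},
}
genes, samples, matrix = merge_counts(per_sample)
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

A matrix of this shape (genes as rows, samples as columns, integer counts) is the kind of input that differential expression tools such as DESeq2 expect.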

Page 8: CCCB Cloud RNA-Seq DGE analysis

Simple Fastq file upload system

Sample names are inferred from sequencing file names. Can create new samples or remove existing ones.

- Drag/drop files to the proper sample
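How sample names might be inferred from file names can be sketched as follows. The slides do not state the actual rule, so the Illumina-style suffix pattern below (`_S1_L001_R1_001.fastq.gz` or simply `_R1.fastq.gz`) and the helper name `group_by_sample` are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Assumed naming convention: an optional _S<n> and _L<lane> field, then
# _R1/_R2, an optional _001 chunk, and a .fastq/.fq extension (optionally .gz).
FASTQ_SUFFIX = re.compile(
    r"(?:_S\d+)?(?:_L\d{3})?_R[12](?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)


def group_by_sample(filenames):
    """Group fastq file names by the sample name left after stripping the suffix."""
    groups = defaultdict(list)
    for name in filenames:
        sample = FASTQ_SUFFIX.sub("", name)
        groups[sample].append(name)
    return dict(groups)


files = [
    "KO_1_S1_L001_R1_001.fastq.gz",
    "KO_1_S1_L002_R1_001.fastq.gz",
    "WT_1_R1.fastq.gz",
]
grouped = group_by_sample(files)
for sample, members in grouped.items():
    print(sample, members)
```

Files whose names do not match the assumed pattern end up as their own "sample", which is where manual creation, removal, and drag/drop assignment would come in.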

Page 9: CCCB Cloud RNA-Seq DGE analysis

Straightforward differential analysis

Processed samples available

Human-readable contrast name

Thresholds used for creating heatmaps and volcano plots

Drag/drop samples into contrast groups

Can rename groups
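As a rough sketch of how thresholds could drive heatmaps and volcano plots, the snippet below labels genes using an adjusted p-value cutoff and an absolute log2 fold change cutoff. The slides do not say which thresholds the platform exposes, so the parameter names, the default values, and the `classify` helper are hypothetical; the input mimics the log2 fold change and adjusted p-value columns that DESeq2 reports.

```python
def classify(results, padj_max=0.05, abs_lfc_min=1.0):
    """Label each gene as 'up', 'down', or 'ns' (not significant).

    results maps gene -> (log2 fold change, adjusted p-value). An adjusted
    p-value of None (reported as NA by DESeq2) is treated as not significant.
    """
    labels = {}
    for gene, (lfc, padj) in results.items():
        significant = padj is not None and padj <= padj_max
        if significant and abs(lfc) >= abs_lfc_min:
            labels[gene] = "up" if lfc > 0 else "down"
        else:
            labels[gene] = "ns"
    return labels


# Toy differential expression results (hypothetical values).
toy_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.8, 0.01),
    "GENE_C": (0.4, 0.0001),
    "GENE_D": (3.0, None),
}
labels = classify(toy_results)
print(labels)
```

Genes labeled "up" or "down" would be the ones highlighted on a volcano plot or selected as rows of a heatmap.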

Page 10: CCCB Cloud RNA-Seq DGE analysis

Fast download for output files using Dropbox

Save output by direct download or Dropbox transfer

- Authenticated: only those logged-in as your Google user can access files

- Direct transfer to Dropbox storage for fast data transfer and backup

Page 11: CCCB Cloud RNA-Seq DGE analysis

Standard RNA-Seq DGE Output

Custom report

Basic figures

Output files: raw counts, normalized counts, differential expression results

Page 12: CCCB Cloud RNA-Seq DGE analysis

More advanced analysis

Broad Institute GSEA (http://software.broadinstitute.org/gsea/)

The CCCB Cloud Platform DGE analysis results include support files, namely the normalized count matrix file and groups.cls, that can be imported directly into Broad Gene Set Enrichment Analysis (GSEA) for use with MSigDB gene sets.
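For readers unfamiliar with the groups.cls file, the sketch below writes a categorical class file in the layout GSEA documents for the CLS format: a header line with the number of samples, the number of classes, and the constant 1; a line starting with `#` that names the classes; and a line with one class index per sample. The helper name `make_cls` and the sample and group names are hypothetical, and the platform's own file may differ in detail.

```python
def make_cls(sample_groups):
    """Build CLS text from (sample, group) pairs given in matrix column order.

    Group names must not contain spaces. Classes are numbered from 0 in
    order of first appearance.
    """
    classes = []
    for _, group in sample_groups:
        if group not in classes:
            classes.append(group)
    header = f"{len(sample_groups)} {len(classes)} 1"
    names = "# " + " ".join(classes)
    labels = " ".join(str(classes.index(group)) for _, group in sample_groups)
    return "\n".join([header, names, labels]) + "\n"


cls_text = make_cls([("s1", "WT"), ("s2", "WT"), ("s3", "KO")])
print(cls_text, end="")
```

The order of the labels on the last line has to match the sample column order of the expression matrix that is loaded alongside it.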

Page 13: CCCB Cloud RNA-Seq DGE analysis

Costs for Basic RNA-Seq and Exome Analysis

Example Costs for DFCI/BWH Members:
- 20 SR75bp samples for RNA-Seq (DGE): $145 + $15*20 = $445
- 20 PE75bp samples for Germline Variant Analysis: $145 + $50*20 = $1,145
  - with Variant Annotation and Visualization: $1,145 + $20*20 = $1,545

Service                               Unit         DFCI/BWH  External non-profit
Project Setup                         Per Project  $145      $189
RNA-Seq (DGE)                         Per Sample   $15       $18
Germline Variant Analysis             Per Exome    $50       $60
Variant Annotation and Visualization  Per Exome    $20       $24
WebMeV                                             free      free
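The pricing rule implied by the examples (one project setup fee plus a per-sample fee for each selected service) can be written as a small calculator. The prices are taken from the table on this slide; the dictionary keys and the function name `project_cost` are hypothetical labels introduced for this sketch.

```python
# Prices in USD from the slide's table, keyed by customer tier.
PRICES = {
    "DFCI/BWH": {"setup": 145, "rnaseq": 15, "germline": 50, "annotation": 20},
    "External non-profit": {"setup": 189, "rnaseq": 18, "germline": 60, "annotation": 24},
}


def project_cost(tier, n_samples, services):
    """Return setup fee plus n_samples times the summed per-sample service fees."""
    price = PRICES[tier]
    per_sample = sum(price[service] for service in services)
    return price["setup"] + n_samples * per_sample


rnaseq_cost = project_cost("DFCI/BWH", 20, ["rnaseq"])
germline_cost = project_cost("DFCI/BWH", 20, ["germline"])
annotated_cost = project_cost("DFCI/BWH", 20, ["germline", "annotation"])
print(rnaseq_cost, germline_cost, annotated_cost)
```

Swapping the tier to "External non-profit" applies the second price column with the same formula.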

Page 14: CCCB Cloud RNA-Seq DGE analysis

Request Project and Demo Accounts

Individuals can now request free demo accounts for:

- RNA-Seq DGE pipeline on 6 single-read samples
- Variant Visualization Platform System for hg19 chr20 from the 1000 Genomes Project

Please send a request, including a proper Google account, by emailing [email protected] with the subject line: [Demo] RNA-Seq DGE or [Demo] Variant Visualization