CCCB Cloud RNA-Seq DGE Analysis
CCCB RNA-Seq DGE Services on Cloud Platform
Center for Cancer Computational Biology (SM822)
Bioinformatics Team
Homepage: https://cccb.dfci.harvard.edu/
Twitter: @CCCBseq
Typical Problems with Data Analysis
Have sequencing data generated but...
○ don’t know where to securely store it long term
○ uploading to GenePattern or Galaxy for analysis is taking forever
○ my bioinformaticians cannot process it today
○ want to make additional differential expression contrasts
○ alignment is taking forever to run
○ my exome data is taking forever to run
○ don’t know how to work with variant data
○ my thousand exomes are crushing my bioinformaticians’ HPC server
○ I am the bioinformatician and I don’t have the time to do all these analyses!
CCCB Cloud Computing Systems can help!
Advantages of Using Cloud Systems
By integrating DFCI Google Virtual Private Cloud and Partners Dropbox Enterprise, the CCCB Cloud Systems offer convenient, fast, and secure methods to transfer, analyze, and store large sequence data.
Convenient
○ Experimentalists can upload and analyze data on their own at any time
○ Simplified upload and download of large data through a Dropbox connection
Fast
○ RNA-Seq analysis can typically be done within hours, starting from FASTQ files
○ Scalable infrastructure with virtually no computing resource limitation
○ Minimal wait time to get data analyzed
Secure
○ Google Cloud Platform (GCP) is covered by the Google-DFCI BAA to ensure HIPAA-compliant security
○ All data can be encrypted with the SSL/TLS protocol during transfer
○ Partners’ Dropbox Business can be used as a storage solution for secure, long-term data archiving
Important accounts and where to get them
DFCI G Suite Account (or just Google Account)
Google accounts linked to organization emails are preferred, although any Google account can be used. Members of the DFCI community should request a DFCI Google account ([email protected]) through the Research Computing website: http://rc.dfci.harvard.edu/contact-research-computing
Partners Dropbox
Any Dropbox account will work with our systems. Partners Health provides virtually unlimited encrypted storage on Dropbox Business, free of charge, to all Partners community members (anyone with a partners.org email). Information is available here: https://rc.partners.org/kb/collaboration/dropbox?article=2062
Agilent CrossLab (a.k.a. iLab Solutions)
Like most cores and centers around DFCI, we use iLab to track all of our projects. A free account can be requested at https://dfci.ilab.agilent.com/account/login
DFCI Virtual Private Cloud and Partners Dropbox
[Figure: architecture diagram. Parties: Users, CCCB Bioinformatics, CCCB Sequencing. Storage: Local Drive, Dropbox (unlimited space via Partners). CCCB Data Analysis and Visualization Infrastructure (CCCB via DFCI GCP): Analysis Portal, RNA-Seq Analysis, GATK Analysis, Variant Viewer, WebMeV. Legend: upload, download, web access, direct data transfer, under construction.]
RNA-Seq: What’s happening?
- Parallelized:
  - Alignment (STAR aligner) ---> BAM files
  - Sort, primary-alignment filtering, duplicate evaluation (Samtools, Picard)
  - Quantification (featureCounts)
- Merging:
  - Overall “raw” (not normalized) count matrix
- Differential expression testing with DESeq2
- Plots/figures
[Diagram labels: Master; Sample 1, Sample 2, ..., Sample N]
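The parallelize-then-merge shape of the pipeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process_sample`, `merge_counts`, and `run_pipeline` are hypothetical names, and `process_sample` is a placeholder that stands in for the real per-sample steps (STAR, Samtools, Picard, featureCounts) without invoking any of those tools.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample):
    """Placeholder for the per-sample steps (align, sort/filter, quantify).

    Assumption: each sample arrives as (name, {gene: count}); a real
    pipeline would compute the counts from FASTQ files instead.
    """
    name, counts = sample
    return name, dict(counts)


def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into a raw (not normalized) matrix.

    Returns (genes, samples, matrix), where matrix[i][j] is the count of
    genes[i] in samples[j]. Genes missing from a sample are counted as 0.
    """
    samples = sorted(per_sample)
    genes = sorted({g for counts in per_sample.values() for g in counts})
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


def run_pipeline(samples, max_workers=4):
    """Process every sample independently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(process_sample, samples))
    return merge_counts(results)
```

Because samples do not depend on one another until the merge step, adding samples mainly adds parallel work rather than wall-clock time, provided enough workers are available.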
Simple FASTQ file upload system
Sample names are inferred from sequencing file names. Can create new samples or remove existing ones.
- Drag/drop files to the proper sample
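As an illustration of how sample names might be inferred from file names, here is a minimal sketch. The platform's actual rule is not described in these slides; the suffix pattern below (optional `_S<n>`, `_L<lane>`, `_R1`/`_R2`, `_001`, then `.fastq` or `.fq` with optional `.gz`) is an assumption modeled on common Illumina-style naming, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed suffix layout (not taken from the platform): every part before
# the extension is optional, e.g. "KO_rep1_S3_L001_R1_001.fastq.gz".
_SUFFIX = re.compile(r"(_S\d+)?(_L\d{3})?(_R[12])?(_\d{3})?\.(fastq|fq)(\.gz)?$")


def infer_sample_name(filename: str) -> str:
    """Strip the assumed sequencing suffix to obtain a sample name."""
    return _SUFFIX.sub("", filename)


def group_by_sample(filenames):
    """Group file names under their inferred sample name."""
    groups = defaultdict(list)
    for name in filenames:
        groups[infer_sample_name(name)].append(name)
    return dict(groups)
```

Files that share an inferred name (for example the R1 and R2 files of a paired-end run) end up in the same group, which is the situation where manually moving a file to the proper sample is useful.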
Straightforward differential analysis
Processed samples available
Human-readable contrast name
Thresholds used for creating heatmaps and volcano plots
Drag/drop samples into contrast groups
Can rename groups
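To make the role of the thresholds concrete, the sketch below labels genes the way a volcano plot typically colors them. It is an assumption-laden illustration, not the platform's code: the function name, the input layout `(gene, log2 fold change, adjusted p-value)`, and the default cutoffs of 1.0 and 0.05 are all hypothetical. An adjusted p-value of `None` stands for the NA values DESeq2 can report.

```python
def classify_genes(results, lfc_threshold=1.0, padj_threshold=0.05):
    """Label each gene as "up", "down", or "not significant".

    results: iterable of (gene, log2_fold_change, padj) tuples.
    A gene passes only if padj is present and below padj_threshold and
    the absolute log2 fold change is at least lfc_threshold.
    """
    calls = {}
    for gene, lfc, padj in results:
        if padj is None or padj >= padj_threshold or abs(lfc) < lfc_threshold:
            calls[gene] = "not significant"
        elif lfc > 0:
            calls[gene] = "up"
        else:
            calls[gene] = "down"
    return calls
```

Tightening either threshold shrinks the set of genes highlighted in the volcano plot and included in the heatmap.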
Fast download for output files using Dropbox
Save output by direct download or Dropbox transfer
- Authenticated: only users logged in with your Google account can access files
- Direct transfer to Dropbox storage for fast data transfer and backup
Standard RNA-Seq DGE Output
Custom report
Basic figures
Output files: raw counts, normalized counts, differential expression results
More advanced analysis
Broad Institute GSEA (http://software.broadinstitute.org/gsea/)
The normalized count matrix file and groups.cls, provided among the support files of the CCCB Cloud Platform DGE analysis results, can be imported directly into Broad Gene Set Enrichment Analysis (GSEA) for use with MSigDB.
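For readers curious what a class file such as groups.cls contains, the sketch below builds one in GSEA's categorical CLS layout: a first line with the number of samples, the number of classes, and the constant 1; a second line starting with `#` that names the classes; and a third line with one label per sample, in the same order as the columns of the expression matrix. The function name is hypothetical, and the exact file the platform produces is not shown in these slides.

```python
def format_cls(labels):
    """Build the text of a categorical CLS file from per-sample labels.

    labels: class label of each sample, in expression-matrix column order.
    Classes are listed in order of first appearance among the samples.
    """
    if not labels:
        raise ValueError("at least one sample label is required")
    if any(not lab or any(ch.isspace() for ch in lab) for lab in labels):
        raise ValueError("labels must be non-empty and contain no whitespace")
    classes = list(dict.fromkeys(labels))  # unique, first-appearance order
    lines = [
        f"{len(labels)} {len(classes)} 1",
        "# " + " ".join(classes),
        " ".join(labels),
    ]
    return "\n".join(lines) + "\n"
```

For four samples labeled WT, WT, KO, KO this yields the three lines `4 2 1`, `# WT KO`, and `WT WT KO KO`.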
Costs for Basic RNA-Seq and Exome Analysis
Example Costs for DFCI/BWH Members:
20 SR75bp samples for RNA-Seq (DGE): $145 + $15*20 = $445
20 PE75bp samples for Germline Variant Analysis: $145 + $50*20 = $1,145
- with Variant Annotation and Visualization: $1,145 + $20*20 = $1,545
Service                                 Unit         DFCI/BWH  External non-profit
Project Setup                           Per Project  $145      $189
RNA-Seq (DGE)                           Per Sample   $15       $18
Germline Variant Analysis               Per Exome    $50       $60
Variant Annotation and Visualization    Per Exome    $20       $24
WebMeV                                               free      free
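The pricing above is a one-time setup fee plus per-sample charges, so a project estimate is simple arithmetic. The sketch below encodes the listed rates; the dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Rates in US dollars, copied from the price list above.
RATES = {
    "DFCI/BWH": {"setup": 145, "rnaseq": 15, "germline": 50, "annotation": 20},
    "External non-profit": {"setup": 189, "rnaseq": 18, "germline": 60, "annotation": 24},
}


def project_cost(tier, n_samples, services):
    """Return setup fee + n_samples * (sum of per-sample service rates)."""
    rates = RATES[tier]
    return rates["setup"] + n_samples * sum(rates[s] for s in services)
```

For a DFCI/BWH project with 20 samples, `project_cost("DFCI/BWH", 20, ["rnaseq"])` gives 445, `["germline"]` gives 1145, and `["germline", "annotation"]` gives 1545.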
Request Project and Demo Accounts
Individuals can now request free demo accounts for
- RNA-Seq DGE pipeline on 6 single-read samples
- Variant Visualization Platform System for hg19 chr20 from the 1000 Genomes Project
Please send a request, including a valid Google account, by emailing [email protected] with the subject line: [Demo] RNA-Seq DGE or [Demo] Variant Visualization