Bioinformatics Analysis of Single-Cell Sequencing Data
Min Gao, PhD
Assistant Professor of Medicine & UAB Informatics Institute
3.13.2020
Outline
• Background
1. Why single cell?
2. Current platforms for scRNA-seq
3. Experimental design
• Single-cell data analysis
1. From raw reads to gene expression matrix
2. QC and Filtering
3. Normalization
4. Dimensionality Reduction
5. Clustering
6. Differentially Expressed Genes / Marker Identification
7. Trajectory analysis
8. Multimodal analysis
• Single-cell RNA-seq data analysis in U-BRITE2.0 (Zongliang Yue)
Why single-cell analysis?
Single-cell analysis reveals heterogeneity (J Hematol Oncol. 2017 Jan 21;10(1):27)
Single-cell analysis affects diverse areas of biological research
Wang Y.; Navin N. E. Mol. Cell 2015
Single-cell analysis in cancer genomics
Trends Genet. 2015 Oct; 31(10): 576–586.
Single-cell analysis in immunology
Trends Immunol. 2017 Feb; 38(2): 140–149.
Scaling of scRNA-seq experiments
Svensson et al., Nature Protocols, 2018
Single Cell Platforms

| Platform | Advantages | Limitations |
| --- | --- | --- |
| Chromium System (10x Genomics) | High cell capture efficiency, easy to operate, end-to-end solution, multiple applications, well-established platform, intensive support | High initial cell concentration required, no user modification possible |
| Nadia (Dolomite Bio) | Open platform, possibility to develop own protocols, multiple applications (PACS, DroNc-Seq) | High initial cell concentration required, lower cell capture efficiency, no analysis software provided, operating skills required |
| InDrop System (1CellBio) | High cell capture efficiency, open platform, possibility to develop own protocols | High initial cell concentration required, no analysis software support, operating skills required |
| Illumina Bio-Rad ddSEQ Single-Cell Isolator | Product from industry leaders, easy to operate, end-to-end solution, kits for different starting numbers of cells | High initial cell concentration required, no user modification possible, single application (RNA-Seq) |
| Tapestri Platform (MissionBio) | Only platform dedicated to DNA-Seq, easy to operate, customized panels available | Single application possible (DNA-Seq) |
| BD Rhapsody Single-Cell Analysis System (BD) | Possibility to optimize costs (subsampling, archiving, targeted assays), easy to operate, end-to-end solution, protein detection promised | Single application possible (targeted RNA-Seq) |
| ICELL8 Single-Cell System (Takara) | Combines high throughput with active cell selection, easy to operate | Bioinformatics analysis not provided, single application (RNA-Seq) |
| C1 System and Polaris (Fluidigm) | Variable throughput (48–800 cells), multiple applications, customizable protocols, cell stimulation, well-established platform, intensive support | Size-based cell selection (C1) |
| Puncher Platform (Vycap) | Filtering for rare cell capture, active cell selection, visual control, high transfer efficiency, easy to operate, established WGA/WTA protocols | Low throughput, bioinformatics analysis not provided |
| CellRaft AIR System (CellMicrosystems) | Multiple applications (cultivating and tracking cell phenotypes, substance testing), active cell selection, visual control, high transfer efficiency, cost-effective manual version available | Low throughput, bioinformatics analysis not provided, adhesive properties of cells expected (although not mandatory) |
| DEPArray NxT (Menarini Silicon Biosystems) | Active cell selection, visual control, high transfer efficiency, possibility to study cell–cell interaction, established WGA/WTA protocols | Low throughput, bioinformatics analysis not provided, high price of consumables (chips) compared with other low-throughput instruments |
| AVISO CellCelector (ALS) | Active cell selection, visual control, multiple applications (transfer of cell colonies), low price for consumables | Low throughput, bioinformatics analysis not provided, operating skills required, adhesive properties of cells lower transfer efficiency, risk of contamination from co-transferred medium |

Int J Mol Sci. 2018 Mar; 19(3): 807.
Droplet-based single-cell platform
Single Cell Solutions: Single Cell CNV, Single Cell Gene Expression, Single Cell Immune Profiling, Single Cell ATAC
https://wp.10xgenomics.com/instruments/chromium-controller/
Lab Chip. 2019 May 14;19(10):1706-1727
Experimental design considerations
How deep to sequence?
• The number of detected genes saturates at around 1 million reads per cell
• Identifying the cell types present: 25–50k reads per cell
• Identifying transcriptional dynamics within a population: 50–100k reads per cell
• We don’t necessarily need to detect everything in every cell!
How many cells?
• Two main things to consider:
1) How many cell types are there?
2) What is the proportion of the rarest cell type you’re interested in?
• 10x Genomics currently allows each run to yield anywhere from ~500 to 10,000 cells
• Do you actually know how many cell types are there? What about cell states? The current trend in the field seems to focus on increasing cell numbers
What about replicates?
• It depends.
• Technical replicates of PBMCs overlap almost perfectly
• Cells differ dramatically between patients
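The “how many cells?” question can be turned into a quick calculation. The Python sketch below assumes cells are captured independently, so the number of cells of a rare type follows a binomial distribution; this ignores capture bias, doublets and QC losses. The 1% frequency and the 10-cell target are illustrative values, not recommendations from this talk.

```python
import math


def prob_at_least(n_cells: int, freq: float, min_cells: int) -> float:
    """P(X >= min_cells) for X ~ Binomial(n_cells, freq), with 0 < freq < 1."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must be strictly between 0 and 1")
    below = 0.0
    for k in range(min(min_cells, n_cells + 1)):
        # Binomial probability of exactly k rare cells, computed in log space to avoid overflow
        log_pmf = (
            math.lgamma(n_cells + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_cells - k + 1)
            + k * math.log(freq)
            + (n_cells - k) * math.log1p(-freq)
        )
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


def cells_needed(freq: float, min_cells: int, confidence: float = 0.95,
                 max_cells: int = 1_000_000) -> int:
    """Smallest total cell number giving >= `confidence` chance of capturing `min_cells` rare cells."""
    for n in range(min_cells, max_cells + 1):
        if prob_at_least(n, freq, min_cells) >= confidence:
            return n
    raise ValueError("no solution below max_cells")


# Illustrative values: a cell type making up 1% of the sample, at least 10 such cells wanted
n = cells_needed(freq=0.01, min_cells=10)
print(n, round(prob_at_least(n, 0.01, 10), 3))
```

Under these assumptions the answer lands in the low thousands of cells, which sits within the ~500 to 10,000 cells per run quoted above.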
Challenges in processing single-cell sequencing data
Variety in and of data is a classic biological problem that also applies to big data. Larger data volumes bring clear opportunities, but also technical, statistical and interpretative challenges.
Basic programming skills are needed to interpret the data
The information contained in single-cell data needs to be transformed into relevant biological knowledge
Single cell sequencing data analysis
1. From raw reads to gene expression matrix
2. QC and Filtering
3. Normalization
4. Dimensionality Reduction
5. Clustering
6. Differentially Expressed Genes / Marker Identification
7. Trajectory analysis
8. Multimodal analysis
Flowchart of the single-cell RNA-seq analysis
Cell Ranger
Seurat
U-BRITE2.0: Web App
U-BRITE1.0: Jupyter Notebook
https://gitlab.rc.uab.edu/MCBIOS19/single_cell_rnaseq_hands-on_1
https://gitlab.rc.uab.edu/attis2020/single_cell_attis2020
attis2020-scrnaseq-demo.informatics.uab.edu
From raw reads to gene expression matrix
Cook D. Introduction to the single-cell RNA sequencing workflow. 2018
The commands for cellranger count
cellranger count --help
cellranger count (3.1.0)
Copyright (c) 2019 10x Genomics, Inc. All rights reserved.
'cellranger count' quantifies single-cell gene expression.
The commands below should be preceded by 'cellranger':
Usage:
  count
    --id=ID
    [--fastqs=PATH]
    [--sample=PREFIX]
    --transcriptome=ref-DIR
    [options]
  count <run_id> [options]
  count -h | --help | --version
Example
cellranger count --id=run_count_1kpbmcs \
  --fastqs=/path_to_fastq/pbmc_1k_v3_fastqs \
  --sample=pbmc_1k_v3 \
  --transcriptome=/path_to_ref/run_cellranger_count/refdata-cellranger-GRCh38-3.0.0
The outputs of cellranger count
• Example command:
cellranger count --id=sample345 \
  --transcriptome=/refdata-cellranger-GRCh38-3.0.0 \
  --fastqs=/fastq_path \
  --expect-cells=1000
• Outputs:
- Run summary HTML: web_summary.html
- Run summary CSV: metrics_summary.csv
- BAM: possorted_genome_bam.bam
- BAM index: possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX: filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5: filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX: raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: raw_feature_bc_matrix.h5
- Secondary analysis output CSV: analysis
- Per-molecule read information: molecule_info.h5
- Loupe Cell Browser file: cloupe.cloupe
• filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
The filtered matrix is then loaded into Seurat or other software
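In R, Seurat's `Read10X()` reads this directory directly. To show what the three files contain, here is a minimal Python sketch using only the standard library. It assumes the Cell Ranger 3.x layout (gzipped files, gene name in the second column of `features.tsv.gz`, a MatrixMarket coordinate matrix with features as rows and barcodes as columns); the function name `read_10x_mex` is hypothetical.

```python
import csv
import gzip
from pathlib import Path


def read_10x_mex(matrix_dir):
    """Read a Cell Ranger MEX directory (e.g. filtered_feature_bc_matrix).

    Returns (genes, barcodes, counts), where counts maps
    (gene_index, cell_index) -> UMI count using 0-based indices.
    Only non-zero entries are stored, as in the MatrixMarket file itself.
    """
    d = Path(matrix_dir)
    with gzip.open(d / "barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with gzip.open(d / "features.tsv.gz", "rt") as fh:
        # Columns: feature ID, feature name, feature type
        genes = [row[1] for row in csv.reader(fh, delimiter="\t") if row]

    counts = {}
    shape = None
    with gzip.open(d / "matrix.mtx.gz", "rt") as fh:
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # MatrixMarket banner, comments and blank lines
            fields = line.split()
            if shape is None:
                shape = (int(fields[0]), int(fields[1]))  # rows = features, cols = barcodes
                continue
            row, col, value = (int(x) for x in fields[:3])
            counts[(row - 1, col - 1)] = value  # MatrixMarket indices are 1-based
    if shape != (len(genes), len(barcodes)):
        raise ValueError(f"matrix shape {shape} does not match features/barcodes")
    return genes, barcodes, counts
```

For example, `read_10x_mex("sample345/outs/filtered_feature_bc_matrix")` would load the matrices from the run above, assuming Cell Ranger's default `<id>/outs/` output location.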
QC and Filtering
• Goal: Remove low-quality cells and potential doublets
• Common parameters worth exploring:
- UMI distribution
- Number of genes detected
- Percentage of UMIs mapping to mitochondrial genes
• Unusually high nUMI/nGene values could indicate doublets (~90 doublets per 1000 cells)
• A high percentage of mitochondrial reads is associated with cell death (loss of membrane integrity → loss of cytoplasmic RNA → relative enrichment of mitochondrial transcripts)
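A minimal sketch of this filtering step in Python, assuming per-cell counts are stored as `{gene: UMI count}` and that human mitochondrial genes carry the `MT-` prefix. The cut-offs (more than 200 and fewer than 2,500 detected genes, under 5% mitochondrial UMIs) mirror values used in Seurat's PBMC tutorial; they are dataset-dependent and should be chosen by looking at the QC distributions.

```python
def qc_metrics(cell_counts, mito_prefix="MT-"):
    """Return (nUMI, nGene, percent_mito) for one cell given {gene: UMI count}."""
    n_umi = sum(cell_counts.values())
    n_genes = sum(1 for c in cell_counts.values() if c > 0)
    mito = sum(c for g, c in cell_counts.items() if g.upper().startswith(mito_prefix.upper()))
    pct_mito = 100.0 * mito / n_umi if n_umi else 0.0
    return n_umi, n_genes, pct_mito


def filter_cells(cells, min_genes=200, max_genes=2500, max_pct_mito=5.0):
    """Keep cells with min_genes < nGene < max_genes and percent_mito < max_pct_mito."""
    kept = {}
    for barcode, counts in cells.items():
        _, n_genes, pct_mito = qc_metrics(counts)
        if min_genes < n_genes < max_genes and pct_mito < max_pct_mito:
            kept[barcode] = counts
    return kept


# Hypothetical toy cells: a healthy cell, a dying cell, an empty droplet and a likely doublet
genes = [f"GENE{i}" for i in range(3000)]
cells = {
    "healthy": {**{g: 2 for g in genes[:300]}, "MT-CO1": 10},
    "dying": {**{g: 1 for g in genes[:300]}, "MT-CO1": 100},
    "empty": {g: 1 for g in genes[:50]},
    "doublet": {g: 3 for g in genes},
}
print(sorted(filter_cells(cells)))  # → ['healthy']
```

An nUMI cut-off could be added the same way using the first value returned by `qc_metrics`.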
Install Seurat3 and load the package
Load PBMC data to Seurat
Cell QC – filter cells
Normalization
Goal: Make the expression profiles of different cells comparable
Simplest approach: scale each cell’s library size to a fixed, arbitrary total (e.g., 10,000)
It’s common to “regress out” the effects of nUMI and percent mito on each cell
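The library-size scaling above corresponds to Seurat's default `NormalizeData()` method, "LogNormalize": each count is divided by the cell's total, multiplied by a scale factor (10,000 by default) and natural-log transformed with log1p. A per-cell Python sketch of that formula, with made-up gene counts:

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """LogNormalize-style transform for one cell: log1p(count / total * scale_factor)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {g: 0.0 for g in cell_counts}
    return {g: math.log1p(c / total * scale_factor) for g, c in cell_counts.items()}


# Two hypothetical cells with the same composition but a 10-fold difference in sequencing depth
shallow = log_normalize({"CD3E": 1, "LYZ": 3})
deep = log_normalize({"CD3E": 10, "LYZ": 30})
print(shallow == deep)  # → True
```

Because both cells have the same relative composition, depth differences disappear after normalization, which is the point of making profiles comparable.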
Data normalization
Detection of highly variable genes
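Seurat 3's `FindVariableFeatures()` uses the "vst" method by default, which fits a mean-variance trend before ranking genes. The Python sketch below swaps in a simpler dispersion ranking (variance divided by mean across cells) to illustrate the idea of keeping the genes that vary most; it is not Seurat's algorithm, and the gene values are hypothetical.

```python
import statistics


def top_variable_genes(expr, n_top=2000):
    """Rank genes by dispersion (variance / mean) across cells and return the top n_top.

    expr maps gene -> list of normalized expression values, one per cell (>= 2 cells).
    """
    dispersion = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        if mean <= 0:
            continue  # gene not detected in any cell
        dispersion[gene] = statistics.variance(values) / mean
    return sorted(dispersion, key=dispersion.get, reverse=True)[:n_top]


# Hypothetical normalized values for four genes across four cells
expr = {
    "flat": [5.0, 5.0, 5.0, 5.0],
    "variable": [0.0, 10.0, 0.0, 10.0],
    "mild": [4.0, 6.0, 4.0, 6.0],
    "absent": [0.0, 0.0, 0.0, 0.0],
}
print(top_variable_genes(expr, n_top=2))  # → ['variable', 'mild']
```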
Scaling the data
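Scaling centers each gene to mean 0 and unit variance so that highly expressed genes do not dominate PCA. As far as we know, Seurat's `ScaleData()` also clips scaled values at `scale.max` (10 by default) and, when `vars.to.regress` is set (e.g. nUMI or percent mito), fits a linear model per gene and scales the residuals. A one-gene Python sketch of that sequence, using a single covariate and ordinary least squares:

```python
import statistics


def regress_out(values, covariate):
    """Residuals of an ordinary least-squares fit values ~ a + b * covariate."""
    mx, my = statistics.fmean(covariate), statistics.fmean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, values)) / sxx if sxx else 0.0
    return [y - (my + slope * (x - mx)) for x, y in zip(covariate, values)]


def scale_gene(values, covariate=None, clip=10.0):
    """Optionally regress out a per-cell covariate, then z-score and clip to [-clip, clip]."""
    if covariate is not None:
        values = regress_out(values, covariate)
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [max(-clip, min(clip, (v - mean) / sd)) for v in values]


print(scale_gene([1.0, 2.0, 3.0]))  # → [-1.0, 0.0, 1.0]
# A hypothetical gene whose expression only tracks percent mito becomes flat after regression
print(scale_gene([2.0, 4.0, 6.0], covariate=[1.0, 2.0, 3.0]))  # → [0.0, 0.0, 0.0]
```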
Dimensionality Reduction
Goal: To visualize the structure of our data
Principal Component Analysis (PCA) is one of the techniques used for dimensionality reduction
• Dimensionality simply refers to the number of features (i.e., input variables) in your dataset; in scRNA-seq, the features are genes.
• Why reduce the dimensions? High-dimensional data are hard to model and need more computational power and time, and they cannot be visualized directly.
• PCA is a variance maximizer: it projects the original data onto the directions along which the variance is greatest. Variance measures how spread out the data are.
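Seurat's `RunPCA()` applies this step to the scaled variable genes. A NumPy sketch of the underlying computation, assuming a cells × genes matrix: center each gene, take the singular value decomposition, and read cell scores, gene loadings and the fraction of variance explained from it. The toy matrix is synthetic.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a cells x genes matrix via singular value decomposition.

    Returns (cell scores, gene loadings, fraction of variance explained per component).
    """
    Xc = X - X.mean(axis=0)  # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # cell coordinates on the PCs
    loadings = Vt[:n_components].T  # gene weights for each PC
    explained = (S**2 / np.sum(S**2))[:n_components]
    return scores, loadings, explained


# Synthetic data: two correlated "genes" plus one gene of pure noise, 200 cells
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.05 * rng.normal(size=200), 0.05 * rng.normal(size=200)])
scores, loadings, explained = pca(X)
print(scores.shape, np.round(explained, 3), np.round(np.abs(loadings[:, 0]), 2))
```

PC1 points along the shared direction of the two correlated genes and captures almost all of the variance, which is the variance-maximizing behaviour described above.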
Perform linear dimensional reduction (PCA)
Visualizing the PCA results
Clustering
Goal: Assign cells to groups of similar cells
Nature Reviews Genetics, volume 20, pages 273–282 (2019)
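Seurat clusters cells with a graph-based method: `FindNeighbors()` builds a shared-nearest-neighbour graph and `FindClusters()` partitions it with Louvain modularity optimization by default. To keep the example short, the Python sketch below swaps in plain k-means on PCA coordinates instead; it illustrates assigning cells to groups of similar cells, not Seurat's algorithm. The farthest-point initialization and the toy points are illustrative choices.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain k-means on a list of coordinate tuples (e.g. cells in PCA space)."""
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its cells
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical cells in a 2-D PCA space forming two well-separated groups
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, _ = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Unlike graph-based clustering, k-means needs the number of clusters up front; Seurat's `resolution` parameter plays a loosely comparable role in controlling how many clusters come out.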
Cell clustering
Cell clusters
Differentially Expressed Genes / Marker Identification
Goal: Find out what’s different between clusters
Lots of options, some complex, some simple:
• Pairwise comparisons (e.g., cluster A vs. cluster B)
• Marker identification (e.g., cluster A vs. clusters B, C and D combined)
Many genes are often identified as “significant” because each cluster contains many cells. You may need to apply effect-size cutoffs (e.g., fold change) to filter down to a smaller list of genes to follow up on.
Soneson & Robinson, Nature Methods, 2018
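Seurat's `FindMarkers()` uses a Wilcoxon rank-sum test by default. The Python sketch below combines that test with a Bonferroni correction and a log2 fold-change cutoff, following the advice above to filter on effect size as well as significance. It uses the normal approximation without tie correction, a pseudocount of 1 and an illustrative 0.25 threshold; these are assumptions, not Seurat's exact computation.

```python
import math
import statistics


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses average ranks for ties and the normal approximation without a tie
    correction, so p-values are slightly conservative when many values tie.
    Returns (U statistic for group_a, p-value).
    """
    values = list(group_a) + list(group_b)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n1, n2 = len(group_a), len(group_b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def find_markers(expr, in_cluster, min_log2fc=0.25, alpha=0.05, pseudocount=1.0):
    """Genes up-regulated in the cluster vs. all other cells (cluster A vs. B, C, D combined)."""
    hits = []
    for gene, values in expr.items():
        a = [v for v, m in zip(values, in_cluster) if m]
        b = [v for v, m in zip(values, in_cluster) if not m]
        _, p = rank_sum_test(a, b)
        p_adj = min(1.0, p * len(expr))  # Bonferroni correction over all tested genes
        log2fc = math.log2((statistics.fmean(a) + pseudocount) / (statistics.fmean(b) + pseudocount))
        if p_adj < alpha and log2fc >= min_log2fc:  # require significance AND effect size
            hits.append((gene, log2fc, p_adj))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical normalized expression for 10 cells; the first 5 belong to the cluster
in_cluster = [True] * 5 + [False] * 5
expr = {
    "MARKER": [5, 6, 7, 8, 9, 0, 1, 0, 1, 0],
    "HOUSEKEEPING": [3] * 10,
}
print(find_markers(expr, in_cluster))
```

For a pairwise comparison (cluster A vs. cluster B), subset `expr` and `in_cluster` to the cells of those two clusters before calling `find_markers`.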
Finding differentially expressed genes (cluster biomarkers)
Visualizing marker genes
Trajectory analysis
Pseudotime is a measure of how much progress an individual cell has made through a process such as cell differentiation.
In single-cell expression studies of processes such as cell differentiation, captured cells might be widely distributed in terms of progress.
Pseudotime is an abstract unit of progress: it's simply the distance between a cell and the start of the trajectory, measured along the shortest path.
Nat Methods. 2017 Oct; 14(10): 979–982.
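The definition above (distance from the start of the trajectory, measured along the shortest path) can be sketched directly. Dedicated trajectory tools learn a more elaborate structure; this Python sketch simply builds a k-nearest-neighbour graph over cells, for example in PCA space, and runs Dijkstra's algorithm from a root cell chosen by the analyst. The value of `k` and the toy coordinates are assumptions.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph over cells, weighted by Euclidean distance."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph


def pseudotime(graph, root):
    """Shortest-path distance from the root cell to every reachable cell (Dijkstra)."""
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, w in graph[node].items():
            if d + w < dist.get(nbr, math.inf):
                dist[nbr] = d + w
                heapq.heappush(heap, (d + w, nbr))
    return dist


# Hypothetical cells along an L-shaped path in a 2-D embedding; cell 0 is the chosen start
cells = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
pt = pseudotime(knn_graph(cells, k=2), root=0)
print([pt[i] for i in range(len(cells))])  # → [0.0, 1.0, 2.0, 3.0, 4.0]
```

Cell 4 lies about 2.83 units from the root in straight-line distance, but its pseudotime is 4 because the path has to follow the trajectory.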
Single-cell V(D)J sequencing data analysis
Samples: thousands of single B cells from a healthy control and SLE patients
Method: 10x Genomics Single Cell Immune Profiling Solution (5′ Gene Expression + V(D)J Enriched Libraries)
Goal: Compare the IG genes of the healthy control and the patients
Cell proportion for different contig numbers in autoAb+ and healthy samples
Only cells with exactly two contigs (one heavy chain and one light chain) were used for the following analysis.
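A sketch of the contig-number filter in Python, assuming a per-contig table like Cell Ranger's `filtered_contig_annotations.csv` with a `barcode` column and a `chain` column holding IGH, IGK or IGL. It reports the proportion of cells per contig number and then keeps only cells with one heavy and one light chain. The input rows are made up.

```python
import csv
import io
from collections import Counter, defaultdict

HEAVY_CHAINS = {"IGH"}
LIGHT_CHAINS = {"IGK", "IGL"}


def chains_per_cell(rows):
    """Group contig chain labels by cell barcode."""
    chains = defaultdict(list)
    for row in rows:
        chains[row["barcode"]].append(row["chain"])
    return chains


def contig_number_proportions(chains):
    """Fraction of cells having 1, 2, 3, ... contigs."""
    sizes = Counter(len(c) for c in chains.values())
    total = sum(sizes.values())
    return {n: count / total for n, count in sorted(sizes.items())}


def paired_cells(chains):
    """Barcodes with exactly two contigs: one heavy chain and one light chain."""
    return {
        barcode
        for barcode, c in chains.items()
        if len(c) == 2
        and sum(x in HEAVY_CHAINS for x in c) == 1
        and sum(x in LIGHT_CHAINS for x in c) == 1
    }


# Hypothetical excerpt containing only the two columns used here
text = """barcode,chain
A,IGH
A,IGK
B,IGH
B,IGH
C,IGH
C,IGL
C,IGK
D,IGL
"""
chains = chains_per_cell(csv.DictReader(io.StringIO(text)))
print(contig_number_proportions(chains))  # → {1: 0.25, 2: 0.5, 3: 0.25}
print(paired_cells(chains))  # → {'A'}
```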
IGHV for autoAb+ and healthy samples
autoAb+: SLE#1, SLE#2 and SLE#3
Healthy: HC
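Once cells are filtered, IGHV usage can be compared by tabulating the heavy-chain V gene of each cell per group. A Python sketch using the sample-to-group mapping above; the `(sample, v_gene)` input format is an assumption, and the records are invented for illustration, not results from this study.

```python
from collections import Counter, defaultdict

# Sample-to-group mapping from the slide
GROUPS = {"SLE#1": "autoAb+", "SLE#2": "autoAb+", "SLE#3": "autoAb+", "HC": "Healthy"}


def ighv_usage(records, groups=GROUPS):
    """records: (sample, IGHV gene) pairs, one per paired cell's heavy chain.

    Returns {group: {IGHV gene: fraction of that group's cells}}.
    """
    counts = defaultdict(Counter)
    for sample, v_gene in records:
        counts[groups[sample]][v_gene] += 1
    return {
        group: {gene: n / sum(counter.values()) for gene, n in counter.most_common()}
        for group, counter in counts.items()
    }


# Made-up records for illustration only
records = [
    ("SLE#1", "IGHV4-34"), ("SLE#2", "IGHV4-34"), ("SLE#3", "IGHV3-23"),
    ("HC", "IGHV3-23"), ("HC", "IGHV1-69"),
]
print(ighv_usage(records))
```

Using fractions rather than raw counts keeps groups with different numbers of cells comparable.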
Single-cell RNA-seq data analysis in U-BRITE2.0 (Zongliang Yue)
Acknowledgement
Department of Pathology
Dr. Casey T. Weaver
Division of Clinical Immunology and Rheumatology
Dr. John D. Mountz
Dr. Hui-Chen Hsu
Informatics Institute (UAB), School of Medicine
Zongliang Yue
Jelai Wang
Dr. Jake Chen
Dr. Alexander Rosenberg
Dr. Zechen Chong
Dr. James J. Cimino