Post on 02-Jan-2016


Page 1: Computational Methods for Analysis of Single Cell RNA-Seq Data Ion Măndoiu Computer Science & Engineering Department University of Connecticut ion@engr.uconn.edu

Computational Methods for Analysis of Single Cell RNA-Seq Data

Ion Măndoiu

Computer Science & Engineering Department

University of Connecticut

ion@engr.uconn.edu


Outline

• Intro to RNA-Seq
  – Next-generation sequencing technologies
  – RNA-Seq applications
  – Analysis challenges for single cell data

• Typical analysis pipeline for single-cell RNA-Seq
  – Primary analysis: reads QC, mapping, and quantification
  – Secondary analysis: cells QC, normalization, clustering, and differential expression
  – Tertiary analysis: functional annotation

• Conclusions


2nd Gen. Sequencing: Illumina


2nd Gen. Sequencing: ION Torrent

• ION Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way
• Each well holds a different DNA template generated by emulsion PCR. Beneath the wells is an ion-sensitive layer and beneath that a proprietary ION sensor
• The sequencer sequentially floods the chip with one nucleotide after another; in each cycle the voltage change recorded at a well is proportional to the number of incorporated bases


3rd Gen. Sequencing: PacBio SMRT


3rd Gen. Sequencing: Oxford Nanopore

http://www.technologyreview.com/article/427677/nanopore-sequencing/


Standard (Bulk) RNA-Seq

[Figure: bulk RNA-Seq workflow. Reverse transcribe into cDNA & shatter into fragments → sequence fragment ends → map reads → gene expression quantification, isoform expression quantification, and transcriptome reconstruction. Exon labels in the figure: A B C D E; A B C; A C; D E.]


Alternative Splicing

Pal S. et al., Genome Research, June 2011


Transcriptome Reconstruction


Common Approaches

• De novo (genome independent reconstruction)
  – Trinity, Oases, TransABySS
    • de Bruijn k-mer graph

• Genome guided
  – Scripture
    • Reports “all” transcripts
  – Cufflinks, IsoLasso, SLIDE
    • Minimize set of transcripts explaining reads

• Annotation guided
  – RABT
    • Simulate reads from annotated transcripts


Genome-Guided Transcriptome Reconstruction – Multiple Solutions

[Figure: a gene with exons 1 2 3 4 5 6 7 and four candidate transcripts]

t1: 1 2 3 4 5 6 7

t2: 1 3 4 5 6 7

t3: 1 2 3 4 5 7

t4: 1 3 4 5 7


Which Solution is Most Likely?

• TRIP: select the smallest set of transcripts with a good statistical fit between the fragment length distributions
  – empirically determined during library preparation
  – implied by “mapping” read pairs

[Figure: a read pair mapped to candidate transcripts with exons 1 3 and 1 2 3; labels in the figure: 500, 300, 200 200 200, 200 200]


TRIP Results

• 100x coverage, 2x100bp pe reads; annotations for genes

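$$\mathrm{Sens} = \frac{TP}{TP + FN} \qquad \mathrm{PPV} = \frac{TP}{TP + FP} \qquad \mathrm{FScore} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sens}}{\mathrm{PPV} + \mathrm{Sens}}$$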
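The three accuracy measures on this slide (sensitivity, PPV, and F-score) can be computed directly from counts of true positives, false positives, and false negatives. A minimal sketch follows; the function name and the example counts are hypothetical and are not TRIP results.

```python
def accuracy_metrics(tp, fp, fn):
    """Return (sensitivity, PPV, F-score) from TP/FP/FN counts."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    # F-score is the harmonic mean of PPV and sensitivity.
    fscore = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return sens, ppv, fscore


# Hypothetical counts, for illustration only.
sens, ppv, fscore = accuracy_metrics(tp=80, fp=20, fn=40)
```

With these made-up counts, sensitivity is 80/120, PPV is 80/100, and the F-score equals 2·TP/(2·TP+FP+FN) = 160/220.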


Why Single Cell RNA-Seq?

Macaulay and Voet, PLOS Genetics, 2014


Challenges

• Low RNA input + low RT efficiency
  – Especially problematic for low expression genes

Macaulay and Voet, PLOS Genetics, 2014


• Stochastic effects (e.g., transcriptional bursting) hard to distinguish from regulated transcriptional heterogeneity

• PCR amplification bias results in distortion of transcript abundances


SMARTer RNA-Seq Protocol


Correcting PCR Bias using UMIs (STRT-C1)

Islam et al. http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2772.html
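The idea behind UMI-based correction can be sketched as follows: reads carrying the same UMI for the same gene are treated as PCR copies of one original molecule and counted once. This is a simplified illustration, not the STRT-C1 pipeline; the input format and the function name are assumptions, and UMI sequencing errors and mapping positions are ignored.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per gene.

    reads: iterable of (gene, umi) pairs, one per mapped read
    (a hypothetical input format chosen for this sketch).
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Three reads of GeneA share one UMI, so they count as a single molecule.
reads = [
    ("GeneA", "AACGT"), ("GeneA", "AACGT"), ("GeneA", "AACGT"),
    ("GeneA", "TTGCA"), ("GeneB", "CCATG"),
]
counts = umi_counts(reads)
```

Here GeneA has four reads but only two distinct UMIs, so its UMI count is 2 while its raw read count is 4.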


Pipeline: Reads QC → Read mapping → Quantification → Cells QC → Normalization → Clustering → Differential expression → Functional analysis

[Figure: percentage of reads with mismatches vs. read position, for Lane 1, Lane 2, and Lane 3]


Tools to analyze and preprocess fastq files

• FASTX (http://hannonlab.cshl.edu/fastx_toolkit/)
  – Charts quality statistics
  – Filters sequences based on quality
  – Trims sequences based on quality
  – Collapses identical sequences into a single sequence

• PRINSEQ (http://prinseq.sourceforge.net/)
  – Generates read length and quality statistics
  – Filters reads based on length, quality, GC content and other criteria
  – Trims reads based on length/position or quality scores
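To make quality-based trimming concrete, here is a minimal sketch of trimming low-quality bases from the 3' end of a single read. It is not the algorithm used by FASTX or PRINSEQ; the function names, the threshold of 20, and the Phred+33 quality encoding are assumptions made for illustration.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end while their Phred quality is below min_q.

    seq, qual: sequence and quality strings of one FASTQ record.
    offset: ASCII offset of the quality encoding (33 assumed here).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mean_quality(qual, offset=33):
    """Average Phred quality of a read (0.0 for an empty read)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


# 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5##")
```

In this example the two trailing Q2 bases are removed and the Q20 base is kept, because the threshold is "below 20".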


http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png


RNA-Seq read mapping strategies:

– Ungapped mapping (with mismatches) to genome
  • Cannot align reads spanning exon-junctions
– Local alignment (Smith-Waterman) to genome
  • Very slow
– Spliced alignment to genome
  • Computationally harder than ungapped alignment, but much faster than local alignment
– Mapping on transcript libraries
  • Fastest, but cannot align reads from un-annotated transcripts
– Mapping on exon-exon junction libraries
  • Cannot align reads overlapping un-annotated exons
– Hybrid approaches
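The first and fourth strategies above can be contrasted with a toy example. The sketch below is a brute-force ungapped matcher, not how real mappers work (they use indexes), and the sequences are invented: a read spanning the exon-exon junction finds no ungapped hit on the genome but maps cleanly to the spliced transcript.

```python
def ungapped_hits(reference, read, max_mismatches=1):
    """Return all start positions where read aligns without gaps
    to reference with at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for ref_base, read_base in zip(reference[pos:pos + len(read)], read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # loop finished without exceeding the mismatch budget
            hits.append(pos)
    return hits


# Invented toy gene: two exons separated by an intron.
exon1, intron, exon2 = "ACGTTGCA", "GTAAGTTTTTTCAG", "CCATGGAT"
genome = exon1 + intron + exon2
transcript = exon1 + exon2

exonic_read = "CGTTGC"                  # lies entirely inside exon 1
junction_read = exon1[-3:] + exon2[:3]  # spans the exon-exon junction

genome_hits_exonic = ungapped_hits(genome, exonic_read)
genome_hits_junction = ungapped_hits(genome, junction_read)
transcript_hits_junction = ungapped_hits(transcript, junction_read)
```

The junction read is lost on the genome because the intron separates its two halves; a spliced aligner or a transcript library recovers it.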


Comparison of spliced read mapping tools


Kim et al. http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3317.html


• Cannot use raw read counts (why not?)


Islam et al. http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2772.html


• CPM = counts per million
  – Ignores multireads, so it underestimates expression of genes in large families
  – Does not normalize for gene length, so CPMs cannot be compared between genes
  – Comparing CPMs between samples assumes similar transcriptome size

• RPKM/FPKM = reads/fragments per kilobase per million
  – [Mortazavi et al. 08] Fractionally allocates multireads based on unique read estimates
  – Length for multi-isoform genes?
  – Comparing FPKM between samples assumes similar (weighted) transcriptome size

• TPM: transcripts per million
  – Still relative measure of expression, but comparable between samples
  – Most accurate estimation methods use multireads and isoform level estimation

• UMI counts
  – Absolute measure of expression?
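The first three measures above differ only in how raw counts are scaled. The sketch below uses the standard textbook definitions on made-up counts; it assumes one length per gene and uniquely assigned reads, so it does not address multireads or multi-isoform genes. All names and numbers are hypothetical.

```python
def cpm(counts):
    """Counts per million: scale by library size only."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def rpkm(counts, lengths):
    """Reads per kilobase per million: scale by library size and length (bp)."""
    total = sum(counts.values())
    return {g: c * 1e9 / (total * lengths[g]) for g, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {g: c / lengths[g] for g, c in counts.items()}
    norm = sum(rate.values())
    return {g: r * 1e6 / norm for g, r in rate.items()}


# Two hypothetical genes with equal read counts but different lengths.
counts = {"short": 100, "long": 100}
lengths = {"short": 1000, "long": 4000}
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

Both genes get the same CPM (500,000), while RPKM and TPM rank the short gene four times higher, which is the gene-length effect the CPM bullet warns about. TPM values always sum to one million within a sample.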


[Figure: gene ambiguous reads and isoform ambiguous reads, illustrated on exons A B C D E and isoform A C]


Expectation-maximization approach (IsoEM, RSEM)

[Figure: read weights $w_{r,i}$ for a read pair aligned to isoforms A B C and A C at alignments i and j, with fragment length values $F_a(i)$ and $F_a(j)$]


EM Algorithm

1. Start with random transcript frequencies (figure values: 0.2, 0.2, 0.2, 0.2, 0.2)

2. Fractionally allocate reads to transcripts (figure values: 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

3. Compute expected #reads for each transcript (figure values: 0.5, 2.5, 0.5, 1, 1.5)

4. Update transcript frequencies using maximum likelihood estimates (figure values: 0.5/6, 2.5/6, 0.5/6, 1/6, 1.5/6)

5. Repeat steps 2-4 until convergence
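The EM steps can be sketched in a few lines. This is a simplified illustration, not IsoEM or RSEM: it ignores transcript lengths, fragment length distributions, and alignment quality. The read-to-transcript compatibilities below are hypothetical, chosen so that the first pass from uniform frequencies reproduces the expected read counts shown on the slide (0.5, 2.5, 0.5, 1, 1.5).

```python
def em_step(freqs, reads):
    """One EM iteration.

    freqs: current transcript frequencies.
    reads: list of tuples, each holding the indices of the transcripts
           a read is compatible with (hypothetical input format).
    Returns (expected read counts, updated frequencies).
    """
    counts = [0.0] * len(freqs)
    for compatible in reads:
        total = sum(freqs[t] for t in compatible)
        for t in compatible:
            # E-step: allocate the read fractionally, proportional to frequency.
            counts[t] += freqs[t] / total
    # M-step: maximum likelihood update is the normalized expected count.
    return counts, [c / len(reads) for c in counts]


def run_em(reads, n_transcripts, tol=1e-9, max_iter=1000):
    """Iterate EM from uniform frequencies until the update is below tol."""
    freqs = [1.0 / n_transcripts] * n_transcripts
    for _ in range(max_iter):
        _, new_freqs = em_step(freqs, reads)
        if max(abs(a - b) for a, b in zip(freqs, new_freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs


# Hypothetical data: 3 uniquely mapped reads and 3 reads shared by two transcripts.
reads = [(1,), (1,), (4,), (0, 1), (2, 3), (3, 4)]
first_counts, first_freqs = em_step([0.2] * 5, reads)
final_freqs = run_em(reads, n_transcripts=5)
```

The sketch starts from uniform rather than random frequencies so that the result is reproducible; any strictly positive starting point works with the same update rule.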


[Figure: Cells QC. Distribution of detected genes/cell in three panels: main population, minor population, and bi-modal distribution]


Batch effects can be larger than biological effects, but can be corrected by normalization procedures

CPM & TPM datasets pre-quantile normalization

CPM & TPM datasets post-quantile normalization


Quantile normalization (Irizarry et al 2002)

• Shifts CPM/FPKM/TPM values for each cell to match a reference distribution (e.g., distribution of means)
  - Highest value gets matched to highest value in reference
  - 2nd highest gets mapped to 2nd highest value in reference
  - And so on

Distribution of TPMs

Reference distribution
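Quantile normalization as described on this slide can be sketched in plain Python. The version below uses the mean of the sorted values across cells as the reference distribution, which is the example the slide gives; ties within a cell are broken by gene order, a simplification that a production implementation would handle more carefully. The function name and data are hypothetical.

```python
def quantile_normalize(cells):
    """Quantile-normalize expression values.

    cells: list of equal-length lists, one list of per-gene values per cell.
    Returns a new list of lists in the same gene order.
    """
    n_genes = len(cells[0])
    sorted_cells = [sorted(values) for values in cells]
    # Reference distribution: mean of the k-th smallest value across cells.
    reference = [
        sum(values[k] for values in sorted_cells) / len(cells)
        for k in range(n_genes)
    ]
    normalized = []
    for values in cells:
        # Gene indices ordered from lowest to highest expression in this cell.
        order = sorted(range(n_genes), key=lambda g: values[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical cells measured on three genes.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization every cell has exactly the same set of values (here 1.5, 3.5, 5.5); only the assignment of values to genes differs between cells.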


Principal Component Analysis

• Linear transformation of the data:
  – 1st component = direction of max. variance
  – 2nd component = orthogonal to 1st, max. residual variance

• Used for dimensionality reduction (ignore high components)
  – Visualization for exploratory analysis
  – Feature selection
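The definition of the first component as the direction of maximum variance can be illustrated with power iteration on the covariance matrix. This is a teaching sketch in plain Python, not how statistical packages compute PCA (they typically use an SVD); the function name, starting vector, and data are assumptions.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (unit direction, variance along it) of the first component.

    points: list of equal-length numeric lists (rows = cells, columns = genes).
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Starting vector; power iteration fails only if it is exactly
    # orthogonal to the leading direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    variance = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    return v, variance


# Hypothetical data lying exactly on the line y = 2x.
direction, variance = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

For these points all the variance lies along the direction (1, 2)/√5, so a single component captures the data and the second component carries zero residual variance.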


What makes a good clustering?

• Homogeneity: Elements within a cluster are close to each other
• Separation: Elements in different clusters are further apart from each other

[Figure: bad clustering vs. good clustering]

Many clustering algorithms!

Algorithm: Parameters

• K-means: K = number of clusters
• Fuzzy c-means Clustering (FCM): K = number of clusters; d = degree of fuzziness
• Hierarchical Clustering (HCS): Metric = euclidean, seuclidean, cityblock, minkowski, chebychev, cosine, correlation, spearman; Method = average, centroid, complete, median, single
• EM Clustering: K = number of clusters; S = number of initial seeds; I = number of iterations
• SNN-Cliq: n = size of the nearest neighbor list; r = density threshold of quasi-cliques; m = threshold on the overlapping rate for merging


K-Means Clustering

• Goal: find K clusters minimizing the mean squared distance from data points to corresponding cluster centroids
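A common heuristic for this objective is Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain-Python illustration with hypothetical data; the random initialization from data points and the fixed seed are assumptions, and the result is a local optimum that can depend on the initialization.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. points: list of numeric tuples. Returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Two well-separated hypothetical groups of cells in a 2-D expression space.
low = [(1, 1), (1, 2), (2, 1), (2, 2)]
high = [(8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(low + high, k=2)
```

On data this well separated the two groups are recovered with centroids at (1.5, 1.5) and (8.5, 8.5); on real expression data it is usual to run several random restarts and keep the solution with the lowest total squared distance.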

[Figure, four frames: K-Means Clustering iterations on a scatter plot of expression in condition 1 vs. expression in condition 2 (both axes 0 to 5), with centroids k1, k2, and k3]

Accuracy measures

• Purity. U: set of ground truth classes; V: set of the computed clusters; N: total # of objects in dataset
• Adjusted Rand Index (AR)
• Rand Index (RI): RI = (TP+TN)/(TP+FP+FN+TN)
• F1 Score: F1 Score = 2×TP/(2×TP+FP+FN)
• Mirkin’s index (MI): It counts the number of disagreements in data pairs between two clusterings. It is the ratio of the number of disagreeing pairs to the total number of pairs. A lower value of Mirkin’s index indicates better clustering.
• Hubert’s index (HI): HI = RI – MI
• Corr: Maximum weighted Pearson correlation between average expression value of each class at ground truth and computed cluster
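The pair-counting measures above (RI, F1, MI, HI) follow directly from classifying every pair of objects as TP, FP, FN, or TN. The sketch below uses the formulas as given on the slide; the function names and the example labels are hypothetical, and purity, AR, and Corr are not covered.

```python
from itertools import combinations


def pair_counts(truth, predicted):
    """Classify every pair of objects by agreement of the two labelings."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_pred and same_truth:
            tp += 1   # together in both
        elif same_pred:
            fp += 1   # together only in the computed clustering
        elif same_truth:
            fn += 1   # together only in the ground truth
        else:
            tn += 1   # apart in both
    return tp, fp, fn, tn


def clustering_indices(truth, predicted):
    """Rand index, F1 score, Mirkin's index and Hubert's index."""
    tp, fp, fn, tn = pair_counts(truth, predicted)
    total = tp + fp + fn + tn
    ri = (tp + tn) / total
    mi = (fp + fn) / total  # disagreeing pairs / all pairs
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"RI": ri, "F1": f1, "MI": mi, "HI": ri - mi}


# Five hypothetical cells: ground truth classes vs. computed clusters.
truth = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
indices = clustering_indices(truth, predicted)
```

For this example the 10 pairs split into TP=2, FP=2, FN=2, TN=4, giving RI = 0.6, F1 = 0.5, MI = 0.4, and HI = 0.2. With this definition RI + MI = 1, so HI = 2·RI − 1.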


Accuracy comparison (Pollen et al. 2014, MiSeq)


Accuracy comparison (Pollen et al. 2014, HiSeq)


Accuracy comparison (Zeisel et al. 2015)


Tests for differential gene expression must take both fold change and statistical significance into account

[Figure: three expression comparisons labeled FC = 2, FC = 2, and FC = 1.5, with DE calls marked by *]


• Many reliable DE methods for data with replicates
  – edgeR [Robinson et al., 2010]
  – DESeq [Anders et al., 2010]

• When no/few replicates are available, bootstrapping provides a robust alternative


Sensitivity results on Illumina MCF-7 data with varying number of replicates and minimum fold change 1.5


[Figure: functional analysis workflow. Experimental data → gene expression table; a priori knowledge + existing experimental data → gene-set databases; both feed the ENRICHMENT TEST → enrichment table (e.g., Spindle 0.00001, Apoptosis 0.00025) → interpretation & hypotheses]
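The slide does not say which enrichment test is used. One common choice, shown here purely as an assumption, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given a background of genes, a gene set, and a list of selected (e.g., differentially expressed) genes, it asks how likely an overlap at least as large as the observed one is by chance. The function name and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_background genes, of which n_in_set belong to the gene set."""
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes measured, 5 annotated to a gene set,
# 4 selected genes of which 3 fall in the set.
p_value = enrichment_p_value(n_background=20, n_in_set=5, n_selected=4, n_overlap=3)
```

Here the p-value is 155/4845, roughly 0.032. When many gene sets are tested, as in the tools referenced in this section, the p-values would normally also be corrected for multiple testing.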


http://david.abcc.ncifcrf.gov/


http://www.genemania.org/


Conclusions

• The range of single-cell applications continues to expand, fueled by advances in microfluidics technology and library prep protocols
  • ATAC-Seq, GT-Seq, Methyl-Seq, …

• Primary analysis is compute intensive
  • Requires server/cluster/cloud + linux + scripting
  • Galaxy framework (https://usegalaxy.org/) provides web-based interface to many tools

• Most secondary/tertiary analyses can be done on PC/Mac using
  • R environment (some programming)
  • Many can be done using web-based tools and user-friendly apps (we’ll use JMP)


Conclusions

• Development of single-cell specific analysis methods critical for fully realizing the potential of the technology
  • Allele specific expression
  • Biomarker selection
  • Cell type assignment
  • Lineage reconstruction
  • Characterization of heterogeneity

• Joint analysis of bulk and single cell data still needed to get unbiased cell type frequencies
  • Can also identify and characterize cell types missed by current capture protocols


Single cells AND computational deconvolution


Acknowledgements

Sahar Al Seesi
Marius Nicolae
Elham Sherafat
Craig Nelson
Adrian Caciula
Serghei Mangul
Yvette Temate Tiagueu
Alex Zelikovsky
Edward Hemphill
James Lindsay