scRNAseq normalization and gene set selection
Åsa Björklund

Post on 16-Jul-2022

Outline

• Introduction
• Normalization
• Gene set selection
• Removal of confounders

Biological and technical variation

• Biological variation:
  – Cell type/state
  – Cell cycle
  – Cell size
  – Sex, age, …
  – Etc.

• Technical variation:
  – Cell quality
  – Library prep efficiency
  – Batch effects
  – Etc.

To identify cell types we would like to remove all other sources of variation.

UMIs do not solve the problem

Vallejos et al. Nature Methods 2017

Normalization

• Count normalization – for uneven sequencing depth
• Gene length normalization – for differences in gene detection due to gene length
• Drop-out rate normalization – for differences in RNA content/drop-out rates

Bulk RNAseq methods

• CPM: Controls for sequencing depth by dividing by the total count.
• RPKM/FPKM: Controls for sequencing depth and gene length. Good for technical replicates, not good for sample-to-sample comparisons due to compositional bias. Assumes total RNA output is the same in all samples.
• TPM: Similar to RPKM/FPKM. Corrects for sequencing depth and gene length. Also comparable between samples, but with no correction for compositional bias.

X_i: observed count
l_i: length of the transcript
N: number of fragments sequenced
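
The three bulk measures can be sketched with the symbols defined above (X_i, l_i, N). This is a minimal sketch using the standard textbook definitions, which are an assumption here rather than a copy of the slide's formulas; the function names and toy numbers are hypothetical.

```python
def cpm(counts):
    """Counts per million: X_i / N * 1e6, with N the total count."""
    total = sum(counts)
    return [x / total * 1e6 for x in counts]


def rpkm(counts, lengths):
    """Reads per kilobase per million: X_i / (l_i / 1e3) / (N / 1e6)."""
    total = sum(counts)
    return [x / (l / 1e3) / (total / 1e6) for x, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: per-base rates rescaled to sum to 1e6."""
    rates = [x / l for x, l in zip(counts, lengths)]
    rate_sum = sum(rates)
    return [r / rate_sum * 1e6 for r in rates]


# Hypothetical toy sample: three genes with counts X_i and lengths l_i (bp).
counts = [100, 300, 600]
lengths = [1000, 3000, 2000]
print(cpm(counts))
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which is why they are easier to compare between samples than RPKM/FPKM. None of the three corrects for compositional bias.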

Bulk RNAseq methods

• TMM/RLE/MRN: Improved assumption: the output between samples is similar only for a core set of genes. Corrects for compositional bias. RLE and MRN are very similar and correlate well with sequencing depth. edgeR::calcNormFactors() implements TMM, TMMwzp, RLE & UQ. DESeq2::estimateSizeFactors implements the median ratio method (RLE). Does not correct for gene length.

• VST/RLOG/VOOM: Variance is stabilised across the range of mean values. For use in exploratory analyses. vst() and rlog() functions from DESeq2. voom() function from limma converts data to a normal distribution.
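
The median ratio idea can be illustrated in a few lines. This is a hedged sketch of the general technique, not the DESeq2 implementation: it assumes a small list-of-lists count table, and the function name and toy counts are hypothetical.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """counts: list of samples, each a list of per-gene counts (same gene order)."""
    n_genes = len(counts[0])
    # Keep only genes with non-zero counts in every sample, so that the
    # geometric mean across samples is defined.
    usable = [g for g in range(n_genes) if all(sample[g] > 0 for sample in counts)]
    # Log geometric mean of each usable gene across samples (pseudo-reference).
    log_ref = {
        g: sum(math.log(sample[g]) for sample in counts) / len(counts)
        for g in usable
    }
    # Size factor: median ratio of a sample's counts to the pseudo-reference.
    return [
        math.exp(median([math.log(sample[g]) - log_ref[g] for g in usable]))
        for sample in counts
    ]


# Hypothetical toy data: sample 2 is sequenced twice as deep as sample 1,
# and the last gene is additionally up-regulated in sample 2.
counts = [[10, 20, 30, 40], [20, 40, 60, 400]]
size_factors = median_ratio_size_factors(counts)
print(size_factors)
```

Because the median is taken over genes, the single strongly changed gene does not distort the size factors. This is the sense in which the method relies on a core set of unchanged genes.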

scRNAseq normalization

• Deconvolution/Scran (Pooling-Across-Cells)
• SCnorm (Expression-Depth Relation)
• SCTransform
• Census
• Linnorm
• ZINB-WaVE
• BASiCS
• More…

Log transformation

• Log-transformed values approach a normal distribution for bulk RNAseq data
• For scRNAseq – more similar to zero-inflated binomial
• Non-transformed data, on the other hand, is hard to fit.

Depth normalization and log transformation

• The simplest normalization is to divide by sequencing depth, multiply by a scale factor, and log-transform the data
• Scater normalize – uses total counts or size factors. Default is return_log = TRUE.
• Seurat NormalizeData – returns log-normalized data with scale.factor = 10K by default.
• Scanpy normalize_per_cell/normalize_total – normalize by sequencing depth – then need to run log1p.
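
The arithmetic behind these functions can be sketched for a single cell. This is a minimal illustration, assuming a scale factor of 10,000 and a log(1 + x) transform; it is not the code of any of the packages above, and the function name and toy counts are hypothetical.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Divide by the cell's total count, multiply by a scale factor, then log1p."""
    depth = sum(cell_counts)  # assumed to be non-zero
    return [math.log1p(x / depth * scale_factor) for x in cell_counts]


# Hypothetical cells: the second has the same composition at twice the depth.
shallow = [1, 0, 3, 6]
deep = [2, 0, 6, 12]
print(log_normalize(shallow))
print(log_normalize(deep))
```

Both cells give the same normalized values, so the difference in sequencing depth is removed, while zero counts stay at zero.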

Depth normalization

• Assumes the same RNA content in all cells – may work well in a homogeneous cell population
• In most cases the amount of RNA, and of UMIs/reads, differs between cells.
• Also important to check for outlier genes that constitute a large proportion of the reads!
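
One way to run the outlier check from the last bullet is to compute each gene's share of all counts. This is a minimal sketch, assuming counts stored in a small dictionary; the gene names and numbers are hypothetical.

```python
def gene_count_fractions(counts_by_gene):
    """Return (gene, fraction of all counts) pairs, largest fraction first."""
    gene_totals = {gene: sum(values) for gene, values in counts_by_gene.items()}
    grand_total = sum(gene_totals.values())
    fractions = [(gene, total / grand_total) for gene, total in gene_totals.items()]
    return sorted(fractions, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: counts of four genes across three cells.
counts_by_gene = {
    "geneA": [500, 450, 650],
    "geneB": [40, 60, 50],
    "geneC": [30, 20, 50],
    "geneD": [0, 50, 100],
}
ranked = gene_count_fractions(counts_by_gene)
print(ranked[0])
```

A gene that takes up most of the counts, like geneA here, dominates the total used for depth normalization and shifts the normalized values of every other gene.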

Deconvolution

Lun et al. Genome Biol. 2016

Scran - computeSumFactors

• Deconvolution with all cells
  – The assumption is that most genes are not differentially expressed (DE) between cells.
• Deconvolution within clusters (fast clustering beforehand, e.g. with scran's quickCluster)
  – Size factors are computed within each cluster and rescaled by normalization between clusters.
  – Useful when many genes are DE between clusters in a heterogeneous population.
• computeSumFactors – will also remove low-abundance genes

Normalization with gene groups

• Global scale factors may lead to overcorrection for weakly and moderately expressed genes and undernormalization for highly expressed genes.
• Solution: normalize genes separately at different expression levels.

SCnorm: Expression vs. Depth Bias Correction

Bacher et al. Nature Methods 2017

Quantile regression to estimate the count–depth relationship

SCnorm: Expression vs. Depth Bias Correction

Identical cells in two groups should result in no DE and FC = 1 if normalization was effective

Bacher et al. Nature Methods 2017

SCTransform (Seurat)

Hafemeister & Satija Genome Biology 2019

SCTransform (Seurat)

Pearson residuals from regularized negative binomial (NB) regression

Hafemeister & Satija Genome Biology 2019
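
A Pearson residual under a negative binomial model can be sketched as follows. This is only the residual formula, assuming the expected count mu and the dispersion parameter theta are already known. The regression on sequencing depth and the regularization of its parameters are not shown, and the function name and example values are hypothetical.

```python
import math


def nb_pearson_residual(count, mu, theta):
    """(observed - expected) / sd, with NB variance mu + mu**2 / theta."""
    return (count - mu) / math.sqrt(mu + mu ** 2 / theta)


# Hypothetical values: expected count 4 and dispersion parameter 4.
print(nb_pearson_residual(10, mu=4.0, theta=4.0))  # observed above expectation
print(nb_pearson_residual(4, mu=4.0, theta=4.0))   # observed equals expectation
```

The residual is positive when a gene is seen more often than the model expects for that cell, and zero when the observation matches the expectation.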

SCTransform (Seurat)

• Note: the SCTransform function in Seurat also does variable gene selection in the same step, with a slightly different method than the default in Seurat.
• But you can also specify which genes to run it on.
• You can also run regression in the same step.

Zero-Inflated Negative Binomial-based Wanted Variation Extraction (ZINB-WaVE)

• Both gene-level and sample-level covariates
• Extension of the RUV model

Risso et al. Nat. Comm. 2018

ZINB-WaVE

Reduces technical influence on PCA, and also batch effects.

Size factors with different normalizations

Vieth et al. Nature Comm. 2019

DE with different normalizations

Vieth et al. Nature Comm. 2019

Imputation

• scRNAseq has a lot of zeros in the expression matrix
• Common for GWAS data to impute SNPs
• Many methods recently published:
  – SAVER
  – DrImpute
  – scImpute
  – MAGIC
  – Knn-smooth
  – Deep count autoencoder

Imputation can introduce false correlations

Andrews et al. F1000Research 2018

Imputation has little effect on DE detection

Vieth et al. Nature Comm. 2019

Normalization + imputation comparison

Tian et al. Nature Methods 2019

Scaling data – Z-score transformation

• Z-score transformation – linearly transform data to a mean of zero and a standard deviation of 1.
• Without scaling, PCA or any other type of analysis will be dominated by highly expressed genes with high variance.
• It can be wise to center and scale each gene before performing PCA
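
Centering and scaling one gene can be sketched as below. This is a minimal illustration that assumes the population standard deviation; real packages may use the sample standard deviation or clip extreme values. The function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center to a mean of zero and scale to a standard deviation of 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # An invariant gene has no spread to scale; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical genes with the same pattern at very different expression levels.
low_gene = [1, 2, 3]
high_gene = [100, 200, 300]
print(zscore(low_gene))
print(zscore(high_gene))
```

After the transformation both genes have identical values, so the highly expressed gene no longer outweighs the lowly expressed one in a PCA.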

What normalization should you use?

• Normalization has a big impact on differential gene expression, but not as much on clustering
• In most cases it is enough to do sequencing depth normalization
• When working with highly similar subtypes of the same cell type, or with cell types of very different sizes, individual size factors could help.
• Binning by gene level (SCTransform) helps to remove the effect of different gene detection across cells.

Selecting genes

• Excluding invariable genes that do not contribute informative/interesting information
  – Improved signal-to-noise ratio
  – Reduced computational requirements
• Highly variable genes (HVGs)
• Correlated gene pairs/groups
• Top PCA loadings

Variable gene selection

• Genes which behave differently from a null model describing technical noise
  – Mean-variance trend: genes with higher than expected variance
  – Coefficient of variation (Brennecke et al. 2013)
• High dropout genes
  – Number of zeros unexpectedly high compared to null model
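
The idea of comparing a gene's variation with a null model can be sketched with the squared coefficient of variation (CV²). As a loud simplification, this sketch uses a Poisson null (CV² = 1/mean) as a stand-in for technical noise; it is not the model fitted by Brennecke et al., and the function name and toy counts are hypothetical.

```python
from statistics import mean, pvariance


def excess_cv2(counts):
    """Squared coefficient of variation minus the Poisson expectation 1/mean."""
    mu = mean(counts)  # assumed to be non-zero
    cv2 = pvariance(counts) / mu ** 2
    return cv2 - 1.0 / mu


# Hypothetical genes with the same mean but very different variability.
genes = {
    "stable": [9, 10, 11, 10, 10],
    "variable": [0, 0, 25, 0, 25],
}
ranked = sorted(genes, key=lambda g: excess_cv2(genes[g]), reverse=True)
print(ranked)
```

Genes with a large positive excess vary more than the null model predicts and are candidates for the variable gene set; genes at or below zero are consistent with technical noise alone.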

Highly variable genes (HVGs)

(Brennecke et al. Nature Methods 2013)

Fit a gamma generalized linear model

No ERCCs? -> Estimate technical noise based on all genes

HVGs with spike-in controls – normalization matters

M3Drop

• Reverse transcription is an enzyme reaction and can thus be modelled using the Michaelis-Menten equation:

S: average expression
K_M: Michaelis-Menten constant
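
A minimal sketch of the Michaelis-Menten relationship, assuming the form used by M3Drop for the expected dropout rate, P_dropout = 1 - S / (K_M + S), with S and K_M as defined above. The function name and example values are hypothetical.

```python
def expected_dropout_rate(mean_expression, k_m):
    """Michaelis-Menten dropout probability: 1 - S / (K_M + S)."""
    return 1.0 - mean_expression / (k_m + mean_expression)


# Hypothetical constant K_M = 5: the dropout rate falls as expression rises.
for s in (0.0, 5.0, 50.0):
    print(s, expected_dropout_rate(s, k_m=5.0))
```

Under this form, K_M is the average expression at which half of the cells are expected to show a zero. Genes with far more zeros than the curve predicts for their average expression are the high-dropout candidates.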

Confounding factors

• Any source of variation that you do not expect to give separation of the cell types.
  – Cell cycle
  – Cell size
  – Sequencing depth
  – Cell quality
  – Batch
  – More…

Linear regression

• Fit a line to the gene expression vs. the variable of interest
• Calculate residuals
• Remove variance explained by the variable of interest by taking the residuals.
• Multiple linear regression if multiple factors.
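
The steps above can be sketched for one gene and one covariate. This is a minimal ordinary least-squares illustration, not the code of any package; the function name and toy values are hypothetical, and several covariates would need multiple regression instead.

```python
from statistics import mean


def regress_out(expression, covariate):
    """Residuals of a least-squares line fitted to expression vs. covariate."""
    x_bar, y_bar = mean(covariate), mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in covariate)  # assumed to be non-zero
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


# Hypothetical gene whose expression rises with sequencing depth.
depth = [1.0, 2.0, 3.0, 4.0]
expression = [2.1, 3.9, 6.1, 7.9]
residuals = regress_out(expression, depth)
print(residuals)
```

The residuals sum to zero and are uncorrelated with the covariate, so whatever part of the expression followed sequencing depth linearly has been removed.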

Other tools to remove unwanted variance

• RUVseq() or svaseq()
• Linear models with e.g. removeBatchEffect() in limma or scater
• ComBat() in sva

What confounders should you remove?

• Percent mitochondrial reads – often correlates with quality of cell
• Sequencing depth
• Gene detection rate – relates to amount of RNA per cell.
• Cell cycle
• Batch effects (sample, sort date, sex, etc.)

ALWAYS check QC parameters after analysis and see how they influence your data. BUT, be careful that your confounders are not related to your biological question!

Scaling and regression in practice

• Seurat ScaleData: does Z-score transformation and regression of variables in vars.to.regress. Can use linear (default), poisson or negbinom models.
• Scran: runs scaling but not centering automatically in the PCA step. The trendVar function estimates unwanted variation either with a design matrix or with block factors. decomposeVar or denoisePCA to remove unwanted variation.
• Scanpy: pp.regress_out and pp.scale functions.

Cell cycle effect

Buettner et al. Nature Biotech. 2015

Predict cell cycle stage/scores

• Seurat – CellCycleScoring – builds on G2M- & S-phase human gene lists from the Tirosh et al. paper
• Scran – cyclone function – trained on mouse cell-cycle-sorted cells. Uses relative expression of pairs of genes.
• Scanpy – tl.score_genes_cell_cycle – uses same gene list as Seurat
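
The gene-list approach can be sketched in simplified form. Everything here is an assumption made for illustration: the score is the mean expression of the phase genes minus the cell's mean over all genes, the phase rule is a plain comparison of the two scores, and the short gene lists and expression values are hypothetical stand-ins, not the full published lists or any package's scoring code.

```python
from statistics import mean


def phase_score(cell, gene_set):
    """Mean expression of the gene set minus the cell's mean over all genes."""
    background = mean(list(cell.values()))
    # Assumes at least one gene of the set is present in the cell.
    return mean([cell[g] for g in gene_set if g in cell]) - background


def assign_phase(cell, s_genes, g2m_genes):
    """Call G1 when neither score is positive, otherwise the higher score."""
    s = phase_score(cell, s_genes)
    g2m = phase_score(cell, g2m_genes)
    if s <= 0 and g2m <= 0:
        return "G1"
    return "S" if s > g2m else "G2M"


# Short illustrative marker lists (hypothetical subsets).
s_genes = ["PCNA", "MCM5"]
g2m_genes = ["TOP2A", "MKI67"]

# Hypothetical normalized expression values for two cells.
replicating = {"PCNA": 5.0, "MCM5": 4.0, "TOP2A": 0.5, "MKI67": 0.5,
               "ACTB": 2.0, "GAPDH": 2.0}
resting = {gene: 1.0 for gene in replicating}
print(assign_phase(replicating, s_genes, g2m_genes))
print(assign_phase(resting, s_genes, g2m_genes))
```

The two scores can also be kept as continuous covariates and regressed out, instead of being turned into a discrete phase label.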

Cell cycle removal

• Regression on cell cycle scores.
• scLVM (beta pre-release) – designed for cell-cycle variation correction. Also correction of other confounding variables.
• ccRemover (stable version from CRAN). “ccRemover outperforms scLVM slightly.”
• Oscope
• reCAT

Conclusions

• Normalization has a big impact on differential gene expression.
• Many different methods to remove unwanted variance – often an important step!
• Selection of variable genes is important to remove noise in the data. Always subset genes before running PCA/clustering.
• Always aim for the same sequencing depth in all samples – to avoid at least one confounding factor.

Do not worry!

If you have distinct cell types – the clustering will be the same regardless of how you treat the data.

But, for subclustering of similar cell types, normalization and removal of confounders may be crucial.