rna-seq: quantification and models for assessing ... · quantification and models for assessing...
![Page 1: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/1.jpg)
RNA-seq: quantification and models for assessing differential expression (at least for some approaches)
Ian Dworkin, NGS 2016
@IanDworkin
![Page 2: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/2.jpg)
What we will cover today
• Absolute fundamentals of experimental design.
• Why we use count data as input.
• Introducing a bit of probability to explain why many RNA differential analysis tools use a negative binomial.
• Why do we care about variance/over-dispersion so much.
• How do we estimate over-dispersion with small sample sizes (and why edgeR and DGE give different results).
• A bit about dealing with multiple comparisons (if we have time).
![Page 3: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/3.jpg)
Goals
I am not planning on trying to provide any sort of overview of statistical methods for genomic data. Instead I am going to provide a few short ideas to think about.
Statistics (like bioinformatics) is a rapidly developing area, in particular with respect to genomics. Rarely is it clear what the “right way” to analyze your data is.
Instead I hope to aid you in using some common sense when thinking about your experiments that use high throughput sequencing.
![Page 4: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/4.jpg)
Caveats
• There are whole courses on proper experimental design and statistics. Great books too. This material in Bio720 is not enough!
• For experimental design I highly recommend:
– Quinn & Keough: Experimental Design and Data Analysis for Biologists.
http://www.amazon.com/Experimental-Design-Data-Analysis-Biologists/dp/0521009766/
![Page 5: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/5.jpg)
The basics of experimental design
• There are a few basic points to always keep in mind:
– Biological replication (as much as you can afford) is extremely important. Robustly identifying differentially expressed (DE) genes requires statistical power.
• (note: this is not how many reads you have for a gene within a sample, but how many biologically/statistically independent samples per treatment).
– Technical replication does not help with statistical power (i.e. don’t split a single sample and run it as two libraries).
![Page 6: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/6.jpg)
Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!
• Sequencing (and library prep) costs are still sufficiently expensive that most experiments use small numbers of biological replicates.
• Given the additional library costs (~$225/sample at our facility), many folks go for increased depth instead of more samples.
• For a given level of sequencing depth (total) for a treatment, it is far better to go for more biological replicates, each at lower sequencing depth (rather than fewer replicates at higher sequencing depth).
![Page 7: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/7.jpg)
Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!
Robles et al. 2012
![Page 8: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/8.jpg)
How do the methods compare in simulation?
Kvam et al. 2012
![Page 9: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/9.jpg)
The basics of experimental design
• There are a few basic points to always keep in mind:
– Biological replication.
– Design your experiment to avoid confounding your different treatments (sex, nutrition) with each other or with technical variables (lane within a flow cell, between flow cell variation).
• Make diagrams/tables of your experimental design, or use a randomized design.
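One way to make the "randomized design" concrete is a small script that deals samples to lanes at random while keeping each treatment balanced across lanes. This is only a sketch under stated assumptions: the function name `assign_lanes`, the sample IDs, and the two-lane layout are all hypothetical, and a real design would also need to consider flow cells, library prep batches and so on.

```python
import random
from collections import Counter


def assign_lanes(samples, n_lanes, seed=0):
    """Shuffle samples within each treatment, then deal them round-robin to lanes.

    samples: list of (sample_id, treatment) tuples (hypothetical IDs).
    Returns a dict mapping lane number -> list of (sample_id, treatment).
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_treatment = {}
    for sample_id, treatment in samples:
        by_treatment.setdefault(treatment, []).append(sample_id)

    layout = {lane: [] for lane in range(1, n_lanes + 1)}
    for treatment, ids in sorted(by_treatment.items()):
        rng.shuffle(ids)  # random order within a treatment
        for i, sample_id in enumerate(ids):
            lane = (i % n_lanes) + 1  # round-robin keeps treatments balanced
            layout[lane].append((sample_id, treatment))
    return layout


# Toy experiment: 4 female and 4 male biological replicates, 2 lanes.
samples = [(f"F{i}", "female") for i in range(1, 5)]
samples += [(f"M{i}", "male") for i in range(1, 5)]

layout = assign_lanes(samples, n_lanes=2)
for lane, members in layout.items():
    print(lane, members, dict(Counter(t for _, t in members)))
```

Because every lane ends up with the same number of males and females, sex is not confounded with lane; which individual lands in which lane is left to chance.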
![Page 10: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/10.jpg)
The basics of experimental design
• There are a few basic points to always keep in mind:
– Biological replication.
– Design experiment to avoid confounding variables.
– Sample individuals (within treatment) randomly!
![Page 11: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/11.jpg)
Useful references
Auer, P. L., & Doerge, R. W. (2010). Statistical Design and Analysis of RNA-Seq Data. Genetics. doi:10.1534/genetics.110.114983 PMID: 20439781
Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94
![Page 12: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/12.jpg)
Designing your experiment before you start.
Sampling
Replication
Blocking
Randomization
Overall we are going to be thinking about how to avoid confounding sources of variation in the data.
All of these are larger topics that are part of Experimental Design.
![Page 13: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/13.jpg)
Sampling
Sampling
Replication
Blocking
Randomization
Sampling design is all about making sure that when you “pick” (sample) observations, you do so in a random and unbiased manner.
Proper sampling aims to control for unknown sources of variation that influence the outcome of your experiments.
This seems reasonable, and often intuitive to most experimental biologists, but it can be very insidious. Whiteboard…
![Page 14: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/14.jpg)
Sampling
Sampling
Replication
Blocking
Randomization
![Page 15: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/15.jpg)
Biological replicates, not technical ones.
• There is little purpose in using technical replication (i.e. same sample, multiple library preps) from a given biological sample UNLESS part of your question revolves around it.
• Focus on biological variability. While you are confounding some sources of technical and biological variability, we already know a lot about the former, and little about the latter (in particular for your system).
![Page 16: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/16.jpg)
Replication
Sampling
Replication
Blocking
Randomization
Imagine you have an experiment with one factor (sex), with two treatment levels (males and females).
You want to look for sex specific differences in the brains of your critters based on transcriptional profiling, so you decide to use RNA-seq.
Perhaps you have a limited budget so you decide to run one sample of male brains, and one sample of female brains, each in one lane of a flow cell.
What (useful) information can you get out of this?
Not much (but there may be some). Why?
![Page 17: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/17.jpg)
Replication
Sampling
Replication
Blocking
Randomization
Why?
No replication. How will you know if the differences you observe are due to differences between males and females, random (biological) differences between individuals, or technical variation due to RNA extraction, processing or running the samples on different lanes?
All of these sources of variation are confounded, and there are no particularly good ways of separating them out.
But there are lots of sources of variation, so how do we account for these?
![Page 18: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/18.jpg)
Replication
Sampling
Replication
Blocking
Randomization
To date, several studies have suggested that “technical” replicates for RNA-seq show very little variation/high correlation.
Mortazavi et al. 2008
How might such a statement be misleading about variation?
![Page 19: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/19.jpg)
Replication
Sampling
Replication
Blocking
Randomization
This study looked at a single source of technical variation:
running exactly the same sample on two different lanes on a flow cell.
This completely ignores other sources of “technical variation”:
• variation due to RNA purification
• variation due to fragmentation, labeling, etc.
• lane to lane variation
• flow cell to flow cell variation
All of these may be important (although unlikely to be interesting) sources of variation…
However…..
![Page 20: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/20.jpg)
Replication
Sampling
Replication
Blocking
Randomization
Many studies have ignored the BIOLOGICAL SOURCES of VARIATION between replicates. In most cases biological variation between samples (from the same treatment) is far larger than technical sources of variation.
While it would be nice to be able to partition various sources of technical variation (such as labeling, RNA extraction), it is often too expensive to perform such a design (see whiteboard).
IF you have limited resources, it is generally far better to have biological replication (independent biological samples for a given treatment) than technical replication.
Does this lead to confounded sources of variation?
![Page 21: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/21.jpg)
Blocking
Sampling
Replication
Blocking
Randomization
Blocks in experimental design represent some factor (usually something not of major interest) that can strongly influence your outcomes. More importantly it is a factor which you can use to group other factors that you are interested in.
For instance in agriculture there is often plot to plot variation. You may not be interested in the plots themselves but in the variety of crops you are growing.
But what would happen if you grew all of strain 1 on plot 1 and all of strain 2 on plot 2?
Whiteboard.
These plots would represent blocking levels.
![Page 22: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/22.jpg)
Blocking
Sampling
Replication
Blocking
Randomization
In genomic studies the major blocking levels are often the slide/chip for microarrays (i.e. two samples/slide for 2 color arrays, 16 arrays/slide for Illumina arrays).
For GAII/HiSeq RNA-seq data the major blocking effects are the flow cell itself and lanes within the flow cell.
Auer and Doerge 2010
![Page 23: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/23.jpg)
Blocking
Sampling
Replication
Blocking
Randomization
Incorporating lanes as a blocking effect
Auer and Doerge 2010
![Page 24: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/24.jpg)
Blocking designs
Sampling
Replication
Blocking
Randomization
Balanced Incomplete Blocking Design (BIBD)
Let’s dissect these subscripts.
Balanced for treatments across flow cells. Randomized for location.
Auer and Doerge 2010
![Page 25: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/25.jpg)
What standard technical issues should you consider for blocking:
• Flow cell
• Lane
• Adaptors
• Library prep
• Same instrument
• People!
• RNA extraction/purification
![Page 26: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/26.jpg)
What happens when you fail to block (or replicate)?
![Page 27: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/27.jpg)
Yue F, Cheng Y, Breschi A, et al.: A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014; 515(7527): 355–364
Lin S, Lin Y, Nery JR, et al.: Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci USA. 2014; 111(48): 17224–17229
In a recent analysis of the mouse ENCODE data, RNA-seq data suggested that samples clustered (for gene expression) more by species than by tissue. This was an unusual finding.
![Page 28: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/28.jpg)
Gilad Y and Mizrahi-Man O. A reanalysis of mouse ENCODE comparative gene expression data [v1; ref status: indexed, http://f1000r.es/5ez] F1000Research 2015, 4:121 (doi:10.12688/f1000research.6536.1)
A new re-analysis demonstrated some potentially serious issues with the experimental design.
![Page 29: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/29.jpg)
Figure 1. Study design for:
Yue F, Cheng Y, Breschi A, et al.: A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014; 515(7527): 355–364
Lin S, Lin Y, Nery JR, et al.: Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci USA. 2014; 111(48): 17224–17229
Gilad Y and Mizrahi-Man O 2015 [v1; ref status: awaiting peer review, http://f1000r.es/5ez] F1000Research 2015, 4:121 (doi:10.12688/f1000research.6536.1)
![Page 30: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/30.jpg)
Differential expression
• Probably the single most common use of RNA-Seq data is to examine differential expression of transcripts (transcriptional profiles).
![Page 31: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/31.jpg)
![Page 32: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/32.jpg)
Differential expression
• But differential expression of what?
– Genes
– Transcripts (alternative transcripts)
– Allele specific expression
– Exon level expression
![Page 33: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/33.jpg)
The primary goals of your experiment should guide your design.
• The exact details (# biological samples, sample depth, read_length, strand specificity) of how you perform your experiment need to be guided by your primary goal.
• Unless you have all the $$, no single design can capture all of the variability.
![Page 34: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/34.jpg)
Your goals matter
• For instance: If your primary interest is in discovery of new transcripts, sampling deeply within a sample is probably best.
• For differential expression analyses, you will almost never have the ability to perform differential expression analysis on very rare transcripts, so it is rarely useful to generate more than 15-20 million read pairs per biological sample.
![Page 35: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/35.jpg)
A simple truth: There is no technology nor statistical wizardry that can save a poorly planned experiment. The only truly failed experiment is a poorly planned one.
To consult the statistician after an experiment is finished is often merely to ask him (her) to conduct a post mortem examination. He (she) can perhaps say what the experiment died of.
Ronald Fisher
![Page 36: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/36.jpg)
![Page 37: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/37.jpg)
Counting
• One of the most difficult issues has been how to count.
• We first need to ask what features we want to count.
![Page 38: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/38.jpg)
![Page 39: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/39.jpg)
What features could we count?
• Counting at the level of genes (reads mapped to gene regardless of transcript).
• Counting at the level of transcripts.
• Counting at the level of exons.
• Counting at the level of k-mers within one of the above.
• Counting at the level of nucleotides within exon/transcript/gene.
![Page 40: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/40.jpg)
![Page 41: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/41.jpg)
Counting
• We are interested in transcript abundance.
• But we need to take into account a number of things:
• How many reads in the sample.
• Length of transcripts.
• GC content and sequencing bias (influencing counts of transcripts within a sample).
![Page 42: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/42.jpg)
Seemingly sensible counting (but ultimately not so useful).
• RPKM (reads aligned per kilobase of exon per million reads mapped) – Mortazavi et al 2008
• FPKM (fragments per kilobase of exon per million fragments mapped). Same idea for paired end sequencing.
• TPM, TMM… etc…
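To see what these scaled measures actually compute, here is a minimal sketch of the usual RPKM and TPM arithmetic on toy numbers. The function names, counts and lengths are made up for illustration, and the sketch uses the total of the toy counts as the "mapped reads" in the sample; it shows the arithmetic only and is not an endorsement of these measures as input for differential expression.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature per million mapped reads."""
    total_reads = sum(counts)
    return [
        c / (length / 1e3) / (total_reads / 1e6)
        for c, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Toy sample: three features with made-up counts and lengths (in bp).
counts = [100, 200, 700]
lengths_bp = [1000, 4000, 3500]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print([round(v) for v in rpkm_values])
print([round(v) for v in tpm_values])
print(round(sum(tpm_values)))  # TPM always sums to one million within a sample
```

The TPM values sum to one million in every sample by construction, whereas the sum of RPKM values depends on the length distribution of what is expressed, which is one reason RPKM is awkward to compare across samples.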
![Page 43: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/43.jpg)
Take home message (from me): Actual counts should be used as input for differential expression analysis, not (pre)scaled measures.
![Page 44: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/44.jpg)
BUT: Not everyone agrees with this approach, nor with my arguments about counting.
Lior Pachter’s blog is a good place to watch the debate. Also check out some comments in the vignette and paper on limma/voom.
![Page 45: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/45.jpg)
RPKM
![Page 46: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/46.jpg)
Problems with RPKM
• RPKM is not a consistent measure of expression abundance (or relative molar concentration).
• See
– http://blog.nextgenetics.net/?e=51
– Wagner et al 2012. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci
![Page 47: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/47.jpg)
How about transcripts per million (TPM)?
While TPM is in general more (statistically) consistent, it is still generally not appropriate.
![Page 48: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/48.jpg)
Normalization (for DE) can be much more complicated in practice
• Why might the total number of reads (sequencing depth) be a misleading quantity to scale by?
![Page 49: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/49.jpg)
Normalization (for DE) can be much more complicated in practice
• Scaling by total mapped reads (sequencing depth) can be substantially influenced by the small proportion of highly expressed genes. (What might happen?)
• A number of alternatives have been proposed and used (e.g. quantile normalization).
Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94
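A toy calculation can make the "what might happen?" question concrete. In this sketch (all numbers and function names are invented for illustration) two samples are sequenced to the same depth; four genes have identical true abundance in both, and only one highly expressed gene changes. Scaling by total reads then makes the unchanged genes look different.

```python
def expected_counts(abundance, depth):
    """Expected read counts if reads are sampled in proportion to abundance."""
    total = sum(abundance)
    return [depth * a / total for a in abundance]


def counts_per_million(counts):
    """Scale counts by the total number of reads in the sample."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


depth = 1000  # same (toy) sequencing depth for both samples

# Genes 1-4 are unchanged; only the highly expressed gene 5 differs.
abundance_a = [1, 1, 1, 1, 6]
abundance_b = [1, 1, 1, 1, 16]

cpm_a = counts_per_million(expected_counts(abundance_a, depth))
cpm_b = counts_per_million(expected_counts(abundance_b, depth))

# Apparent fold change (B / A) for gene 1, whose true abundance did not change.
apparent_fold_change = cpm_b[0] / cpm_a[0]
print(apparent_fold_change)
```

Gene 1 appears to be halved in sample B purely because gene 5 soaks up a larger share of the reads. This is the kind of artifact that alternative normalization approaches try to avoid.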
![Page 50: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/50.jpg)
Counting (and normalizing) in practice
• In practice, we do not want to “pre-scale” our data as is done in F/R-PKM or TPM.
• Instead we are far better off using a model-based approach, normalizing for read-length or library size within the data modeling itself.
• This is far more flexible.
![Page 51: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/51.jpg)
Take home message: Actual counts should be used as input for differential expression analysis, not (pre)scaled measures.
The issue is that getting unambiguous counts is hard (Rob).
![Page 52: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/52.jpg)
Differential expression analysis. A primer.
• I am assuming that we have already decided on an appropriate method to count and convert mapped reads to discrete values…
• There is a bit we need to know to help us understand what to do next.
![Page 53: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/53.jpg)
A bit of background on probability.
• Fundamentally our observed measures of expression are the counts of reads.
• Depending upon the data modeling framework we wish to use, we need to account for this, as counts are not necessarily approximated well by the normal (Gaussian) distributions that are used for “standard” linear models like t-tests, ANOVA, regression.
• This is not a problem at all, as it is easy to model data coming from other distributions, and the tools are widely available in stats packages and programming languages alike.
![Page 54: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/54.jpg)
Probability density vs. mass function
Probability mass function for a discrete variable.
Probability density function for a continuous variable.
![Page 55: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/55.jpg)
Probability mass function (for discrete distributions, like read counts)
P(13 | Poisson(λ = 10)) = 0.073
Height represents the probability at that point (integer).
“Area” of the box has no particular meaning.
P(integer) ≥ 0; P(non-integers) = 0.
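The value on the slide can be checked directly from the Poisson probability mass function, P(k | λ) = e^(−λ) λ^k / k!. A minimal sketch using only the Python standard library (the function name `poisson_pmf` is just for illustration):

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


p13 = poisson_pmf(13, 10)
print(round(p13, 3))  # 0.073

# The probabilities over all non-negative integers sum to 1
# (summing the first 100 terms is already extremely close).
total = sum(poisson_pmf(k, 10) for k in range(100))
print(round(total, 6))
```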
![Page 56: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/56.jpg)
Probability density function
Height at x = 13 is 0.0799. This is not the probability at x = 13, but the density, i.e. f(13) = 0.0799, where f(x) is the normal density function.
P(x = 13 | N(mean = 10, sd = 3.3)) = 0. WHY?
![Page 57: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/57.jpg)
Probability density function
We can define the probability in the interval 10 ≤ x ≤ 15
P(10 ≤ x ≤ 15 | N(10, 3.3)) = 0.435
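Both numbers for the normal example can be reproduced with the standard library: the density is the height of the curve at a point, while a probability only exists for an interval and comes from the difference of two cumulative probabilities. This is a sketch; the helper names are illustrative, and the cumulative distribution is written via the error function `math.erf`.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the normal density curve at x (not a probability)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


density_13 = normal_pdf(13, mean=10, sd=3.3)
prob_10_15 = normal_cdf(15, 10, 3.3) - normal_cdf(10, 10, 3.3)

print(density_13)   # about 0.080 (the slide shows 0.0799)
print(prob_10_15)   # about 0.435
```

The interval probability shrinks towards zero as the interval narrows around x = 13, which is why the probability of any single exact value is 0 for a continuous variable.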
![Page 58: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/58.jpg)
Clarifications on continuous distributions.
AREA UNDER CURVE OF PDF = 1
(The integral of the normal)
![Page 59: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/59.jpg)
Bolker 2007, Ch. 4, page 137
![Page 60: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/60.jpg)
The multitude of probability distributions allows us to choose those that match our data or theoretical expectations in terms of shape, location, scale.
![Page 61: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/61.jpg)
Fitting a distribution is an art and science of utmost importance in probability modeling. The idea is you want a distribution to fit your data model “just right” without a fit that is “overfit” (or underfit). Overfitting models is sometimes a problem in modern data mining methods because the models fit can be too specific to a particular data set to be of broader use.
Seefeld 2007
![Page 62: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/62.jpg)
So why do we use them? It’s all about shape and scale!
• Because they provide a usable framework for framing our questions, and allowing for parametric methods, i.e. likelihood and Bayesian.
• Even if we do not know its actual distribution, it is clear frequency data is generally going to be better fit by a binomial than a normal distribution. Why?
![Page 63: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/63.jpg)
Why will it be a better fit?
• The binomial is bounded by zero and 1.
• Other distributions (gamma, Poisson, etc.) have a lower boundary at zero.
• This provides a convenient framework for the relationship between means and variance as one approaches the boundary condition.
![Page 64: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/64.jpg)
Some discrete distributions (leading up to why we may want to use the negative binomial)
Binomial
Poisson
Negative-binomial
![Page 65: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/65.jpg)
Random variables
• This is what we want to know the probability distribution of.
• I.e. P(x | some distribution)
I will use “x” to be the random variable in each case.
![Page 66: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/66.jpg)
Binomial
Let’s say you set up a series of enclosures. Within each enclosure you place 25 flies, and a pre-determined set of predators. You want to know what the distribution (across enclosures) of flies getting eaten is, based on a pre-determined probability of success for a given predator species.
You can set this up as a binomial problem.
N (R calls this size) = 25 (the total # of individuals or “trials” for predation) in the enclosure
p = probability of a successful predation “trial” (the coin toss)
x = # trials of successful predation. This is what we usually want for the probability distribution.
![Page 67: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/67.jpg)
Binomial
You can think of this in two ways.
A) A normalizing constant so that probabilities sum to 1.
B) # of different combinations to allow for x “successful” predation events out of N total.
You will often see x = k and hear “N choose k”.
![Page 68: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/68.jpg)
Example
• If predator species 1 had a per “trial” probability of successfully eating a prey item of 0.2, what would be the probability of exactly 10 flies (out of the 25) being eaten in a single enclosure?
P(x = 10 | bi(N = 25, p = 0.2)) = 0.0118
Not so high. We can look at the expected probability distribution for different values of x.
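A short sketch of the enclosure example in standard-library Python. `binomial_pmf` is an illustrative helper name; `math.comb` supplies the “N choose k” term.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with per-trial success probability p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

n_flies, p_eaten = 25, 0.2
p_ten = binomial_pmf(10, n_flies, p_eaten)

# The whole distribution across x = 0..25 flies eaten.
distribution = [binomial_pmf(x, n_flies, p_eaten) for x in range(n_flies + 1)]
most_likely = distribution.index(max(distribution))

print(f"P(x = 10) = {p_ten:.4f}")
print(f"sum over all x = {sum(distribution):.6f}")
print(f"most likely number eaten = {most_likely}")
```

Ten flies eaten is unlikely because the distribution is concentrated around N × p = 5.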
![Page 69: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/69.jpg)
This would be the expected distribution if we set up many replicate enclosures with 25 flies and this predator.
![Page 70: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/70.jpg)
Predator species 2 is much hungrier…
![Page 71: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/71.jpg)
Let’s say we had 100 flies per enclosure, and predator species 3 was really ineffective, p = 0.01
While there may be a theoretical limit to the number of flies that can be eaten, practically speaking it is unlimited since the predation probability is so low.
This is a lot like the situation we have with RNA-seq data.
![Page 72: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/72.jpg)
Poisson
• When you have a discrete random variable where the probability of a “successful” trial is very small, but the theoretical (or practical) range is effectively infinite, you can use a Poisson distribution.
• Useful for counting # of “rare” events, like new migrants to a population/year.
• # of new mutations/offspring.
• # counts of sequencing reads (well, sort of)…
![Page 73: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/73.jpg)
Poisson
• It is also seemingly useful for RNA-Seq data (although we will see it is not very useful in practice).
![Page 74: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/74.jpg)
Poisson
x is our random variable (# events / unit sampling effort): read counts for a gene in a sample.
λ is the “rate” parameter, i.e. the expected number of reads (for a transcript) per sample.
λ is the mean and the variance!!!!
For its relation to a binomial when N is large and p is small: λ = N*p
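A quick numerical check of that relation, as a sketch in standard-library Python. It reuses the N = 100, p = 0.01 predator example; the helper names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson with rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p = 100, 0.01
lam = n * p  # the matching Poisson rate

for x in range(5):
    print(x, f"{binomial_pmf(x, n, p):.4f}", f"{poisson_pmf(x, lam):.4f}")

# Largest disagreement between the two distributions over all possible x.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
              for x in range(n + 1))
print(f"largest difference in probability: {max_gap:.4f}")
```

The two columns agree to about two decimal places, which is why the Poisson is a reasonable stand-in when N is large and p is small.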
![Page 75: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/75.jpg)
Poisson
• Let’s say flies disperse to colonize a new patch at a very low rate (previous estimates suggest we will observe one fly for every two new patches we examine, λ = 0.5).
• What is the probability of observing 2 flies on a new patch of land?
P(x = 2 | Poisson(λ = 0.5)) = 0.076
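The fly-dispersal number can be reproduced with a few lines of standard-library Python. The sketch also sums over the distribution to confirm that the mean and the variance both equal λ; names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson with rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 0.5
p_two_flies = poisson_pmf(2, lam)

# Mean and variance by direct summation (the tail beyond x = 60 is negligible).
xs = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(f"P(x = 2) = {p_two_flies:.3f}")
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```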
![Page 76: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/76.jpg)
Probability of observing x number of flies on a patch given lambda = 0.5
![Page 77: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/77.jpg)
What happens as lambda increases?
[Figure: three Poisson distributions, with panels for λ = 4, λ = 20 and λ = 100, where λ is the expected # of reads for transcript x across samples. x-axis: # of reads for transcript x. y-axis: proportion of samples for transcript x.]
![Page 78: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/78.jpg)
Poisson mean and variance
• When lambda is small for your random variable, you will often find that your data is “over-dispersed”.
• That is, there is more variation than expected under Poisson(lambda).
• Similarly, when lambda gets large, you will often find that there is less variation than expected under Poisson(lambda).
![Page 79: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/79.jpg)
Anders and Huber 2010, Genome Biology
![Page 80: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/80.jpg)
Why Poisson might not model sequence reads well
• Most RNA-Seq data (and most count data in biology) is not modeled well by Poisson because the relationships between means and variances tend to be far more complicated among (and within) biological replicates.
• It has been argued (Mortazavi et al. 2008) that technical variation in RNA-Seq is captured by Poisson. I have my doubts even on this.
![Page 81: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/81.jpg)
Quasi-Poisson
• Since over-dispersion is such a common issue, a number of approaches have been developed to account for it with count data.
• One is to use a quasi-Poisson.
• Instead of Variance(x) = λ, it is Variance(x) = λθ,
• where θ is the (multiplicative) over-dispersion parameter.
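A rough sketch of how θ can be estimated for a single gene in a single group of replicates: with only an intercept in the model, the usual Pearson-type estimate reduces to sample variance divided by sample mean. The counts below are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean, variance

def dispersion_estimate(counts):
    """Pearson-type estimate of theta for one group: sample variance / sample mean."""
    return variance(counts) / mean(counts)

# Hypothetical read counts for one gene across six biological replicates.
counts = [12, 30, 7, 45, 21, 5]
theta = dispersion_estimate(counts)

print(f"mean = {mean(counts):.1f}, variance = {variance(counts):.1f}")
print(f"theta = {theta:.2f}  (a plain Poisson would give roughly 1)")
```

A θ well above 1, as here, is the signature of over-dispersion relative to the Poisson.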
![Page 82: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/82.jpg)
How about a normal distribution?
• Despite working with discrete count data, several authors use normal distributions. Several reasons.
![Page 83: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/83.jpg)
How about a normal distribution?
• Despite working with discrete count data, several authors use normal distributions. Several reasons:
1. When the mean number of counts is far enough away from zero, often the normal distribution does a good job of fitting the data (and capturing the mean & variance relationship). For low mean counts a variance stabilization can aid modeling (the approach used in limma/voom).
2. Our response variable (counts of features) is not measured without error, and therefore is not a true measure. When estimating effects in our model we account for this uncertainty, and assuming a normal distribution enables additional flexibility.
![Page 84: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/84.jpg)
Negative binomial
• In biology the Neg. Binomial is mostly used like a Poisson, but when you need more dispersion of x (it needs to be spread out more).
• The negative binomial is a Poisson distribution where lambda itself varies according to a Gamma distribution.
![Page 85: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/85.jpg)
Negative binomial
Expected number of counts = μ
Over-dispersion parameter = k
For our purposes all we care about is that
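The formula that completes this sentence is not present in the transcript. As an assumption, the sketch below uses the common mean/size parameterization (the one R's `dnbinom(mu=, size=)` uses), under which Variance(x) = μ + μ²/k, so a smaller k means more over-dispersion. The code checks that identity numerically in standard-library Python; the function name and the values of μ and k are illustrative.

```python
import math

def neg_binomial_pmf(x, mu, k):
    """P(X = x) for a negative binomial with mean mu and size (dispersion) k."""
    log_p = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(k / (k + mu))
             + x * math.log(mu / (k + mu)))
    return math.exp(log_p)

mu, k = 10.0, 2.0
xs = range(2000)  # far enough into the tail that the remainder is negligible
probs = [neg_binomial_pmf(x, mu, k) for x in xs]

nb_mean = sum(x * p for x, p in zip(xs, probs))
nb_var = sum((x - nb_mean) ** 2 * p for x, p in zip(xs, probs))

print(f"mean     = {nb_mean:.3f}")
print(f"variance = {nb_var:.3f}  vs  mu + mu^2/k = {mu + mu**2 / k:.3f}")
```

With μ = 10, a Poisson would have variance 10; this negative binomial has variance 60.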
![Page 86: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/86.jpg)
General(ized) linear models
• For response variables that are continuous, you are likely familiar with approaches that come from the general linear model.
A standard linear regression (if x is continuous). If x is discrete this would be a t-test/ANOVA.
![Page 87: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/87.jpg)
Generalized linear model
• MANY of the differential expression tools utilize a linear model framework.
• Thus it is important to get familiar with the framework.
• The class by Jonathan and Ben (B) is probably a great place to start.
![Page 88: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/88.jpg)
Continuity of Statistical Approaches
[Diagram relating t-test, ANOVA, regression, ANCOVA, general linear model, mixed effects model, generalized linear model and process models. Branch labels: number of levels; predictors (discrete, continuous, both); fixed predictors versus random or both; response (normal, non-normal).]
![Page 89: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/89.jpg)
Generalized linear models
• But what do you do when your response variable is not normally distributed?
• The framework of the linear model can be extended to account for different distributions fairly easily (one major class of these is the generalized linear models).
![Page 90: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/90.jpg)
![Page 91: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/91.jpg)
Generalized Linear Models (GLiM)
• In many cases a general linear model is not appropriate because values are bounded, e.g. counts > 0, proportions between 0 and 1
• A generalization of linear models to include any distribution of errors from the exponential family of distributions
– Normal, Poisson, binomial, multinomial, exponential, gamma, NOT negative binomial
• General Linear Model is just a special case of GLiM in which the errors are normally distributed
• Example: logistic regression
• We will use likelihood for parameter estimation and inference
![Page 92: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/92.jpg)
Generalizations of GLM
• Instead of a simple linear model: Y = b₀ + b₁x₁ + b₂x₂ + e
– Assume that e’s are independent, normally distributed with mean 0 and constant variance σ²
– Can solve for b’s by minimizing squared e’s
• GLiM considers some adjustment to the data to linearize Y: a link function
Y = g(b₀ + b₁x₁ + b₂x₂ + e) or f(Y) = b₀ + b₁x₁ + b₂x₂ + e
– For example, for count data, which are always positive:
f(Y) = log(Y) (log link)
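A small sketch of what the log link means in practice. For a Poisson model with a log link and a single two-level predictor (x = 0 for control, x = 1 for treated), the maximum-likelihood fitted means equal the group means, so the coefficients can be written down without an iterative fit. The counts and variable names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical read counts for one gene in two groups.
control = [18, 22, 25, 15]
treated = [41, 39, 52, 48]

# On the log scale the model is additive: log(mean) = b0 + b1 * x
b0 = math.log(mean(control))                  # log mean of the control group
b1 = math.log(mean(treated) / mean(control))  # log fold change

def fitted_mean(x):
    """Back-transform the linear predictor to the count scale."""
    return math.exp(b0 + b1 * x)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"fitted means: control = {fitted_mean(0):.1f}, treated = {fitted_mean(1):.1f}")
print(f"fold change exp(b1) = {math.exp(b1):.2f}")
```

Additive effects on the log scale become multiplicative (fold-change) effects on the count scale, and fitted means can never go below zero.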
![Page 93: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/93.jpg)
What is a link function?
• The link function is a way of transforming the observed response variable (LHS).
• Goals
1) Linearize the observed response.
2) Alter the boundary conditions of the data.
3) Allow for an additive model in the covariates (RHS).
![Page 94: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/94.jpg)
Poisson Family
• Data are counts of something (i.e. 0, 1, 2, 3, 4, …)
• Number of occurrences of an event over a fixed period of time or space
• Examples…
• If the mean value is high then counts can be log-normal or normally distributed
• When the mean value is low then there start to be lots of zeros, and the variance depends on the mean
• If the upper end is also bounded then binomial would be better
• Default link is the log link, variance function = µ
– i.e., family = poisson(link = "log"); the variance function µ is built into the family (in R, quasi(link = "log", variance = "mu") states it explicitly)
– Other option might be the sqrt link
![Page 95: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/95.jpg)
Poisson and negative binomial Family
Essentially it means you can log transform the sequence counts and use a Poisson, quasi-Poisson or negative binomial to fit it (most links are more complicated, this is nice and simple).
i.e. counts are modeled as
![Page 96: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/96.jpg)
Methods using NB GLM
• edgeR (but it is not default, so beware!)
• DESeq/DESeq2 (maybe DEXSeq as well?)
• baySeq
• limma (voom; kind of, sort of…).
• However these all model the variance quite differently (how they borrow information across genes to estimate mean-variance relationships).
See Yu, Huber & Vitek 2013 (Bioinformatics) for discussion of this issue.
![Page 97: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/97.jpg)
Methods using Poisson and quasi-Poisson
• tspm (two stage Poisson model)
– Fits models with Poisson first. If over-dispersed then uses a quasi-Poisson.
– Thus there are essentially two groups of genes.
![Page 98: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/98.jpg)
Why this is useful
• Since we can fit these as a generalized linear model, we can fit arbitrarily complex designs (if we have sufficient sample sizes to estimate all the parameters).
• We can incorporate all aspects of read length, library size, lane, flow cell in addition to all of the important biological predictors (your treatments).
• NO t-tests for you!!!
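One way to see how a technical covariate such as library size enters the model is as an offset: log(μ) = log(library size) + b₀ + b₁x. For a Poisson model with a two-level treatment this has a closed form (group rate = total counts / total library size), which the sketch below uses instead of a full GLM fit. All numbers and names are hypothetical.

```python
import math

# Hypothetical counts for one gene, with the total reads sequenced per sample.
control_counts, control_lib = [100, 150], [1_000_000, 1_500_000]
treated_counts, treated_lib = [400, 200], [2_000_000, 1_000_000]

# Poisson maximum-likelihood rate per group when log(library size) is an offset.
control_rate = sum(control_counts) / sum(control_lib)
treated_rate = sum(treated_counts) / sum(treated_lib)

fold_change = treated_rate / control_rate
b1 = math.log(fold_change)  # treatment coefficient on the log scale

# Ignoring library size gives a different (misleading) answer.
naive_fold_change = (sum(treated_counts) / len(treated_counts)) / (
    sum(control_counts) / len(control_counts))

print(f"fold change with library-size offset: {fold_change:.2f} (b1 = {b1:.3f})")
print(f"fold change from raw mean counts:     {naive_fold_change:.2f}")
```

Lane, flow cell and other factors would enter as additional terms in the linear predictor rather than as offsets.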
![Page 99: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/99.jpg)
Estimating over-dispersion (variance) (or why programs seemingly doing the same thing give different results)
![Page 100: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/100.jpg)
Variances require lots of data to estimate well (not just for count data)
• It turns out that to estimate variances, you need a lot more replication than you do for means.
• However most RNA-Seq experiments still have small numbers of biological replicates.
• So how to go about estimating variances?
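To put rough numbers on this, the sketch below uses a textbook result for normally distributed data (an assumption; counts will behave somewhat differently): the standard error of the sample variance, relative to the true variance, is sqrt(2 / (n − 1)). The coefficient of variation used for the mean is a made-up value.

```python
import math

def relative_se_of_variance(n):
    """SE(s^2) / sigma^2 for n independent normal observations."""
    return math.sqrt(2.0 / (n - 1))

def relative_se_of_mean(n, cv):
    """SE(mean) / mean, given the coefficient of variation cv = sd / mean."""
    return cv / math.sqrt(n)

cv = 0.3  # hypothetical coefficient of variation among biological replicates
for n in (3, 5, 10, 51):
    print(f"n = {n:2d}: mean ±{relative_se_of_mean(n, cv):.0%}, "
          f"variance ±{relative_se_of_variance(n):.0%}")
```

With three replicates the variance estimate carries roughly 100% relative error, which is why methods look for ways to share information across genes.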
![Page 101: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/101.jpg)
IF sample sizes are large (within and between treatments).
• Most methods do well (based on NB, quasi-Poisson or non-parametric approaches).
• They can model individual level variances (and potentially can use resampling approaches to avoid having to make parametric assumptions).
![Page 102: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/102.jpg)
But if sample sizes (in terms of biological replication) are small.
• Then we have a problem.
• This is where the software really tends to differ, as they all make (different) assumptions about the uncertainty in counts, mean-variance relationships, and how best to model such effects.
• In particular edgeR and DESeq use some methods to borrow information across genes (and have options to change this process).
• This can dramatically change the results.
Anders, S., & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11(10), R106. doi:10.1186/gb-2010-11-10-r106
Anders et al. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786
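The general idea of “borrowing information across genes” can be sketched as a weighted average between each gene's own dispersion estimate and a value shared across genes. This is a toy illustration only, not the procedure used by edgeR, DESeq or DESeq2; the weights, the `prior_df` knob and all numbers are hypothetical.

```python
def shrink_dispersions(genewise, common, residual_df, prior_df):
    """Pull each gene-wise estimate toward a shared value.

    The weight on a gene's own estimate grows with its residual degrees of
    freedom; prior_df (hypothetical) sets how strongly the shared value pulls.
    """
    w = residual_df / (residual_df + prior_df)
    return [w * d + (1.0 - w) * common for d in genewise]

genewise = [0.02, 0.50, 1.50]  # made-up per-gene dispersion estimates
common = 0.20                  # made-up estimate shared across all genes

# E.g. 3 replicates in each of 2 groups leaves 4 residual degrees of freedom.
shrunk = shrink_dispersions(genewise, common, residual_df=4, prior_df=10)

for raw, s in zip(genewise, shrunk):
    print(f"gene-wise {raw:.2f} -> shrunk {s:.3f}")
```

Changing `prior_df` changes every shrunk value, which gives some intuition for why tools that make different choices here can report different gene lists from the same counts.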
![Page 103: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/103.jpg)
Anders and Huber 2010
![Page 104: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/104.jpg)
Yu et al. (2013). Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics, 29(10), 1275–1282.
![Page 105: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/105.jpg)
Anders and Huber 2010
![Page 106: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/106.jpg)
Let’s think about this.
Love, Huber & Anders 2014, bioRxiv, doi:10.1101/002832
![Page 107: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/107.jpg)
We can also “shrink” estimates based on over-dispersion…
![Page 108: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/108.jpg)
Take home
• With small sample sizes, the methods use different approaches to get gene-wise over-dispersion (based on all data).
• edgeR is generally more powerful (more significant hits) than DESeq, but much more susceptible to false positives due to outliers.
• DESeq2 “should” be somewhere in the middle.
![Page 109: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/109.jpg)
![Page 110: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/110.jpg)
Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!
• Sequencing (and library prep) costs are still sufficiently expensive that most experiments use small numbers of biological replicates.
• Given the additional library costs (~$225/sample at our facility), many folks go for increased depth instead of more samples.
• For a given level of sequencing depth (total) for a treatment, it is far better to go for more biological replicates, each at lower sequencing depth (rather than fewer replicates at higher sequencing depth).
![Page 111: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/111.jpg)
Biological replication gives far more statistical power than increased sequencing depth within a biological sample!!!!
Robles et al. 2012
![Page 112: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/112.jpg)
How do the methods compare in simulation?
Kvam et al. 2012
![Page 113: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/113.jpg)
How do the methods compare in simulation?
Kvam et al. 2012
![Page 114: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/114.jpg)
How do the methods compare for real data?
Kvam et al. 2012
![Page 115: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/115.jpg)
How do the methods compare in a different set of simulations?
Soneson & Delorenzi 2013
Will explain ROC (receiver operating characteristic) curves and the area under the curve on the board.
![Page 116: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/116.jpg)
References
• Robles, J. A., Qureshi, S. E., Stephen, S. J., Wilson, S. R., Burden, C. J., & Taylor, J. M. (2012). Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics, 13, 484. doi:10.1186/1471-2164-13-484
• Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94
• Kvam, V. M., Liu, P., & Si, Y. (2012). A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. American Journal of Botany, 99(2), 248–256. doi:10.3732/ajb.1100340
• Soneson, C., & Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics, 14, 91. doi:10.1186/1471-2105-14-91
• Wagner, G. P., Kin, K., & Lynch, V. J. (2012). Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences = Theorie in den Biowissenschaften, 131(4), 281–285. doi:10.1007/s12064-012-0162-3
• Vijay, N., Poelstra, J. W., Künstner, A., & Wolf, J. B. W. (2012). Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Molecular Ecology. doi:10.1111/mec.12014
![Page 117: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/117.jpg)
Why do we care about multiple comparisons?
![Page 118: RNA-seq: quantification and models for assessing ... · quantification and models for assessing differential expression ... – Design your experiment to avoid confounding ... In](https://reader031.vdocuments.us/reader031/viewer/2022022512/5ae9434a7f8b9ab24d8bda62/html5/thumbnails/118.jpg)
How can we deal with multiple comparisons?
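As one concrete illustration (the slides do not say which correction was presented, so this choice is an assumption), the sketch below first shows how many false positives to expect when testing many genes at a fixed threshold, then applies the Benjamini-Hochberg false discovery rate adjustment. The p-values and the gene count are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Why we care: with no true differences at all, testing 20,000 genes at
# alpha = 0.05 is still expected to flag about 1,000 of them.
n_genes, alpha = 20_000, 0.05
expected_false_positives = n_genes * alpha
print(f"expected false positives: {expected_false_positives:.0f}")

# How to deal with it: adjust the p-values (hypothetical values shown).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
for p, q in zip(p_values, adjusted):
    print(f"p = {p:.3f} -> adjusted = {q:.5f}")
```

Genes whose adjusted value falls below the chosen false discovery rate (say 0.05) are called significant; here the third and fourth genes no longer pass even though their raw p-values were below 0.05.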