pathway analysis in other data types · pathway analysis in other data types alison motsinger-reif,...
TRANSCRIPT
PathwayAnalysisinotherdatatypes
AlisonMotsinger-Reif,PhDAssociateProfessor
Bioinforma<csResearchCenterDepartmentofSta<s<cs
NorthCarolinaStateUniversity
New“-Omes”• Genome• Transcriptome
• Metabolome
• Epigenome
• Proteome• Phenome,exposome,lipidome,glycome,interactome,spliceome,mechanome,etc...
Goals• Pathwayanalysisinmetabolomics• Pathwayanalysisinproteomics• Issues,concernsinotherdatatypes
– Methyla<ondata– aCGH– Nextgenera<onsequencingtechnologies
• Manyapproachesgeneralize,buttherearealwaysspecificchallengesindifferentdatatypes
• XGR–anewtoolforpathwayandnetworkanalysis
Metabolomics• Whilemanyproteinsinteractwitheach
otherandthenucleicacids,therealmetabolicfunc<onofthecellreliesontheenzyma<cinterconversionofthevarioussmall,lowmolecularweightcompounds(metabolites)
• Technologyisrapidlyadvancing
• Thefrequentfinalproductofthemetabolomicspipelineisthegenera<onofalistofmetaboliteswho’sconcentra<onshavebeen(significantly)alteredwhichmustbeinterpretedinordertoderivebiologicalmeaning
àPerfectforpathwayanalysis
Dataprocessingandannota<on
• Preprocessingandthelevelofannota<onisVERYdifferentthaningenomicandtranscriptomicdata
• Manystepsinoverallexperimentaldesignthatgreatlyinfluenceinterpreta<on
• Willbreiflycoversomeofthemainissues
Analy<calPla\orm• LikelyGC/LC-MSorNMRastheyarethemostcommon• Choiceisnormallybasedmoreonavailableequipment,etc.more
thanexperimentaldesign• GC-MSisanextremelycommonmetabolomicspla\orm,resul<ng
inahighfrequencyoftoolswhichallowforthedirectinputofGC-MSspectra.– Popularityisduetoitsrela<velyhighsensi<vity,broadrangeof
detectablemetabolites,existenceofwell-establishediden<fica<onlibrariesandeaseofautoma<on
– separa<on-coupledMSdatarequiresmuchprocessingandcarefulhandlingtoensuretheinforma<onitcontainsisnotar<factual
Targetedvs.Untargeted
• Scien<stshavebeenquan<fyingmetabolitelevelsforover50yearsthroughtargetedanalysis…
• Withnewtechnologies,thefocuscanbeonuntargetedmetabolomics– Reallyhardtoannotateandinterpret– Integrated–omicsanalysisbeingusedtohelpannotateandunderstanduntargetedmetabolites
– Analogoustocandidategenevs.genomewidetes<ng
KeyIssuesinMetabolomics• Allofthemetaboliteswithinasystemcannotbeiden<fiedwithanyoneanaly<cal
methodduetochemicalheterogeneity,whichwillcausedownstreamissuesasallmetabolitesinapathwayhavenotbeenquan<fied
• Notallmetaboliteshavebeeniden<fiedandcharacterizedandsodonotexistinthestandardslibraries,leadingtolargenumberofunannotatedand/orunknownmetabolitesofinterest
• Organismspecificmetabolicdatabases/networksonlyexistforthehighestusemodelorganismsmakingcontextualinterpreta<onsdifficultformanyresearchers
• Interpre<ngthehugedatasetsofmetaboliteconcentra<onsundervariouscondi<onswithbiologicalcontextisaninherentlycomplexproblemrequiringextremelyindepthknowledgeofmetabolism.
• Theissueofdeterminingwhichmetabolitesareactuallyimportantintheexperimentalsysteminques<on.
MetabolomicDatabases
– – top-down(genetoproteintometabolite)www.metabolomicssociety.org/database– boeom-up(chemicalen<tytobiologicalfunc<on)
approaches• Mostcommonlyusedinbiomedicalapplica<ons:– – www.metabolomicssociety.org/databaseMetaCyc
• Mostcommonlyusedinbiomedicalapplica<ons:– KEGG– • MetaCycSubdatabases
LIGAND,REACTIONPAIRandPATHWAY LIGAND,REACTIONPAIRandPATHWAY
MetabolomicDatabasesarelargest(intermsofnumberoforganismsandmostindepthcomprehensive(i.e.MetaCyccontainslinkedinforma<onfrommetabolitetogene)arelargest(intermsofnumberof
• Othersthatarerapidlygrowing:–
Reactome(human)– KNApSAcK(plants)– ModelSEED(diverse)–
BiG[40](6modelorganisms)– canbemoreusefulthanthelargedatabasesifaspecificorganismisdesired[40](6modelorganisms)
– canbemoreusefulthanthelargedatabasesifaspecificorganismisdesired
MetabolomicDatabases
– ForMetaCyc
,individual‘Cyc’databaseshavebeengeneratedforanumberoforganisms, Cyc• somejustcomputa<onally ’databaseshavebeengeneratedfora• othersextensivelymanuallycuratedsuchas
AraCycforArabidopsis• Morerecentdevelopmentarethecheminforma<c
databaseslikePubChem cheminforma<c– provideachemicallyontologicalapproachtocataloguingtheill-definedcategoryof‘smallmolecules’ac<veinbiologicalsystems– canprovideaddi<onalnon-biologyspecificinforma<onaswell
alterna<veformanngop<onsfordatasets(watchforerrors!)definedcategoryof‘smallmolecules’ac<veinbiologicalsystems– canprovideaddi<onalnon-biologyspecificinforma<onaswell
alterna<veformanngop<onsfordatasets(watchforerrors!)
Enrichmentanalysis
• Thesedatabasesareusedtocreate• Majorityofavailabletoolsdoearlygenera<on
over-representa<onanalysis– Withalltheadvantagesandcaveats!– Formoreuptodateanalysis,willneedtoworktomergedatabases,etc.tocorrectlyusemoreup-to-dateapproachesto-dateapproaches
MetabolomicsAnalysisTools
– Provideasuiteofu<li<esallowingcomprehensiveanalysisfromrawspectraldatatopathwayanalysis• • MetaboAnalystMetaboAnalyst• MeltDB
• EnrichmentAnalysis– Onlyworkswithprocesseddata
• PAPi• MBRole• MPEA• TICL• IMPaLA
• MetaboliteMapping– Connectsmetabolitestogene<c/proteomic,etc.resources
• MetaMapp• Masstrix• Paintomics• VANTED• Pathos• Pathos
– IfbeginningfromrawGCorLC-MSdataMetaboAnalystusesXCMSforpeakfinng,iden<fica<onetc.
– Onceatthepeaklist(NMRorMS)stage,variouspreprocessingop<onssuchasdata-filteringandmissingvaluees<ma<oncanbeused.
– Anumberofnormaliza<on,transforma<onandscalingopera<onscanbeperformed.
PLS-DAandhierarchicallyclusteredheatmaps,amongmanyotherop<ons.
– Allthesethingscanbedoneinotherprograms,butthisisagreattooltogetstartedifyou’renewtometabolomics!togetstartedifyou’renewtometabolomics!
• EnrichmentAnalysistoolof• EnrichmentAnalysistoolofMetaboAnalystMetaboAnalystwasoneoftheearliestimplementa<onsofGSEAformetabolomicsdatasets(MSEA)– quitebiasedtowardshumanmetabolismunlessyoumake– quitebiasedtowardshumanmetabolismunlessyoumakecustombackgroundpathways/sets
• Threeop<onsforinput
Analysis,ORA)– atwocolumnlistofcompoundsANDabundances(SingleSampleProfiling,SSP)
– amul<-columntableofcompoundabundancesinclassedsamples(Quan<ta<veEnrichmentAnalysis,QEA).samples(Quan<ta<veEnrichmentAnalysis,QEA).
Metaboanalyst
examinerankedorthresholdcut-offlists• SSPisaimedatdeterminingwhetheranymetabolitesareabovethenormalrangeforcommonhuman
biofluids
• QEAisthemostcanonicalandwilldeterminewhichmetabolitesetsareenrichedwithintheprovidedclass
labels,whileprovidingacorrela<onvalueandp-value• QEAisthemostcanonicalandwilldeterminewhichmetabolitesetsareenrichedwithintheprovidedclasslabels,whileprovidingacorrela<onvalueandp-value
PAPiandscaled)• PathwayAc<vityProfilingisanR-basedtool
• Worksontheassump<onsthatthedetec<on(i.e.presenceinthelist)ofmoremetabolitesinapathway
andthatlowerabundancesofthosemetabolitesindicateshigherfluxandthereforehigherpathwayac<vity– Assump<onmaynotalwaysbetrue– Ex.TCAcycleintermediatescanhavehighabundanceevenwhenfluxthroughthereac<onsinthispathwayisalsohigh– Ex.TCAcycleintermediatescanhavehighabundanceevenwhenfluxthroughthereac<onsinthispathwayisalsohigh
PAPi
• Thesescorescanthenbeusedtocompare
experimentalandcontrolcondi<onsbyperformingANOVAorat-testtocomparetwosampletypes.ac<veinthecell
• Thesescorescanthenbeusedtocompareexperimentalandcontrolcondi<onsbyperformingANOVAorat-testtocomparetwosampletypes.
MetaMapp
interconversionofchemicallysimilaren<<es,compoundscanbeclusteredsolelybytheirchemicalsimilarity
– Highlybeneficialformetaboliteswithoutreac<onannota<on
• AlsousesKEGGreactantpairinforma<on– chemicalsimilarity
someobviouslybiologically-relatedmetabolitesmisclusteredsomeobviouslybiologically-relatedmetabolites
MetaMapp
• Canalsomapmetabolitesbasedontheirmassspectralsimilarity(forunknowns)
• Canbeusedtomakecustom/novelsetsforpathwayanalysis
SummaryonMetabolomicsPathwayAnalysis
• Metabolomicsisamaturingarea• “Easy”implementa<onsoftoolsoqenbehindbestprac<cesinpathwayapproaches
• Issueswith<medependencies,<ssuedependencies,etc.aremoreexaggeratedinmetabolomics
• Asthetechnologyismaturing,wearejustgenngtounderstandthebiases,sourcesofvaria<on,etc.– Dataqualitycontrolbestprac<cesareevolving– Willhavemajorimpactonthepathwayanalysis
SpecificIssuesforother-omics
• Willconsidersomeissuesthatarebothspecifictothe“-ome”andtopar<culartechnologies
• Proteomics• Epigenomics• ArrayCGHdata• RNAseq• Nextgenera<onsequencing• …….
Proteomics
• Aqergenomicsandtranscriptomics,proteomicsisthenextstepincentraldogma
• Genomeismoreorlessconstant,buttheproteomediffersfromcelltocellandfrom<meto<me
• Dis<nctgenesareexpressedindifferentcelltypes,whichmeansthateventhebasicsetofproteinsthatareproducedinacellneedstobeiden<fied
• Itwasassumedforalong<methatmicroarrayswouldcapturemuchofthisinforma<onàNO!
Proteomicsvs.Transcriptomics• mRNAlevelsdonotcorrelatewithproteincontent
• mRNAisnotalwaystranslatedintoprotein
• TheamountofproteinproducedforagivenamountofmRNAdependsonthegeneitistranscribedfromandonthecurrentphysiologicalstateofthecell
• Manyproteinsarealsosubjectedtoawidevarietyofchemicalmodifica<onsaqertransla<on– Affectfunc<on– Ex:phosphoryla<on,ubiqui<na<on
• Manytranscriptsgiverisetomorethanoneprotein,throughalterna<vesplicingoralterna<vepost-transla<onalmodifica<ons
Proteomics• Technologicaladvancesforproteomicshasslowed– Likemetabolomics,thelackofanyPCR-likeamplifica<onislimited
– Unlikemetabolomicsthathasareasonablesearchspace,therees<matedtobemorethanamilliontranscripts
Proteomics• Availabletechnologieshavedifferentchallenges– Proteinmicroarraysvs.massspecbasedmethods– Generalconcernswithreproducibilitydampenedini<alexcitement
Proteomics
• Thehighcomplexityandtechnicalinstabilitymeanthatthelevelofannota<onisoqenquitelow
• Samechallengesaswithmetabolomics,butmoreexaggeratedgiventhelargeannota<onspace
• Manyofthesameissues…..
Epigenomics• “Complete”setofepigene<cmodifica<onsonthegene<cmaterialofacell– epigene<cmodifica<onsarereversiblemodifica<onsonacell’sDNAorhistonesthataffectgeneexpressionwithoutalteringtheDNAsequence
– DNAmethyla<onandhistonemodifica<onmostcommonlyassayed
• Rapidlyadvancingtechnologies– Histonemodifica<onassays– CHIP-CHIPandCHIP-Seq– Methyla<onarrays
Epigenomics• Recentstudieshavefocusedonissuesrelatedtodifferen<al
numbersofprobesingenes– Mostmicroarraysweredesignedwiththesamenumber– Formethyla<ondata,thisisnotthecase,andextremebiascanbe
seen– Biasresultsinalargenumberoffalseposi<ves
• Canbecorrectedbyapplyingmethodsthatmodelstherela<onshipbetweenthenumberoffeaturesassociatedwithageneanditsprobabilityofappearingintheforegroundlist– CpGprobesinthecaseofmicroarrays– CpGsitesinthecaseofhigh-throughputsequencing– Chipannota<on
• Canalsobecorrectedwithcarefulapplica<onofpermuta<onapproaches
NextGenera<onSequencing
• VariantcallinginNGScandetectsinglenucleo<devariants(SNVs)andSNPs
• ForSNPs,theexactsamepathwaymethodscanbeusedasdesignedforGWASstudies(assuminggenotypingingenomewide)
• Forrarevariants,standardapproachesareachallenge– highlyinflatedfalse-posi<veratesandlowpowerinpathway-basedtestsofassocia<onofrarevariants
– duetotheirlackofabilitytoaccountforgame<cphasedisequilibrium
– Newareaofmethodsdevelopment
NextGenera<onSequencing
• RNA-seqdata– Nottrulyquan<ta<ve– Withexperience,knowthatthereareverydifferentvariancedistribu<onsatdifferentlevelsofexpression
– Willmaeerformethodsthattestfordifferencesinvarianceaswellasmean• TwosidedK-Stests….
XGR• FangH,KnezevicB,BurnhamKL,KnightJC.XGRsoqwareforenhancedinterpreta<onofgenomicsummarydata,illustrated
byapplica<ontoimmunologicaltraits.GenomeMed.2016Dec13;8(1):129.
• Schema<cworkflowofXGR:achievingenhancedinterpreta<onofgenomicsummarydata.ThisflowchartillustratesthebasicconceptsbehindXGR.Theuserprovidesaninputlistofeithergenes,SNPs,orgenomicregions,alongwiththeirsignificancelevels(collec<velyreferredtoasgenomicsummarydata).XGR,availableasbothanRpackageandaweb-app,isthenabletorunenrichment,network,similarity,andannota<onanalysesbasedonthisinput.Theanalysesthemselvesarerunusingacombina<onofontologies,genenetworks,gene/SNPannota<ons,andgenomicannota<ondata(built-indata).Theoutputcomesinvariousforms,includingbarplots,directedacyclicgraphs(DAG),circosplots,andnetworkrela<onships.Furthermore,theweb-appversionprovidesinterac<vetables,downloadablefiles,andothervisuals(e.g.heatmaps)
OtherFunc<onali<es
• Cross-condi<oncompara<veenrichmentanalysis
• SNPsimilarityanalysisbasedondiseasetraitprofiles– eQTLs
• Epigene<cannota<on/enrichment
SummaryonIntegratedAnalysis
• Technologyadvancesacrossthe“omics”isanexci<ngopportunityforbeeerunderstandingcomplexity
• Technologieshaveuniqueproper<esthatneedtobeunderstoodandaccountedforinanalysis
• Metabolomicsresourcesarerapidlymaturing
SummaryonIntegratedAnalysis
• Databasedevelopment,cura<on,edi<ng,etc.alwayslagsbehindtechnology
• Issueswithincompleteandinaccurateannota<onaccumulateasmore“omes”areconsidered
• Withmorecomplexdata,thiscomplexityisnotreadilycapturedinthedatabasesthegenesetanalysisrelieson– Differencesincelltypes,exposure,<me,etc.– Majorneedsformethodsdevelopment…..