Pathway Analysis in other data types

Alison Motsinger-Reif, PhD
Associate Professor
Bioinformatics Research Center
Department of Statistics
North Carolina State University


Post on 30-Aug-2018



New “-Omes”

•  Genome
•  Transcriptome
•  Metabolome
•  Epigenome
•  Proteome
•  Phenome, exposome, lipidome, glycome, interactome, spliceome, mechanome, etc.

Goals

•  Pathway analysis in metabolomics
•  Pathway analysis in proteomics
•  Issues, concerns in other data types
–  Methylation data
–  aCGH
–  Next generation sequencing technologies
•  Many approaches generalize, but there are always specific challenges in different data types
•  XGR: a new tool for pathway and network analysis

Metabolomics

•  While many proteins interact with each other and the nucleic acids, the real metabolic function of the cell relies on the enzymatic interconversion of the various small, low molecular weight compounds (metabolites)
•  Technology is rapidly advancing
•  The frequent final product of the metabolomics pipeline is the generation of a list of metabolites whose concentrations have been (significantly) altered, which must be interpreted in order to derive biological meaning

→ Perfect for pathway analysis

2 routes to Metabolomics

Data processing and annotation

•  Preprocessing and the level of annotation are VERY different than in genomic and transcriptomic data
•  Many steps in overall experimental design greatly influence interpretation
•  Will briefly cover some of the main issues

Analytical Platform

•  Likely GC/LC-MS or NMR, as they are the most common
•  Choice is normally based more on available equipment, etc. than on experimental design
•  GC-MS is an extremely common metabolomics platform, resulting in a high frequency of tools which allow for the direct input of GC-MS spectra.
–  Popularity is due to its relatively high sensitivity, broad range of detectable metabolites, existence of well-established identification libraries and ease of automation
–  Separation-coupled MS data requires much processing and careful handling to ensure the information it contains is not artifactual

Targeted vs. Untargeted

•  Scientists have been quantifying metabolite levels for over 50 years through targeted analysis…
•  With new technologies, the focus can be on untargeted metabolomics
–  Really hard to annotate and interpret
–  Integrated -omics analysis being used to help annotate and understand untargeted metabolites
–  Analogous to candidate gene vs. genome-wide testing

Key Issues in Metabolomics

•  All of the metabolites within a system cannot be identified with any one analytical method due to chemical heterogeneity, which will cause downstream issues as all metabolites in a pathway have not been quantified
•  Not all metabolites have been identified and characterized, and so do not exist in the standards libraries, leading to a large number of unannotated and/or unknown metabolites of interest
•  Organism-specific metabolic databases/networks only exist for the highest-use model organisms, making contextual interpretations difficult for many researchers
•  Interpreting the huge datasets of metabolite concentrations under various conditions with biological context is an inherently complex problem requiring extremely in-depth knowledge of metabolism.
•  The issue of determining which metabolites are actually important in the experimental system in question.

Metabolomic Databases

•  www.metabolomicssociety.org/database
–  top-down (gene to protein to metabolite) approaches
–  bottom-up (chemical entity to biological function) approaches
•  Most commonly used in biomedical applications:
–  KEGG
    •  Subdatabases LIGAND, REACTION PAIR and PATHWAY
–  MetaCyc
–  are largest (in terms of number of organisms) and most in-depth/comprehensive (e.g. MetaCyc contains linked information from metabolite to gene)
•  Others that are rapidly growing:
–  Reactome (human)
–  KNApSAcK (plants)
–  Model SEED (diverse)
–  BiGG [40] (6 model organisms)
–  can be more useful than the large databases if a specific organism is desired

Metabolomic Databases

–  For MetaCyc, individual ‘Cyc’ databases have been generated for a number of organisms,
    •  some just computationally
    •  others extensively manually curated, such as AraCyc for Arabidopsis
•  A more recent development is the cheminformatic databases like PubChem
–  provide a chemically ontological approach to cataloguing the ill-defined category of ‘small molecules’ active in biological systems
–  can provide additional non-biology-specific information as well
–  alternative formatting options for datasets (watch for errors!)

Enrichment analysis

•  These databases are used to create …
•  Majority of available tools do early generation over-representation analysis
–  With all the advantages and caveats!
–  For more up-to-date analysis, will need to work to merge databases, etc. to correctly use more up-to-date approaches
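As a rough illustration of the over-representation analysis mentioned above, the sketch below computes a one-sided hypergeometric p-value for a single metabolite set. The function name and all counts are hypothetical toy values, not taken from any of the tools on these slides.

```python
from math import comb

def ora_p_value(n_background, n_in_set, n_hits, n_hits_in_set):
    """Upper-tail hypergeometric p-value, P(X >= n_hits_in_set).

    n_background  : annotated metabolites measured in the experiment
    n_in_set      : how many of those belong to the pathway / metabolite set
    n_hits        : how many metabolites were called significantly altered
    n_hits_in_set : how many altered metabolites fall inside the set
    """
    total = comb(n_background, n_hits)
    upper = min(n_in_set, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_hits - k)
        for k in range(n_hits_in_set, upper + 1)
    )
    return tail / total

# Hypothetical toy counts: 200 measured metabolites, 10 in the pathway,
# 20 altered overall, 5 of the altered ones in the pathway.
p = ora_p_value(200, 10, 20, 5)
print(f"ORA p-value: {p:.4g}")
```

The background here is the set of metabolites that were actually measured and annotated, which matters in metabolomics because no single platform quantifies every member of a pathway.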

Metabolomics Analysis Tools

–  Provide a suite of utilities allowing comprehensive analysis from raw spectral data to pathway analysis
    •  MetaboAnalyst
    •  MeltDB
•  Enrichment Analysis
–  Only works with processed data
    •  PAPi
    •  MBRole
    •  MPEA
    •  TICL
    •  IMPaLA
•  Metabolite Mapping
–  Connects metabolites to genetic/proteomic, etc. resources
    •  MetaMapp
    •  Masstrix
    •  Paintomics
    •  VANTED
    •  Pathos

–  If beginning from raw GC or LC-MS data, MetaboAnalyst uses XCMS for peak fitting, identification etc.
–  Once at the peak list (NMR or MS) stage, various preprocessing options such as data filtering and missing value estimation can be used.
–  A number of normalization, transformation and scaling operations can be performed.
–  … PLS-DA and hierarchically clustered heatmaps, among many other options.
–  All these things can be done in other programs, but this is a great tool to get started if you’re new to metabolomics!
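To make the normalization, transformation and scaling steps listed above concrete, here is a minimal sketch on a tiny peak table. It is not MetaboAnalyst's implementation; the choice of sum normalization, log2 transformation and autoscaling, and the intensity values themselves, are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def normalize_by_sum(sample):
    """Sample-wise normalization: divide each intensity by the sample total."""
    total = sum(sample)
    return [x / total for x in sample]

def log_transform(sample):
    """Log2 transformation; zeros or missing values must be imputed first."""
    return [math.log2(x) for x in sample]

def autoscale(columns):
    """Metabolite-wise scaling: mean-center and divide by the standard deviation."""
    scaled = []
    for col in columns:
        m, s = mean(col), stdev(col)
        scaled.append([(x - m) / s for x in col])
    return scaled

# Hypothetical peak intensities: rows are samples, columns are metabolites.
peaks = [
    [120.0, 30.0, 50.0],
    [200.0, 80.0, 120.0],
    [90.0, 10.0, 100.0],
]

normalized = [normalize_by_sum(row) for row in peaks]
logged = [log_transform(row) for row in normalized]
by_metabolite = [list(col) for col in zip(*logged)]  # transpose to columns
scaled = autoscale(by_metabolite)

for name, col in zip(["met_1", "met_2", "met_3"], scaled):
    print(name, [round(v, 3) for v in col])
```

Normalization acts per sample while scaling acts per metabolite, so the table is transposed between the two steps.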

•  Enrichment Analysis tool of MetaboAnalyst was one of the earliest implementations of GSEA for metabolomics datasets (MSEA)
–  quite biased towards human metabolism unless you make custom background pathways/sets
•  Three options for input
–  a list of compounds (Over Representation Analysis, ORA)
–  a two-column list of compounds AND abundances (Single Sample Profiling, SSP)
–  a multi-column table of compound abundances in classed samples (Quantitative Enrichment Analysis, QEA).

MetaboAnalyst

•  … examine ranked or threshold cut-off lists
•  SSP is aimed at determining whether any metabolites are above the normal range for common human biofluids
•  QEA is the most canonical and will determine which metabolite sets are enriched within the provided class labels, while providing a correlation value and p-value

PAPi

•  … and scaled)
•  Pathway Activity Profiling is an R-based tool
•  Works on the assumptions that the detection (i.e. presence in the list) of more metabolites in a pathway, and lower abundances of those metabolites, indicate higher flux and therefore higher pathway activity
–  Assumption may not always be true
–  Ex. TCA cycle intermediates can have high abundance even when flux through the reactions in this pathway is also high

PAPi

•  … active in the cell
•  These scores can then be used to compare experimental and control conditions by performing ANOVA or a t-test to compare two sample types.

MetaMapp

•  … interconversion of chemically similar entities, compounds can be clustered solely by their chemical similarity
–  Highly beneficial for metabolites without reaction annotation
•  Also uses KEGG reactant pair information
–  … some obviously biologically-related metabolites misclustered

MetaMapp

•  Can also map metabolites based on their mass spectral similarity (for unknowns)
•  Can be used to make custom/novel sets for pathway analysis
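The idea of clustering compounds solely by chemical similarity can be sketched with a Tanimoto coefficient over structural fingerprints. The fingerprints, compound names and the 0.5 threshold below are hypothetical, and whether this matches MetaMapp's exact scoring is an assumption.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical fingerprints: each compound is a set of structural feature IDs.
fingerprints = {
    "compound_A": {1, 2, 3, 4},
    "compound_B": {1, 2, 3, 5},
    "compound_C": {7, 8, 9},
}

THRESHOLD = 0.5  # assumed cut-off for drawing a similarity edge

# Keep only compound pairs that are similar enough to be linked.
edges = []
for x, y in combinations(sorted(fingerprints), 2):
    score = tanimoto(fingerprints[x], fingerprints[y])
    if score >= THRESHOLD:
        edges.append((x, y, score))

for x, y, score in edges:
    print(f"{x} -- {y}: {score:.2f}")
```

Connected groups in the resulting edge list could then serve as custom metabolite sets, including for compounds that have no reaction annotation.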

Summary on Metabolomics Pathway Analysis

•  Metabolomics is a maturing area
•  “Easy” implementations of tools are often behind best practices in pathway approaches
•  Issues with time dependencies, tissue dependencies, etc. are more exaggerated in metabolomics
•  As the technology is maturing, we are just getting to understand the biases, sources of variation, etc.
–  Data quality control best practices are evolving
–  Will have major impact on the pathway analysis

Specific Issues for other -omics

•  Will consider some issues that are both specific to the “-ome” and to particular technologies
•  Proteomics
•  Epigenomics
•  Array CGH data
•  RNA-seq
•  Next generation sequencing
•  …

Proteomics

•  After genomics and transcriptomics, proteomics is the next step in the central dogma
•  Genome is more or less constant, but the proteome differs from cell to cell and from time to time
•  Distinct genes are expressed in different cell types, which means that even the basic set of proteins that are produced in a cell needs to be identified
•  It was assumed for a long time that microarrays would capture much of this information → NO!

Proteomics vs. Transcriptomics

•  mRNA levels do not correlate with protein content
•  mRNA is not always translated into protein
•  The amount of protein produced for a given amount of mRNA depends on the gene it is transcribed from and on the current physiological state of the cell
•  Many proteins are also subjected to a wide variety of chemical modifications after translation
–  Affect function
–  Ex: phosphorylation, ubiquitination
•  Many transcripts give rise to more than one protein, through alternative splicing or alternative post-translational modifications

Proteomics

•  Technological advances for proteomics have slowed
–  Like metabolomics, the lack of any PCR-like amplification is limiting
–  Unlike metabolomics, which has a reasonable search space, there are estimated to be more than a million transcripts

Proteomics

•  Available technologies have different challenges
–  Protein microarrays vs. mass spec based methods
–  General concerns with reproducibility dampened initial excitement

Proteomics

•  The high complexity and technical instability mean that the level of annotation is often quite low
•  Same challenges as with metabolomics, but more exaggerated given the large annotation space
•  Many of the same issues…

Epigenomics

•  “Complete” set of epigenetic modifications on the genetic material of a cell
–  epigenetic modifications are reversible modifications on a cell’s DNA or histones that affect gene expression without altering the DNA sequence
–  DNA methylation and histone modification most commonly assayed
•  Rapidly advancing technologies
–  Histone modification assays
–  ChIP-chip and ChIP-seq
–  Methylation arrays

Epigenomics

•  Recent studies have focused on issues related to differential numbers of probes in genes
–  Most microarrays were designed with the same number
–  For methylation data, this is not the case, and extreme bias can be seen
–  Bias results in a large number of false positives
•  Can be corrected by applying methods that model the relationship between the number of features associated with a gene and its probability of appearing in the foreground list
–  CpG probes in the case of microarrays
–  CpG sites in the case of high-throughput sequencing
–  Chip annotation
•  Can also be corrected with careful application of permutation approaches
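One way to picture the permutation approach above is to permute at the probe level, so that genes covered by many CpG probes keep their higher chance of being flagged under the null. The sketch below uses hypothetical probe and gene names, and is an illustration of the idea rather than any published method.

```python
import random

def genes_hit(probe_gene, probes):
    """Genes with at least one probe in the given probe list."""
    return {probe_gene[p] for p in probes}

def permutation_p(probe_gene, significant, gene_set, n_perm=2000, seed=0):
    """Empirical p-value for the number of gene-set genes that are flagged.

    Null draws resample the same number of probes, so probe-rich genes
    are hit more often by chance, exactly as in the real data.
    """
    rng = random.Random(seed)
    probes = list(probe_gene)
    observed = len(genes_hit(probe_gene, significant) & gene_set)
    at_least = 0
    for _ in range(n_perm):
        drawn = rng.sample(probes, len(significant))
        if len(genes_hit(probe_gene, drawn) & gene_set) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

# Hypothetical annotation: one gene with 20 probes, 30 genes with 1 probe each.
probe_gene = {f"big_{i}": "GENE_BIG" for i in range(20)}
probe_gene.update({f"p{i}": f"GENE_{i}" for i in range(30)})

gene_set = {"GENE_BIG", "GENE_0", "GENE_1"}
significant = ["big_3", "p5", "p9", "p12", "p20"]

observed, p = permutation_p(probe_gene, significant, gene_set)
print(f"gene-set genes flagged: {observed}, permutation p = {p:.3f}")
```

In this toy case the only flagged set member is the probe-rich gene, and the probe-level null shows that such a hit is unremarkable, whereas a gene-level test that ignores probe counts would tend to overstate it.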

Next Generation Sequencing

•  Variant calling in NGS can detect single nucleotide variants (SNVs) and SNPs
•  For SNPs, the exact same pathway methods can be used as designed for GWAS studies (assuming genotyping is genome-wide)
•  For rare variants, standard approaches are a challenge
–  highly inflated false-positive rates and low power in pathway-based tests of association of rare variants
–  due to their lack of ability to account for gametic phase disequilibrium
–  New area of methods development

Next Generation Sequencing

•  RNA-seq data
–  Not truly quantitative
–  With experience, know that there are very different variance distributions at different levels of expression
–  Will matter for methods that test for differences in variance as well as mean
    •  Two-sided K-S tests…
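The two-sample Kolmogorov-Smirnov statistic compares whole distributions, so it responds to differences in spread as well as in mean, which is why expression-dependent variance matters for K-S based gene set methods. A minimal sketch with a permutation p-value follows; the gene-level scores are hypothetical.

```python
import random

def ks_statistic(x, y):
    """Two-sided K-S statistic: largest gap between two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d, i, j = 0.0, 0, 0
    for v in sorted(set(xs + ys)):
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def ks_permutation_p(x, y, n_perm=2000, seed=0):
    """Permutation p-value obtained by reshuffling group labels."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical scores: both groups are centered on 0, but the gene set
# is far more spread out than the background.
in_set = [-3.0, -2.5, -2.0, 2.0, 2.5, 3.0]
background = [-0.5, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

d, p = ks_permutation_p(in_set, background)
print(f"K-S statistic = {d:.2f}, permutation p = {p:.4f}")
```

Here the two groups share the same center, yet the statistic is large because of the difference in spread alone.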

XGR

•  Fang H, Knezevic B, Burnham KL, Knight JC. XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits. Genome Med. 2016 Dec 13;8(1):129.
•  Schematic workflow of XGR: achieving enhanced interpretation of genomic summary data. This flowchart illustrates the basic concepts behind XGR. The user provides an input list of either genes, SNPs, or genomic regions, along with their significance levels (collectively referred to as genomic summary data). XGR, available as both an R package and a web-app, is then able to run enrichment, network, similarity, and annotation analyses based on this input. The analyses themselves are run using a combination of ontologies, gene networks, gene/SNP annotations, and genomic annotation data (built-in data). The output comes in various forms, including bar plots, directed acyclic graphs (DAG), circos plots, and network relationships. Furthermore, the web-app version provides interactive tables, downloadable files, and other visuals (e.g. heatmaps).

XGR Functions

Other Functionalities

•  Cross-condition comparative enrichment analysis
•  SNP similarity analysis based on disease trait profiles
–  eQTLs
•  Epigenetic annotation/enrichment

Summary on Integrated Analysis

•  Technology advances across the “omics” are an exciting opportunity for better understanding complexity
•  Technologies have unique properties that need to be understood and accounted for in analysis
•  Metabolomics resources are rapidly maturing

Summary on Integrated Analysis

•  Database development, curation, editing, etc. always lags behind technology
•  Issues with incomplete and inaccurate annotation accumulate as more “omes” are considered
•  With more complex data, this complexity is not readily captured in the databases the gene set analysis relies on
–  Differences in cell types, exposure, time, etc.
–  Major needs for methods development…

Questions?
