bioinformatics long-term support (wabi) systems biology ... · piano –a platform for gsa (in r)...

Post on 30-Jul-2020


Leif Väremo
leif.varemo@scilifelab.se

Bioinformatics Long-term Support (WABI)
Systems Biology Facility @ Chalmers

Gene-set analysis and data integration

Outline

Will try to be practical, without going into code-level detail.

• Gene-set analysis: what and why?
• Gene-set collections
• Methods for GSA
• Gene-set directionality, overlap/interactions, biases
• Things to consider

What is gene-set analysis (GSA)?

[Figure: gene-level data (genes × samples; example gene: PPARG) goes through gene-set analysis to give gene-set data (results); example gene-sets: Immune response, Pyruvate]

Gene-sets can be: GO-terms, pathways, chromosomal locations, transcription factors, histone modifications, diseases, etc…

We will focus on transcriptomics and differential expression analysis. However, GSA can in principle be used on all types of genome-wide data.

Many names for gene-set analysis (GSA)

• Functional annotation
• Pathway analysis
• Gene-set enrichment analysis
• GO-term analysis
• Gene list enrichment analysis
• …

Why gene-set analysis (GSA)?

• Interpretation of genome-wide results
• Gene-sets are (typically) fewer than all the genes and have more descriptive names
• Difficult to manage a long list of significant genes
• Detect patterns that would be difficult to discern simply by manually going through e.g. the list of differentially expressed genes
• Integrates external information into the analysis
• Less prone to false-positives on the gene-level
• The top genes might not be the interesting ones; several coordinated smaller changes may be

Gene-sets

So what about gene-sets?

• Depends on the research question
• Several databases/resources available providing gene-set collections (e.g. MSigDB, Enrichr)
• Included directly in some analysis tools
• GO-terms are probably one of the most widely used gene-sets

Examples of gene-sets: GO-terms, pathways, chromosomal locations, transcription factors, histone modifications, diseases, metabolites, etc…

Gene-set example: Gene ontology (GO) terms

• Hierarchical graph with three categories (or parents): Biological process, Molecular function, Cellular component
• Terms get more and more detailed moving down the hierarchy
• Genes can belong to multiple GO terms

Gene-set example: Metabolic pathways or metabolites

Gene-set example: Transcription factor targets

Gene-set example: Hallmark gene-sets

“Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying gene set overlaps and retaining genes that display coordinate expression. The hallmarks reduce noise and redundancy and provide a better delineated biological space for GSEA.”

Liberzon et al. (2015) Cell Systems 1:417-425

http://software.broadinstitute.org/gsea/msigdb/collections.jsp

Where to get gene-set collections?

http://amp.pharm.mssm.edu/Enrichr/#stats
http://software.broadinstitute.org/gsea/msigdb/index.jsp

Parsed info from various databases. Focus on human.

Where to get gene-set collections?

Not working with human data?

• GO annotations for many species: http://geneontology.org/page/download-annotations
• clusterProfiler (R/Bioconductor package): http://bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html#go-gene-set-enrichment-analysis

doi: 10.1002/ajmg.b.32328

Where to get gene-set collections?

• Sooner or later you will run into the problem of matching your data to gene-set collections due to the existence of several gene ID types

Where to get gene-set collections?

http://www.ensembl.org/biomart/martview

One way to map different gene IDs to each other, or to assemble a gene-set collection with the gene IDs used by your data.

See also:
• DAVID: https://david.ncifcrf.gov/content.jsp?file=conversion.html
• Mygene: http://mygene.info/ and http://bioconductor.org/packages/release/bioc/html/mygene.html
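As a minimal sketch of the ID-matching step, the Python snippet below translates a gene-set collection from one gene ID type to another using a lookup table, such as one exported from BioMart. The function name `translate_gene_sets`, the gene-set name and the placeholder ID are hypothetical and chosen only for illustration.

```python
def translate_gene_sets(gene_sets, id_map):
    """Translate every gene ID in a gene-set collection via id_map.

    Returns the translated collection and the set of IDs that had no
    entry in id_map, so that dropped genes can be inspected.
    """
    translated = {}
    unmapped = set()
    for name, genes in gene_sets.items():
        new_ids = set()
        for gene in genes:
            if gene in id_map:
                new_ids.add(id_map[gene])
            else:
                unmapped.add(gene)
        translated[name] = new_ids
    return translated, unmapped


# Lookup table, e.g. parsed from a BioMart export (Ensembl gene ID -> symbol).
id_map = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
}

# Hypothetical gene-set; "ENSG_NOT_IN_TABLE" is a made-up placeholder ID.
gene_sets = {
    "example_set": {"ENSG00000141510", "ENSG00000012048", "ENSG_NOT_IN_TABLE"},
}

translated, unmapped = translate_gene_sets(gene_sets, id_map)
print(translated)
print(unmapped)
```

Returning the unmapped IDs makes it visible how many genes are lost in the conversion, which in turn changes the gene-set sizes used later in the analysis.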

Gene-set analysis tools and methods

Tools and methods for GSA

OMICtools (several platforms): http://omictools.com/gene-set-analysis-category

Bioconductor (R packages): https://bioconductor.org/packages/release/BiocViews.html#___GeneSetEnrichment

Some examples:
• Hypergeometric test / Fisher’s exact test (a.k.a. overrepresentation analysis)
• DAVID (browser)
• Enrichr (browser)
• GSEA (Java, R)
• piano (R)

Also exists e.g.:
• GSA for GWAS, miRNA, …
• Network-based
• PlantGSEA
• GSA controlling for length bias in RNA-seq
• …

There are hundreds of tools to choose between…

Overrepresentation analysis

[Figure: a selected list of genes within all genes (the universe), overlapping GO-terms such as GO:003478, GO:000237, GO:002736 and GO:009835]

Is this overlap bigger than expected by random chance?

                 Selected   Not selected
In GO-term              8              2
Not in GO-term         92          19768

Hypergeometric test (Fisher’s exact test)
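To make the test concrete, here is a small Python sketch that computes the one-sided hypergeometric p-value for the 2×2 table above, using only the standard library. The function name and argument names are assumptions made for this illustration; in practice a statistics package would normally be used.

```python
from math import comb


def overrepresentation_p(k, n_selected, n_in_set, n_universe):
    """P(X >= k) when drawing n_selected genes from the universe.

    k          : selected genes that are in the gene-set
    n_selected : size of the selected gene list
    n_in_set   : genes in the gene-set (within the universe)
    n_universe : all genes considered
    """
    total = comb(n_universe, n_selected)
    upper = min(n_selected, n_in_set)
    tail = sum(
        comb(n_in_set, x) * comb(n_universe - n_in_set, n_selected - x)
        for x in range(k, upper + 1)
    )
    return tail / total


# Counts from the table: 8 and 2 in the GO-term, 92 and 19768 outside it.
n_selected = 8 + 92            # selected genes
n_in_set = 8 + 2               # genes annotated to the GO-term
n_universe = 8 + 2 + 92 + 19768

p = overrepresentation_p(8, n_selected, n_in_set, n_universe)
print(p)
```

The result depends directly on the chosen universe, so the background gene list should match the genes that could actually have been selected.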

Overrepresentation analysis

https://david.ncifcrf.gov/home.jsp
http://amp.pharm.mssm.edu/Enrichr/

Overrepresentation analysis

• Requires a cutoff (arbitrary)
• Omits the actual values of the gene-level statistics
• Good for e.g. overlap of significant genes in two comparisons
• Computationally fast

In contrast, gene-set analysis is cutoff-free and uses all gene-level data and can detect small but coordinate changes that collectively contribute to some biological process.

[Figure: 2×2 table (Selected / Not selected vs In GO-term / Not in GO-term: 8, 2, 92, 19768); overlap of A vs ctrl and B vs ctrl, labelled 114 4513]

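GSA: a simple example

[Figure: gene-level data (genes × samples) with Gene-set 1 and Gene-set 2 marked]

$S_i = \mathrm{mean}(G_i)$

• $S_i$ is the gene-set statistic of gene-set $i$
• $G_i$ are the gene-level statistics of the genes in gene-set $i$

$S_1 = -0.1$

$S_2 = 6.2$

Permute the gene-labels (or sample labels) and redo the calculations over and over again (e.g. 10,000 times)!

$p_i$ = fraction of $S_{\mathrm{permuted}}$ that is more extreme than $S_i$

[Figure: histogram of $S_{\mathrm{permuted}}$, x-axis from -6 to 6]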

$S_i = \mathrm{function}(G_i, [\text{remaining genes}])$

Gene-level statistics

• p-values
• t-values, etc.
• Fold-changes
• Ranks
• Correlations
• Signal to noise ratio
• …

GSEA

Mootha et al. Nature Genetics, 2003; Subramanian et al. PNAS, 2005
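The core of GSEA is a running-sum enrichment score over a ranked gene list (Subramanian et al. 2005). The sketch below is a simplified Python illustration of that score only; it is not the Broad Institute implementation or piano's, the function name and toy data are assumptions, and significance would still have to be assessed by permutation.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum statistic.

    ranked_genes: genes sorted by their gene-level statistic (descending).
    scores: dict mapping gene -> gene-level statistic.
    Returns the running-sum value with the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(scores[g]) ** weight
                   for g, is_hit in zip(ranked_genes, hits) if is_hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("need hits with non-zero scores and at least one miss")

    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            # Step up, proportionally to the gene's statistic.
            running += abs(scores[gene]) ** weight / hit_norm
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (made up): two gene-set members sit at the top of the ranking.
scores = {"a": 4.0, "b": 3.0, "c": -1.0, "d": -2.0}
ranked = sorted(scores, key=scores.get, reverse=True)
es = enrichment_score(ranked, scores, gene_set={"a", "b"})
print(es)
```

A gene-set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score.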


Piano: a platform for GSA (in R)

• Reporter features
• Parametric analysis of gene-set enrichment, PAGE
• Tail strength
• Wilcoxon rank-sum test
• Gene-set enrichment analysis, GSEA (two implementations)
• Mean
• Median
• Sum
• Maxmean

Consensus result

Disclaimer: The author of this presentation is the developer of piano

Directionality, overlap, interaction, biases…

Directionality of gene-sets

Disclaimer: The author of this presentation is the developer of piano

Gene-set overlap and interaction

• A high number of heavily overlapping gene-sets (representing a similar biological theme) can bias interpretation and take attention from other biological themes that are represented by fewer gene-sets.
• Can be valuable to take gene-set interaction into account (e.g. www.sysbio.se/kiwi)

[Figure: gene-overlap network. Image from Enrichment Map, http://dx.doi.org/10.1371/journal.pone.0013984]

[Figure: examples of gene-set “interactions”]
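One simple way to spot heavily overlapping gene-sets before interpreting results is to compute a pairwise overlap measure. The Python sketch below uses the Jaccard index, which is an assumption chosen for illustration (other measures exist); the function names, threshold and gene-sets are made up.

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Jaccard index: shared genes divided by all genes in either set."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, threshold=0.5):
    """Return (name_1, name_2, jaccard) for pairs at or above threshold."""
    pairs = []
    for (name_1, genes_1), (name_2, genes_2) in combinations(gene_sets.items(), 2):
        score = jaccard(genes_1, genes_2)
        if score >= threshold:
            pairs.append((name_1, name_2, score))
    return pairs


# Made-up gene-sets with placeholder gene names.
gene_sets = {
    "set_A": {"g1", "g2", "g3", "g4"},
    "set_B": {"g2", "g3", "g4", "g5"},
    "set_C": {"g6", "g7"},
}
pairs = overlapping_pairs(gene_sets)
print(pairs)
```

Pairs above the threshold are candidates for being read as one biological theme instead of several independent findings.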


Considerations when performing GSA

• Bias in gene-set collections (popular domains, multifunctional genes, …)
• Gene-set names can be misleading (revisit the genes!)
• Consider the gene-set size, i.e. number of genes (specific or general)
• Positive and negative association between genes and gene-sets makes gene-level fold-changes tricky to interpret correctly
• (Typically) binary association to gene-sets, does not take into account varying levels of influence from individual genes on the process that is represented by the gene-sets
• Remember to revisit the gene-level data! Are the genes significant? Are they correctly assigned to the specific gene-set?
• Remember to adjust for multiple testing
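For the multiple-testing point, a minimal Python sketch of the Benjamini-Hochberg adjustment is shown below. Choosing Benjamini-Hochberg is an assumption made here, since the slides do not name a specific method; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up gene-set p-values.
gene_set_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(gene_set_p)
print(adjusted)
```

Each gene-set tested counts as one test, so the adjustment should be applied across the whole gene-set collection and not only to the sets that look interesting.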

Gene-set analysis is a very efficient and useful tool to interpret your genome-wide data! Just remember to critically evaluate the results :)