bioinformatics long-term support (wabi) systems biology ... · piano –a platform for gsa (in r)...

29
Leif Väremo [email protected] Bioinformatics Long-term Support (WABI) Systems Biology Facility @ Chalmers Gene-set analysis and data integration

Upload: others

Post on 30-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

LeifVä[email protected]

BioinformaticsLong-termSupport(WABI)SystemsBiologyFacility@Chalmers

Gene-setanalysisanddataintegration

Page 2: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Outline

Will try to be practical, without getting to the detail of code-level

• Gene-setanalysis- Whatandwhy?• Gene-setcollections• MethodsforGSA• Gene-setdirectionality,overlap/interactions,biases• Thingstoconsider

2

Page 3: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Whatisgene-setanalysis(GSA)?

Immuneresponse

Pyruvate

Gene-level data Gene-setdata(results)

PPARG

Gene-setanalysisGO-termsPathwaysChromosomal locationsTranscription factorsHistone modificationsDiseasesetc…

Samples

Genes

Wewillfocusontranscriptomics anddifferentialexpressionanalysisHowever,GSAcaninprinciplebeusedonalltypesofgenome-widedata.

3

Page 4: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Manynamesforgene-setanalysis(GSA)

• Functionalannotation• Pathwayanalysis• Gene-setenrichmentanalysis• GO-termanalysis• Genelistenrichmentanalysis• …

4

Page 5: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Whygene-setanalysis(GSA)?

• Interpretationofgenome-wideresults• Gene-setsare(typically)fewerthanallthegenesandhave

moredescriptivenames• Difficulttomanagealonglistofsignificantgenes• Detectpatternsthatwouldbedifficulttodiscernsimplyby

manuallygoingthroughe.g.thelistofdifferentiallyexpressedgenes

• Integratesexternalinformationintotheanalysis• Lesspronetofalse-positivesonthegene-level• Topgenesmightnotbetheinterestingones,several

coordinatedsmallerchanges

5

Page 6: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-sets

6

Page 7: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Sowhataboutgene-sets?

• Dependsontheresearchquestion• Severaldatabases/resourcesavailableprovidinggene-set

collections(e.g.MSigDB,Enrichr)• Includeddirectlyinsomeanalysistools• GO-termsareprobablyoneofthemostwidelyusedgene-sets

GO-termsPathwaysChromosomal locationsTranscription factorsHistone modificationsDiseasesMetabolitesetc…

7

Page 8: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setexample:Geneontology(GO)terms

• Hierarchical graph with three categories (or parents):Biological process, Molecular function, Cellular compartment

• Terms get more and more detailed moving down the hierarchy• Genes can belong to multiple GO terms

8

Page 9: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setexample:Metabolicpathwaysormetabolites

9

Page 10: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setexample:Transcriptionfactortargets

10

Page 11: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setexample:Hallmarkgene-sets

“Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying gene set overlaps and retaining genes that display coordinate expression. The hallmarks reduce noise and redundancy and provide a better delineated biological space for GSEA.”

Liberzon et al. (2015) Cell Systems 1:417-425

http://software.broadinstitute.org/gsea/msigdb/collections.jsp

11

Page 12: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Wheretogetgene-setcollections?

http://amp.pharm.mssm.edu/Enrichr/#statshttp://software.broadinstitute.org/gsea/msigdb/index.jsp

12Parsed info from various databases. Focus on human.

Page 13: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

13

• GO annotations for many specieshttp://geneontology.org/page/download-annotations

• clusterProfiler (R/Bioconductor package)http://bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html#go-gene-set-enrichment-analysis

Wheretogetgene-setcollections?

Not working with human data?

doi: 10.1002/ajmg.b.32328

Page 14: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Wheretogetgene-setcollections?

• Soonerorlateryouwillrunintotheproblemofmatchingyourdatatogene-setcollectionsduetotheexistenceofseveralgeneIDtypes

14

Page 15: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Wheretogetgene-setcollections?

http://www.ensembl.org/biomart/martview

One way to map different gene IDs to each other, or to assemble a gene-set collection with the gene IDs used by your data

See also: DAVID https://david.ncifcrf.gov/content.jsp?file=conversion.htmlMygene http://mygene.info/ and http://bioconductor.org/packages/release/bioc/html/mygene.html 15

Page 16: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setanalysistoolsandmethods

16

Page 17: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

ToolsandmethodsforGSA

OmicsTools (several platforms)http://omictools.com/gene-set-analysis-category

Bioconductor (R packages)https://bioconductor.org/packages/release/BiocViews.html#___GeneSetEnrichment

Some examples:• Hypergeometric test / Fisher’s exact test

(a.k.a overrepresentation analysis)• DAVID (browser)• Enrichr (browser)• GSEA (Java, R)• piano (R)

17

Also exists e.g.:• GSA for GWAS, miRNA, …• Network-based• PlantGSEA• GSA controlling for length bias in RNA-seq• …

There are hundreds of tools to choose between…

Page 18: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Overrepresentationanalysis

All genes (universe)

Selected list of genes GO:003478

GO:000237GO:002736

GO:009835

Is this overlap bigger than expected by

random chance?

8 292 19768

Selected Not selectedIn GO-term

Not in GO-term

Hypergeometric test(Fisher’s exact test)

18

Page 19: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Overrepresentationanalysis

https://david.ncifcrf.gov/home.jsphttp://amp.pharm.mssm.edu/Enrichr/

19

Page 20: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Overrepresentationanalysis

• Requiresacutoff(arbitrary)• Omitstheactualvaluesofthegene-levelstatistics• Goodfore.g.overlapofsignificantgenesintwocomparisons• Computationallyfast

Incontrast,gene-setanalysisiscutoff-freeandusesallgene-leveldataandcandetectsmallbutcoordinatechangesthatcollectivelycontributetosomebiologicalprocess.

20

8 2

92 19768

Selected Not selected

In GO-termNot in GO-term

114 4513

A vs ctrl B vs ctrl

Page 21: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

GSA:asimpleexample

Samples

Genes

Gene-set 1

Gene-set 2

𝑆" = 𝑚𝑒𝑎𝑛(𝐺")

• S is the gene-set statistic• G are gene-level statistics of the genes in the gene-set

𝑆+ = −0.1

𝑆0 = 6.2

Permute the gene-labels (or sample labels) and redo the calculations over and over again (e.g. 10,000 times)!

𝑝" = fractionof𝑆=>?@AB>Cthatismoreextremethan𝑆"

𝑆=>?@AB>C

-606

21

𝑆" = 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛(𝐺", [remaininggenes])

Page 22: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-levelstatistics

• p-values• t-values,etc• Fold-changes• Ranks• Correlations• Signaltonoiseratio• …

22

Page 23: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

GSEA

Mootha et al Nature Genetics, 2003; Subramanian PNAS 2005

23

Page 24: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Piano– aplatformforGSA(inR)

• Reporterfeatures• Parametricanalysisofgene-setenrichment,PAGE• Tailstrength• Wilcoxonrank-sumtest• Gene-setenrichmentanalysis,GSEA(twoimplementations)• Mean• Median• Sum• Maxmean

Consensusresult

Disclaimer: The author of this presentation is the developer of piano 24

Page 25: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Directionality,overlap,interaction,biases…

25

Page 26: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Directionalityofgene-sets

Disclaimer: The author of this presentation is the developer of piano 26

Page 27: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setoverlapandinteraction

• High number of very overlapping gene-sets (representing a similar biological theme) can bias interpretation and take attention from other biological themes that are represented by fewer gene-sets.

27

Image from Enrichment Map http://dx.doi.org/10.1371/journal.pone.0013984

Gene-overlap network

Page 28: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

Gene-setoverlapandinteraction

Examples of gene-set “interactions”

• High number of very overlapping gene-sets (representing a similar biological theme) can bias interpretation and take attention from other biological themes that are represented by fewer gene-sets.

• Can be valuable to take gene-set interaction into account (e.g. www.sysbio.se/kiwi)

28

Page 29: Bioinformatics Long-term Support (WABI) Systems Biology ... · Piano –a platform for GSA (in R) • Reporter features • Parametric analysis of gene-set enrichment, PAGE • Tail

ConsiderationswhenperformingGSA

• Biasingene-setcollections(populardomains,multifunctionalgenes,…)• Gene-setnamescanbemisleading(revisitthegenes!)• Considerthegene-setsize,i.e.numberofgenes(specificorgeneral)• Positiveandnegativeassociationbetweengenesandgene-setsmakesgene-

levelfold-changestrickytointerpretcorrectly• (Typically)binaryassociationtogene-sets,doesnottakeintoaccount

varyinglevelsofinfluencefromindividualgenesontheprocessthatisrepresentedbythegene-sets

• Remembertorevisitthegene-leveldata!Arethegenessignificant?Aretheycorrectlyassignedtothespecificgene-set?

• Remembertoadjustformultipletesting

Gene-setanalysisisaveryefficientandusefultooltointerpretyourgenome-widedata!JustremembertocriticallyevaluatetheresultsJ

29