netbiosig2012 chrisevelo

Upload: alexander-pico

Post on 27-Jan-2015

DESCRIPTION

Keynote lecture for NetBioSIG 2012

TRANSCRIPT

Page 1: NetBioSIG2012 chrisevelo

Page 2: NetBioSIG2012 chrisevelo

In modern systems biology we have three main data domains.

1) Experimental data from genomics-type experiments, like the microarrays in the example (bottom right). Note that this type of data requires intensive precalculations (quality control, filtering, clustering, annotation), but that is not enough to really understand the data. You see patterns in the data, but you do not really know what they mean. Large-scale genomics data has been available over the past 15 years or so, and although the technologies used are now being replaced, that doesn’t really change this field.

2) Existing knowledge (see next slide), which can be used to better understand the two other types of data.

3) Genetics (sequence-based) data, which rapidly becomes more important as the cost of sequencing decreases. The addition of the leftmost corner to the triangle is relatively new, and I will only discuss it in the last few slides.

Page 3: NetBioSIG2012 chrisevelo

Huge amounts of existing knowledge can be found hidden in the literature or in the heads of people. The hard task is to collect it from there and to make it available for analysis. (People on the slide are Ben van Ommen, NuGO director; Hannelore Daniel, nutrigenomics chair from Munich; and a Thai princess and institute director.)

Note that a lot of information is also available in curated databases, but that was left out of the talk for brevity. You could say that structuring the other knowledge is needed to provide these databases, which can then be used for analysis.

Page 4: NetBioSIG2012 chrisevelo

A historical example of a microarray result. Again note the intensive preprocessing done (clustering to the left, annotation to the right). Nevertheless the data is very hard to understand, especially if you take into account that there are about 20,000 genes on a typical array: about as many as there are words in a dictionary.

Page 5: NetBioSIG2012 chrisevelo

But if you are willing to make the effort, you can actually see meaningful groups of genes within specific coexpression clusters, like the fatty acid degradation genes shown here. It is hard, however, to find all relevant pathways (and easy to miss some).

Page 6: NetBioSIG2012 chrisevelo

Probably not an iPad: those microarrays were at least 10 years old.

Page 7: NetBioSIG2012 chrisevelo

The problem is not only the long list of resulting genes, but also the oversampling that occurs. In genomics experiments you typically get large numbers of false positives at useful levels of significance. Of course false discovery rate corrections exist, but they will usually also lose information. Pathway or function group (ontology) analysis helps, since it is not likely that a larger set of genes occurs as false positives within a smaller functional group. On the other hand, the meaning of pathway statistics should not be overestimated. There are many aspects of real biology, and of the way the groups are built, that influence the statistical outcome.
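
To make the false-positive point concrete, here is a minimal Python sketch. It assumes the roughly 20,000 genes per array mentioned earlier in the talk and a significance level of 0.05; the ten p-values are made up for illustration, and the Benjamini-Hochberg procedure is used only as one common example of a false discovery rate correction (the talk does not name a specific method).

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if no gene is truly regulated."""
    return n_tests * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the tests kept by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    n_kept = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            n_kept = rank  # keep everything up to the largest passing rank
    return sorted(order[:n_kept])


# Made-up p-values for ten genes, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
uncorrected = [i for i, p in enumerate(p_values) if p <= 0.05]
corrected = benjamini_hochberg(p_values, fdr=0.05)

print(expected_false_positives(20000, 0.05))
print(len(uncorrected), "genes pass uncorrected,", len(corrected), "after correction")
```

With these made-up values, five genes pass the uncorrected threshold and only two survive the correction, which is the loss of information referred to above.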

For instance, take two metabolic reactions where one is catalyzed by a single enzyme and the other by four. Are all enzymes of the same importance? Or are the four together as important as the single one? Or are three of the four not important in reality while the remaining one is? All these situations can occur, and the statistics just don’t know.

Also, suppose you add 10 non-regulated genes to a pathway. That will change the significance of your result, but it doesn’t change the biology behind it.
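
The effect of adding non-regulated genes can be sketched with a simple over-representation test. This is a minimal sketch assuming a one-sided hypergeometric test, which is one common choice and not necessarily the statistic used in the tools shown in the talk; all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_regulated, pathway_size, pathway_regulated):
    """One-sided hypergeometric p-value: the chance of seeing at least
    `pathway_regulated` regulated genes in a random gene set of size `pathway_size`."""
    upper = min(pathway_size, n_regulated)
    favourable = sum(
        comb(n_regulated, k) * comb(n_total - n_regulated, pathway_size - k)
        for k in range(pathway_regulated, upper + 1)
    )
    return favourable / comb(n_total, pathway_size)


# Hypothetical numbers: 20,000 genes measured, 1,000 of them regulated,
# and a pathway of 30 genes of which 8 are regulated.
p_before = enrichment_p_value(20000, 1000, 30, 8)

# Same pathway after adding 10 non-regulated genes: 40 genes, still 8 regulated.
p_after = enrichment_p_value(20000, 1000, 40, 8)

print(p_before, p_after)
```

The same eight regulated genes give a larger (less significant) p-value in the extended pathway, although nothing about the underlying biology has changed.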

Page 8: NetBioSIG2012 chrisevelo

Example of a pathway that can be used for the purposes described.

Page 9: NetBioSIG2012 chrisevelo

A closer look at the same pathway.

Note that this uses MIM notation from the MIM PathVisio plugin.

In general, the connections between different genes and metabolites describe the network underlying the pathway. Note that this is already quite complex, since there are different ways to show what interacts with what. Graphical methods to capture this, like MIM and SBGN, definitely help. The result can be captured in descriptive relationships in BioPAX.

Page 10: NetBioSIG2012 chrisevelo

Page 11: NetBioSIG2012 chrisevelo

PathVisio can do a combined visualization of different omics results. Here proteomics and transcriptomics results are both shown on the same gene product boxes. It can also show effects from metabolomics.

Page 12: NetBioSIG2012 chrisevelo

Examples of pathways as we have them on wikipathways.org.

Page 13: NetBioSIG2012 chrisevelo

This talk is not really about WikiPathways. Check out the information in the paper or on the wiki itself (www.wikipathways.org). Developer information is mainly on the www.pathvisio.org website.

Page 14: NetBioSIG2012 chrisevelo

You obtain microarray data (e.g. Affymetrix).

You can visualize the microarray data.

Each color corresponds to a measured data point.

For example, green is up, red is down, grey is constant.

And now? How do you make sure that the Affymetrix probeset IDs related to the measurements can be mapped to the gene products in the pathway?

Page 15: NetBioSIG2012 chrisevelo

On WikiPathways (or in PathVisio) you can attach identifiers to each gene. A click opens the corresponding page in (in this specific case) the worm database.

You can download the corresponding transcript sequence in two clicks.

This makes it, for instance, really easy to design primers.

Page 16: NetBioSIG2012 chrisevelo

As soon as you have entered one (and only one) identifier to describe which gene product or metabolite you really mean, this information is linked to many other identifiers from other databases. Links to the respective pages are shown in the so-called “backpage” (actually one of the pages under the tabs at the right-hand side of the pathway).

Page 17: NetBioSIG2012 chrisevelo

BridgeDB (see www.bridgedb.org and the paper mentioned on the slide)

provides the mechanism needed for that identifier mapping.
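
The idea behind identifier mapping can be sketched in a few lines of Python. This is not the BridgeDB API; it is a minimal in-memory illustration, and every identifier, database label and measurement value below is made up. Only direct cross-references are followed.

```python
from collections import defaultdict

# Hypothetical cross-references between (database, identifier) pairs.
XREF_PAIRS = [
    (("Affymetrix", "probe_0001_at"), ("Entrez Gene", "1111")),
    (("Affymetrix", "probe_0002_at"), ("Entrez Gene", "1111")),
    (("Affymetrix", "probe_0003_at"), ("Entrez Gene", "2222")),
]


def build_index(pairs):
    """Store every cross-reference in both directions."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
        index[right].add(left)
    return index


def map_id(index, xref, target_db):
    """Return the identifiers in `target_db` directly linked to `xref`."""
    return sorted(ident for db, ident in index.get(xref, ()) if db == target_db)


def values_per_gene(index, measurements, pathway_genes):
    """Collect the probeset measurements belonging to each pathway gene."""
    per_gene = {gene: [] for gene in pathway_genes}
    for probe, value in measurements.items():
        for gene in map_id(index, ("Affymetrix", probe), "Entrez Gene"):
            if gene in per_gene:
                per_gene[gene].append(value)
    return per_gene


index = build_index(XREF_PAIRS)
measurements = {"probe_0001_at": 1.8, "probe_0002_at": 2.1,
                "probe_0003_at": -0.4, "probe_0004_at": 0.9}
result = values_per_gene(index, measurements, ["1111", "3333"])
print(result)
```

Note that one gene can collect several probeset values and another none at all, which any visualization on top of such a mapping has to handle.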

Page 18: NetBioSIG2012 chrisevelo

Pathways can be downloaded to be used in different tools.

There is also a WikiPathways web service. See:

http://www.wikipathways.org/index.php/Help:WikiPathways_Webservice

Thomas Kelder, Alexander R Pico, Kristina Hanspers, Chris Evelo & Bruce R Conklin. Mining biological pathways using WikiPathways web services. PLoS One (2009) 4(7): e6447. http://dx.doi.org/10.1371/journal.pone.0006447

We also have semantic output in RDF, which can be queried through a SPARQL endpoint described at semantics.bigcat.unimaas.nl.

Page 19: NetBioSIG2012 chrisevelo

Introducing a problem

Page 20: NetBioSIG2012 chrisevelo

And a solution that isn’t really a solution. There are just too many things you

could add.

Page 21: NetBioSIG2012 chrisevelo

The PathVisio Regulatory Interaction plugin (author Stefan van Helden) has a

new approach where information is not really added to a pathway, but shown

in a separate page upon request.

Page 22: NetBioSIG2012 chrisevelo

The GPML plugin for Cytoscape can be found here:

http://chianti.ucsd.edu/cyto_web/plugins/displayplugininfo.php?name=GPML-Plugin

It can be used in Cytoscape to read and write the GPML pathway files used by WikiPathways and PathVisio.

Page 23: NetBioSIG2012 chrisevelo

Example showing some more advanced usage of the GPML plugin.

Data from the NuGO proof of principle study with dietary challenged mice.

Three tissues were sampled. In the two tissues other than liver, relatively many genes showed expression changes on Affymetrix arrays, but not many pathways were found. For liver the number of genes affected was lower, but the number of pathways found to be affected was higher (how come?).

The pathway-based network analysis showed that there was a set of more strongly affected pathways (more regulated genes, large blue circles) that share regulated genes (the red diamonds). When looking at the highlighted group of pathways it became clear that these all belong to the same superset of biologically relevant pathways (fatty acid metabolism and inflammation).
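
The kind of pathway-based network described here can be sketched as follows. This is a minimal illustration in plain Python, not the GPML plugin or Cytoscape itself; pathway names, gene names and the regulated set are all hypothetical.

```python
from itertools import combinations


def regulated_per_pathway(pathways, regulated):
    """For each pathway, keep only the genes that are regulated."""
    return {name: genes & regulated for name, genes in pathways.items()}


def shared_gene_edges(pathways, regulated):
    """Connect two pathways when they share at least one regulated gene."""
    hits = regulated_per_pathway(pathways, regulated)
    edges = {}
    for a, b in combinations(sorted(hits), 2):
        shared = hits[a] & hits[b]
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical pathways and regulated genes, for illustration only.
pathways = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g4", "g5"},
    "pathway_D": {"g6", "g7"},
}
regulated = {"g2", "g3", "g4", "g6"}

hits = regulated_per_pathway(pathways, regulated)
edges = shared_gene_edges(pathways, regulated)
for (a, b), shared in edges.items():
    print(a, "-", b, "share", sorted(shared))
```

In this toy input, pathway_D has a regulated gene but no link to the others, while A, B and C form a connected group through shared regulated genes, analogous to the highlighted group on the slide.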

Page 24: NetBioSIG2012 chrisevelo

A paper that we published with a more extensive pathway relationship

approach. It takes into account relations between pathways through affected

genes not necessarily showing up in either pathway.

Page 25: NetBioSIG2012 chrisevelo

Page 26: NetBioSIG2012 chrisevelo

The approach takes into account all data used (pathways, interactions and experimentally determined weights). Check out the original paper for details.

Page 27: NetBioSIG2012 chrisevelo

Example result: pathways with stronger interactions based on genes not present in them.

Page 28: NetBioSIG2012 chrisevelo

And you can do the same for relatively large sets of pathways “driving” a

process like apoptosis.

Page 29: NetBioSIG2012 chrisevelo

CyTargetLinker is a Cytoscape plugin that can be used to extend a network with information about things that target entities in that network. This information comes from databases that are themselves created as networks. It already provides a number of target relation databases, as mentioned on the slide.

Page 30: NetBioSIG2012 chrisevelo

Example of a target network. (You will normally see this; it contains the information that is used to extend your source network.)

Page 31: NetBioSIG2012 chrisevelo

And a more detailed view.

Page 32: NetBioSIG2012 chrisevelo

You can drive it from a gene set that isn’t even a network at the start. When miRNAs are found to target more than one gene in the group, the network is created on the fly.
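
The step from a gene set to a network can be sketched like this. It is a minimal illustration of the idea in plain Python, not CyTargetLinker's own code or API; the miRNA and gene names and the `min_targets` parameter are hypothetical.

```python
from collections import defaultdict


def extend_with_regulators(gene_set, target_relations, min_targets=2):
    """Return (regulator, gene) edges for regulators that target at least
    `min_targets` genes in the gene set."""
    targets_in_set = defaultdict(set)
    for regulator, gene in target_relations:
        if gene in gene_set:
            targets_in_set[regulator].add(gene)
    return sorted(
        (regulator, gene)
        for regulator, genes in targets_in_set.items()
        if len(genes) >= min_targets
        for gene in genes
    )


# Hypothetical gene set and miRNA-target relations, for illustration only.
gene_set = {"geneA", "geneB", "geneC"}
target_relations = [
    ("miR-x", "geneA"),
    ("miR-x", "geneB"),
    ("miR-y", "geneC"),
    ("miR-z", "geneD"),
    ("miR-z", "geneA"),
]

network_edges = extend_with_regulators(gene_set, target_relations)
print(network_edges)
```

Only miR-x targets more than one gene in the set, so it alone ties genes together into a network; miR-z has two targets overall, but only one of them is in the gene set.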

Page 33: NetBioSIG2012 chrisevelo

Or you can bootstrap the approach from an existing network, which can be a pathway-based one imported with the GPML plugin, as shown here.

Page 34: NetBioSIG2012 chrisevelo

An overview of the Open Phacts project, which pulls lots of information into a semantic web triple store (including information from WikiPathways RDF) and then provides that for use in other tools. In WikiPathways we use that to suggest possible pathway extensions to curators.

Page 35: NetBioSIG2012 chrisevelo

This shows the PathVisio Loom plugin in action. A gene or metabolite in a pathway under development (left side) is right-clicked and Loom is activated to pull related genes or metabolites from another resource (database, text mining result or Open Phacts API). The suggested interactions are shown in the window on the right, and the entities are added to the pathway (two are already shown on the left).

Page 36: NetBioSIG2012 chrisevelo

The talk so far focused on the genomics-knowledge relationship shown on the right. So what about genetics?

Page 37: NetBioSIG2012 chrisevelo

Page 38: NetBioSIG2012 chrisevelo

This is the image that was shown to us by Jim Kaput (at that time NCTR, now Nestle): “Look, people grouped those SNPs in gene groups, made sense of the directions and showed them in a pathway. Can you do something like that?”

Page 39: NetBioSIG2012 chrisevelo

In principle? Yes.

Page 40: NetBioSIG2012 chrisevelo

There are just too many SNPs for any given gene.

Page 41: NetBioSIG2012 chrisevelo

So it would really look like a bunch of jellies if we show these all on the genes

in a pathway, and you would not know what they mean.

Page 42: NetBioSIG2012 chrisevelo

There are loads of bioinformatics tools out there (like SIFT and PolyPhen) that allow us to estimate functional effects of SNPs on the coded protein (activity or protein-protein interactions), on binding sites for transcription factors in the DNA, or on binding sites for miRNA in the RNA. Doing that, we can decide what edges SNPs would affect (and how much, in what direction). As soon as you do that, you can use the result to strengthen SNP statistics (i.e. create groups that can be used for supervised types of group-based GWAS analysis) or to build predictive models to estimate what specific (personal or tissue/tumor based) sets of variations would do. That creates a need to use the pathways to link experimental (genomics) data not only to the genetic variations occurring in there, but also to modeling results.
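
Creating SNP groups per pathway could look roughly like the sketch below. It assumes each SNP has already been annotated with a gene and a yes/no prediction of functional effect; that boolean stands in for whatever a prediction tool would report and is not the output format of SIFT or PolyPhen. All names are hypothetical.

```python
from collections import defaultdict


def snp_groups_per_pathway(snp_annotations, pathways):
    """Group SNPs with a predicted functional effect by the pathways
    that contain the affected gene."""
    gene_to_pathways = defaultdict(set)
    for pathway, genes in pathways.items():
        for gene in genes:
            gene_to_pathways[gene].add(pathway)

    groups = defaultdict(set)
    for snp, gene, predicted_functional in snp_annotations:
        if not predicted_functional:
            continue  # skip SNPs without a predicted effect
        for pathway in gene_to_pathways.get(gene, ()):
            groups[pathway].add(snp)
    return {pathway: sorted(snps) for pathway, snps in groups.items()}


# Hypothetical annotations: (SNP, gene, predicted functional effect?).
snp_annotations = [
    ("snp1", "geneA", True),
    ("snp2", "geneA", False),
    ("snp3", "geneB", True),
    ("snp4", "geneZ", True),
]
pathways = {"pathway_1": {"geneA", "geneB"}, "pathway_2": {"geneB"}}

snp_groups = snp_groups_per_pathway(snp_annotations, pathways)
print(snp_groups)
```

Filtering on predicted effect first is what keeps the groups small enough to be shown on, or tested per, pathway; SNPs in genes outside any pathway (snp4 here) simply drop out.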

Page 43: NetBioSIG2012 chrisevelo

Showing the concept: integrating flux predictions from modelling (of course that could also be real fluxomics data).

Page 44: NetBioSIG2012 chrisevelo

And showing “real” results from the new flux data representation plugin.

The plugin is functional, but we still need better mapping databases for reaction identifiers.

Page 45: NetBioSIG2012 chrisevelo

Many people are involved in this work (really many if you count associated groups like the plugin developers, pathway curators, etc.).

Most important:

The SF group (Kristina Hanspers, Bruce Conklin and Alex Pico), collaborating on many things but primarily WikiPathways.

Martijn van Iersel, top left (PathVisio, BridgeDB); Thomas Kelder, top middle (WikiPathways including webservices, pathway integration networks for nutrigenomics); Martina Kutmon, top right (CyTargetLinker, PathVisio further development); Andra Waagmeester, second row, right (WikiPathways RDF); Anwesha Dutta, bottom, 2nd from the left (flux visualization); Stefan van Helden, not on the picture (RI PathVisio plugin).
