
Evidence networks for the analysis of biological systems

Rainer Breitling

IBLS – Molecular Plant Science group

Bioinformatics Research Centre

University of Glasgow, Scotland, UK

Background

Datasets and evidence networks in post-genomic biology

Genomics

Fully sequenced genomes (1995-2004):

18 archaea

163 bacteria

3 protozoa

24 yeast species and fungi

2 plants (Arabidopsis, rice)

2 insects (flies, honey bee)

2 worms (C. elegans, C. briggsae)

3 fish (fugu, puffer, zebrafish)

chicken, cow, dog, mouse, rat, chimp

human

lots of “lists” of genes

Transcriptomics

• microarrays measure gene expression levels (mRNA concentrations)

• relative or absolute values

• in organisms, tissues, cells

• produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)

Proteomics

• 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations

• in tissues, cells, organelles

• detect chemical modifications and processing of proteins

• produce lists of protein variants that differ among conditions

Metabolomics

• chromatography and mass spectrometry measure metabolite concentrations

• in tissues, cells, body fluids, cell culture medium

• produce lists of affected metabolites

Evidence networks

• relate items (genes, proteins, metabolites) that “have something to do with each other”

• relationship is based on objective evidence

• represented as bipartite graphs
  – two classes of nodes: items and evidence
  – automated analysis of results possible
  – intuitive visualization and links to literature
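The bipartite representation can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function name `item_links` and the co-citation entry are assumptions made for the example; the four gene names and the phyletic pattern are the ones used elsewhere in these slides.

```python
from collections import defaultdict
from itertools import combinations

# Bipartite graph stored as: evidence node -> set of item nodes it connects.
# The second entry is a made-up piece of evidence, for illustration only.
evidence_to_items = {
    "phyletic pattern aompkzy--d-l-----------it-": {"NtpA", "NtpB", "NtpD", "NtpI"},
    "co-citation (hypothetical)": {"NtpA", "NtpB"},
}


def item_links(evidence_to_items):
    """Project the bipartite graph onto items.

    Returns a dict mapping each item pair to the set of evidence nodes
    that connect the two items.
    """
    links = defaultdict(set)
    for evidence, items in evidence_to_items.items():
        for a, b in combinations(sorted(items), 2):
            links[(a, b)].add(evidence)
    return dict(links)


links = item_links(evidence_to_items)
for pair, support in sorted(links.items()):
    print(pair, "supported by", len(support), "evidence node(s)")
```

Keeping the evidence nodes explicit, rather than storing only item-item edges, is what allows every link to be traced back to its source.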

Types of evidence networks

• Relationship can be based on
  – physical neighborhood
  – phyletic pattern similarity
  – expressional correlation
  – biophysical similarity
  – chemical transformation
  – functional co-operation
  – literature co-citations



A O M P K Z Y Q V D R L B C E F G H S N U J X I T W

phy: a o m p k z y - - d - l - - - - - - - - - - - i t -

22 aompkzy--d-l-----------it- NtpA [C] H+-ATPase subunit A

17 aompkzy--d-l-----------it- NtpB [C] H+-ATPase subunit B

17 aompkzy--d-l-----------it- NtpD [C] H+-ATPase subunit D

18 aompkzy--d-l-----------it- NtpI [C] H+-ATPase subunit I

http://www.ncbi.nlm.nih.gov/COG/old/phylox.html
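Phyletic pattern similarity can be illustrated with a simple mismatch count. The sketch below assumes that a pattern is a 26-character string in the column order of the header row above, with a lower-case letter for "present" and "-" for "absent". The function names are hypothetical, and a plain Hamming distance is only one possible similarity measure.

```python
# Column order taken from the header row of the phyletic pattern table.
SPECIES = "AOMPKZYQVDRLBCEFGHSNUJXITW"


def hamming(p, q):
    """Number of species in which two phyletic patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    return sum(a != b for a, b in zip(p, q))


def one_step_neighbours(pattern):
    """All patterns that differ from `pattern` in exactly one species."""
    neighbours = []
    for i, letter in enumerate(SPECIES.lower()):
        # Toggle presence/absence at position i.
        flipped = "-" if pattern[i] != "-" else letter
        neighbours.append(pattern[:i] + flipped + pattern[i + 1:])
    return neighbours


query = "aompkzy--d-l-----------it-"  # pattern shared by NtpA, NtpB, NtpD, NtpI
neighbours = one_step_neighbours(query)
print(len(neighbours), neighbours[0], neighbours[7])
```

Genes whose patterns are identical or lie within a small distance of each other would then be linked through a shared "phyletic pattern" evidence node.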


What is the big picture?

Graph-based iterative Group Analysis for the automated interpretation of biological datasets

lists + graphs = understanding

What does this list mean?

    Fold-Change  Gene Symbol  Gene Title
 1  26.45        TNFAIP6      tumor necrosis factor, alpha-induced protein 6
 2  25.79        THBS1        thrombospondin 1
 3  23.08        SERPINE2     serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
 4  21.5         PTX3         pentaxin-related gene, rapidly induced by IL-1 beta
 5  18.82        THBS1        thrombospondin 1
 6  16.68        CXCL10       chemokine (C-X-C motif) ligand 10
 7  18.23        CCL4         chemokine (C-C motif) ligand 4
 8  14.85        SOD2         superoxide dismutase 2, mitochondrial
 9  13.62        IL1B         interleukin 1, beta

12  11.27        SOD2         superoxide dismutase 2, mitochondrial
13  10.89        GCH1         GTP cyclohydrolase 1 (dopa-responsive dystonia)
14  10.73        IL8          interleukin 8
15  9.98         ICAM1        intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
16  9.97         SLC2A6       solute carrier family 2 (facilitated glucose transporter), member 6
17  8.36         BCL2A1       BCL2-related protein A1
18  7.33         TNFAIP2      tumor necrosis factor, alpha-induced protein 2
19  6.97         SERPINB2     serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2
20  6.69         MAFB         v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)

iterative Group Analysis (iGA)

iGA uses a simple hypergeometric distribution to obtain p-values

Breitling et al., BMC Bioinformatics, 2004, 5:34
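A minimal sketch of how a hypergeometric p-value could be obtained for one gene group in a ranked list. The function names, the toy gene list, and the choice to take the upper-tail probability and minimise it over list positions are assumptions made for illustration; the cited paper describes the actual procedure.

```python
from math import comb


def hypergeom_tail(total, group_size, top, hits):
    """P(at least `hits` group members among the top `top` of `total` genes)."""
    upper = min(group_size, top)
    favourable = sum(
        comb(group_size, k) * comb(total - group_size, top - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, top)


def iga_group_p(ranked_genes, group):
    """Walk down the ranked list and keep the smallest tail probability.

    Returns (best p-value, number of group members at that cut-off).
    """
    total = len(ranked_genes)
    members = set(group) & set(ranked_genes)
    best_p, best_hits, hits = 1.0, 0, 0
    for position, gene in enumerate(ranked_genes, start=1):
        if gene in members:
            hits += 1
            p = hypergeom_tail(total, len(members), position, hits)
            if p < best_p:
                best_p, best_hits = p, hits
    return best_p, best_hits


# Toy data: eight genes ranked by fold-change, one group of two genes.
ranked = ["A", "B", "C", "D", "E", "F", "G", "H"]
p_value, n_hits = iga_group_p(ranked, {"A", "B"})
print(p_value, n_hits)
```

In this toy case both group members sit at the top of the list, so the minimum is reached at the second position, with p = 1/28.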

Graph-based iGA


Graph-based iGA – step 1: build the network


Graph-based iGA – step 2: assign ranks to genes


Graph-based iGA – step 3: find local minima

p = 1/8 = 0.125

p = 2/8 = 0.25

p = 6/8 = 0.75


Graph-based iGA – step 4: extend subgraph from minima

p=1

p=0.014 p=0.018

p=0.125


Graph-based iGA – step 5: select p-value minimum

p=1

p=0.018

p=0.125

p=0.014
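The five steps can be sketched as follows. This is an illustrative reading of the slides, not the published implementation. The p-value of a subgraph with n genes and maximum rank m among N genes is assumed to be the product of (m - i)/(N - i) for i = 0 .. n-1, because that formula reproduces the values shown above for N = 8 (1/8 = 0.125, then 0.018 and 0.014). The toy network, the function names and the assumption of unique ranks are hypothetical.

```python
from math import prod


def build_adjacency(edges):
    """Step 1: build an undirected network from a list of gene pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def subgraph_p(n, m, total):
    """Chance that n genes all fall within the top m ranks of `total` genes."""
    return prod((m - i) / (total - i) for i in range(n))


def giga(ranks, adjacency):
    """Steps 3-5: grow a subgraph from each local minimum, keep its best p."""
    total = len(ranks)
    results = []
    for seed, seed_rank in ranks.items():
        # Step 3: a local minimum has no neighbour with a better (lower) rank.
        if any(ranks[nb] < seed_rank for nb in adjacency.get(seed, ())):
            continue
        members = {seed}
        best_p, best_members = subgraph_p(1, seed_rank, total), {seed}
        while True:
            frontier = {nb for g in members for nb in adjacency.get(g, ())} - members
            if not frontier:
                break
            # Step 4: admit the best-ranked neighbour, then every gene that is
            # reachable through genes with rank <= m.
            m = min(ranks[g] for g in frontier)
            while True:
                new = {nb for g in members for nb in adjacency.get(g, ())
                       if ranks[nb] <= m} - members
                if not new:
                    break
                members |= new
            p = subgraph_p(len(members), m, total)
            if p < best_p:  # Step 5: remember the p-value minimum.
                best_p, best_members = p, set(members)
        results.append((best_p, seed, best_members))
    return sorted(results, key=lambda r: r[0])


# Step 2: ranks, e.g. by fold-change (1 = most changed). Toy values only.
ranks = {"A": 1, "E": 2, "B": 3, "C": 4, "H": 5, "G": 6, "F": 7, "D": 8}
edges = [("A", "E"), ("A", "B"), ("B", "C"), ("C", "D"),
         ("D", "F"), ("F", "G"), ("G", "H")]
results = giga(ranks, build_adjacency(edges))
for p, seed, members in results:
    print(f"anchor {seed}: p = {p:.3f}, members = {sorted(members)}")
```

For this toy network the subgraph grown from anchor A passes through p = 0.125, 0.036, 0.018 and 0.014 before jumping to p = 1, so the four-gene subgraph {A, B, C, E} is reported.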


Advantages of GiGA

• fast, unbiased and comprehensive analysis

• assignment of statistical significance values to interpretation

• detection of significant changes even if data are too noisy to reliably detect changed genes

• statistically meaningful interpretation already without replication experiments

• detection of patterns even for small absolute changes

• flexible use of annotations + intuitive visualization

Example 1

Microarrays

Gene expression changes during the yeast diauxic shift

Yeast diauxic shift study

DeRisi et al. (1997) Science 278: 680-6

Yeast diauxic shift study

Time points: 0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h

UP

6144 - purine base metabolism

6099 - tricarboxylic acid cycle


3773 - heat shock protein activity


9277 - cell wall (sensu Fungi)


5749 - respiratory chain complex II (sensu Eukarya)



297 - spermine transporter activity

6950 - response to stress

6121 - oxidative phosphorylation, succinate to ubiquinone

5977 - glycogen metabolism


15846 - polyamine transport

297 - spermine transporter activity

8177 - succinate dehydrogenase (ubiquinone) activity



4373 - glycogen (starch) synthase activity




15846 - polyamine transport


4129 - cytochrome c oxidase activity

6537 - glutamate biosynthesis

5353 - fructose transporter activity

7039 - vacuolar protein catabolism

5751 - respiratory chain complex IV (sensu Eukarya)

6097 - glyoxylate cycle

15578 - mannose transporter activity



5750 - respiratory chain complex III (sensu Eukarya)

7039 - vacuolar protein catabolism



9060 - aerobic respiration

8645 - hexose transport

5751 - respiratory chain complex IV (sensu Eukarya)



GiGA results – diauxic shift

Down-regulated genes using GeneOntology-based network

locus    gene description ("anchor gene")                         p-value   members  max. rank
YHL015W  ribosomal protein S20                                    5.87E-86  39       48
YMR217W  GMP synthase                                             3.38E-13  9        172
YDR144C  aspartyl protease|related to Yap3p                       4.06E-08  6        242
YNL065W  multidrug resistance transporter                         4.02E-05  3        141
YLR062C                                                           6.41E-05  4        367
YGL225W  May regulate Golgi function and glycosylation in Golgi   1.12E-04  4        422
YPR074C  transketolase 1                                          1.44E-04  4        449

total genes measured in network: 4087.

small ribosomal subunit

large ribosomal subunit

nucleolar rRNA processing

translational elongation

GiGA case study – diauxic shift

Up-regulated genes using metabolic network

locus    gene description                                               p-value   members  max. rank
YER065C  isocitrate lyase                                               4.96E-53  39       54
YGR088W  catalase T                                                     3.09E-10  11       106
YFR015C  glycogen synthase (UDP-glucose-starch glucosyltransferase)     2.08E-04  3        45
YJR073C  unsaturated phospholipid N-methyltransferase                   3.85E-04  5        156
YDR001C  neutral trehalase                                              5.01E-04  3        60
YCR014C  DNA polymerase IV                                              5.44E-04  17       481
YIR038C  glutathione transferase                                        8.64E-04  5        183

total genes measured in network: 744.

glyoxylate cycle

citrate (TCA) cycle

oxidative phosphorylation (complex V)

respiratory chain complex III

respiratory chain complex II

respiratory chain complex IV

Example 2

Metabolomics

Changes in metabolic profiles in drug-treated trypanosomes

GiGA applied to metabolomics data

• Challenge: No annotation available

• Solution: Build evidence network based on hypothetical reactions between observed masses (=mass differences)
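One way to picture such a network: connect two observed masses whenever their difference matches the mass change of a plausible reaction. In the sketch below the transformation table is a small hand-picked selection of standard monoisotopic mass differences, the observed masses are invented toy values, and the tolerance and function name are assumptions.

```python
# Hand-picked example transformations with monoisotopic mass differences (Da).
TRANSFORMATIONS = {
    "hydrogenation (H2)": 2.015650,
    "methylation (CH2)": 14.015650,
    "oxidation (O)": 15.994915,
    "hydration (H2O)": 18.010565,
    "phosphorylation (HPO3)": 79.966331,
}


def mass_difference_network(masses, tolerance=0.002):
    """Link every pair of masses whose difference matches a transformation.

    Returns a list of (lighter mass, heavier mass, transformation name).
    """
    ordered = sorted(masses)
    edges = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            difference = heavy - light
            for name, delta in TRANSFORMATIONS.items():
                if abs(difference - delta) <= tolerance:
                    edges.append((light, heavy, name))
    return edges


# Invented masses, chosen only so that some differences match the table.
observed = [100.0000, 102.0157, 118.0106, 197.9769]
edges = mass_difference_network(observed)
for light, heavy, name in edges:
    print(f"{light:.4f} -> {heavy:.4f}: {name}")
```

The resulting edges play the role of the "evidence" in the network, so the graph-based analysis can run without any prior annotation of the metabolites.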

Metabolite tree of mass 257.1028 (glycerylphosphorylcholine)

6 generations

Metabolite tree of mass 257.1028

4 generations


2 generations


colors indicate changes of metabolite signals compared to untreated samples after 60 min pentamidine (red = down, green = up)

GiGA metabolite trees for one experimental example

Choline tree found by GiGA (most significant subgraph, p < 10^-13)

extracted from

Summary

• post-genomic technologies produce “lists”

• neighborhood relationships yield “evidence networks” (graphs)

• lists + graphs = biological insights

• GiGA graph analysis highlights and connects relevant areas in the “evidence network”

Acknowledgements

• Pawel Herzyk – Sir Henry Wellcome Functional Genomics Facility

• Anna Amtmann & Patrick Armengaud – IBLS Molecular Plant Science group

• Mike Barrett – IBLS Parasitology Research group

• FGF academic users: Wilhelmina Behan, Simone Boldt, Anna Casburn-Jones, Gillian Douce, Paul Everest, Michael Farthing, Heather Johnston, Walter Kolch, Peter O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys Williams


Contact

Rainer Breitling

Bioinformatics Research Centre

Davidson Building A416

University of Glasgow, Scotland, UK

[email protected]

http://www.brc.dcs.gla.ac.uk/~rb106x

