Evidence networks for the analysis of biological systems
Rainer Breitling
IBLS – Molecular Plant Science group
Bioinformatics Research Centre
University of Glasgow, Scotland, UK
Background
Datasets and evidence networks in post-genomic biology
Genomics
Fully sequenced genomes (1995-2004):
18 archaea
163 bacteria
3 protozoa
24 yeast species and fungi
2 plants (Arabidopsis, rice)
2 insects (flies, honey bee)
2 worms (C.elegans, C. briggsae)
3 fish (fugu, puffer, zebrafish)
chicken, cow, dog, mouse, rat, chimp
human
lots of “lists” of genes
Transcriptomics
• microarrays measure gene expression levels (mRNA concentrations)
• relative or absolute values
• in organisms, tissues, cells
• produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)
Proteomics
• 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations
• in tissues, cells, organelles
• detect chemical modifications and processing of proteins
• produce lists of protein variants that differ among conditions
Metabolomics
• chromatography and mass spectrometry measure metabolite concentrations
• in tissues, cells, body fluids, cell culture medium
• produce lists of affected metabolites
Evidence networks
• relate items (genes, proteins, metabolites) that “have something to do with each other”
• relationship is based on objective evidence
• represented as bipartite graphs
  – two classes of nodes: items and evidence
  – automated analysis of results possible
  – intuitive visualization and links to literature
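A minimal sketch of such a bipartite graph, to make the "two classes of nodes" idea concrete. The class name EvidenceNetwork, its methods, the gene symbols, and the placeholder literature identifier are all assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict


class EvidenceNetwork:
    """Bipartite graph: item nodes on one side, evidence nodes on the other."""

    def __init__(self):
        self.items_of = defaultdict(set)     # evidence node -> items it links
        self.evidence_of = defaultdict(set)  # item node -> evidence mentioning it

    def add(self, evidence, items):
        """Connect one evidence node to every item it supports."""
        for item in items:
            self.items_of[evidence].add(item)
            self.evidence_of[item].add(evidence)

    def neighbours(self, item):
        """Items that share at least one piece of evidence with `item`."""
        linked = set()
        for ev in self.evidence_of.get(item, ()):
            linked |= self.items_of[ev]
        linked.discard(item)
        return linked

    def shared_evidence(self, a, b):
        """Evidence nodes that connect items `a` and `b` directly."""
        return self.evidence_of.get(a, set()) & self.evidence_of.get(b, set())


# Illustrative content only: gene symbols and the literature ID are assumptions.
net = EvidenceNetwork()
net.add("GO 6099 - tricarboxylic acid cycle", ["CIT1", "ACO1", "IDH1"])
net.add("GO 6097 - glyoxylate cycle", ["ICL1", "MLS1", "CIT2"])
net.add("literature co-citation (hypothetical)", ["CIT1", "ICL1"])

print(sorted(net.neighbours("CIT1")))
print(net.shared_evidence("CIT1", "ICL1"))
```

Keeping evidence as explicit nodes, rather than collapsing it into item-to-item edges, is what allows a result to be traced back to the annotation or publication that supports it.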
Types of evidence networks
• Relationship can be based on
  – physical neighborhood
  – phyletic pattern similarity
  – expressional correlation
  – biophysical similarity
  – chemical transformation
  – functional co-operation
  – literature co-citations
A O M P K Z Y Q V D R L B C E F G H S N U J X I T W
phy: a o m p k z y - - d - l - - - - - - - - - - - i t -
22 aompkzy--d-l-----------it- NtpA [C] H+-ATPase subunit A
17 aompkzy--d-l-----------it- NtpB [C] H+-ATPase subunit B
17 aompkzy--d-l-----------it- NtpD [C] H+-ATPase subunit D
18 aompkzy--d-l-----------it- NtpI [C] H+-ATPase subunit I
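As a sketch of how phyletic pattern similarity could be turned into network edges, the example below compares the presence/absence strings of the four H+-ATPase subunits listed above. The Jaccard index and the 0.9 threshold are assumptions chosen for illustration, not the measure used in the talk, and the entry "hypothetical_X" is an invented pattern added only for contrast.

```python
from itertools import combinations

# Phyletic patterns copied from the rows above ("-" = gene absent in that genome).
patterns = {
    "NtpA": "aompkzy--d-l-----------it-",
    "NtpB": "aompkzy--d-l-----------it-",
    "NtpD": "aompkzy--d-l-----------it-",
    "NtpI": "aompkzy--d-l-----------it-",
    # Invented pattern for contrast; NOT from the source.
    "hypothetical_X": "----------rlbcefghsnujx---",
}


def presence(pattern):
    """Positions (genomes) in which the gene is present."""
    return frozenset(i for i, c in enumerate(pattern) if c != "-")


def jaccard(p, q):
    """Shared genomes divided by genomes containing either gene."""
    a, b = presence(p), presence(q)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def phyletic_edges(patterns, threshold=0.9):
    """Pairs of genes whose patterns are at least `threshold` similar."""
    return [
        (g, h)
        for g, h in combinations(sorted(patterns), 2)
        if jaccard(patterns[g], patterns[h]) >= threshold
    ]


edges = phyletic_edges(patterns)
for g, h in edges:
    print(g, "--", h)
```

The four subunits share an identical pattern and therefore end up fully connected, while the invented gene stays unlinked.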
What is the big picture?
Graph-based iterative Group Analysis for the automated interpretation of biological datasets
lists + graphs = understanding
What does this list mean?
#   Fold-Change  Gene Symbol  Gene Title
1   26.45        TNFAIP6      tumor necrosis factor, alpha-induced protein 6
2   25.79        THBS1        thrombospondin 1
3   23.08        SERPINE2     serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
4   21.5         PTX3         pentaxin-related gene, rapidly induced by IL-1 beta
5   18.82        THBS1        thrombospondin 1
6   16.68        CXCL10       chemokine (C-X-C motif) ligand 10
7   18.23        CCL4         chemokine (C-C motif) ligand 4
8   14.85        SOD2         superoxide dismutase 2, mitochondrial
9   13.62        IL1B         interleukin 1, beta
10  11.53        CCL20        chemokine (C-C motif) ligand 20
11  11.82        CCL3         chemokine (C-C motif) ligand 3
12  11.27        SOD2         superoxide dismutase 2, mitochondrial
13  10.89        GCH1         GTP cyclohydrolase 1 (dopa-responsive dystonia)
14  10.73        IL8          interleukin 8
15  9.98         ICAM1        intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
16  9.97         SLC2A6       solute carrier family 2 (facilitated glucose transporter), member 6
17  8.36         BCL2A1       BCL2-related protein A1
18  7.33         TNFAIP2      tumor necrosis factor, alpha-induced protein 2
19  6.97         SERPINB2     serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2
20  6.69         MAFB         v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)
iterative Group Analysis (iGA)
iGA uses a simple hypergeometric distribution to obtain p-values
Breitling et al., BMC Bioinformatics, 2004, 5:34
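A minimal sketch of how a hypergeometric p-value can be computed iteratively for one gene group in a ranked list. It assumes the p-value is the upper tail of the hypergeometric distribution (probability of finding at least z group members among the top t ranks) and that the minimum over all members is reported; the function names and the toy gene list are hypothetical, and the cited paper should be consulted for the exact definition.

```python
from math import comb


def hypergeom_tail(n_total, n_group, n_drawn, n_hits):
    """P(X >= n_hits) when n_drawn genes are taken from n_total genes,
    of which n_group belong to the group."""
    upper = min(n_group, n_drawn)
    favourable = sum(
        comb(n_group, i) * comb(n_total - n_group, n_drawn - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


def iga(ranked_genes, group):
    """Walk down the ranked list; at every group member, ask how surprising it
    is to have seen that many members this early. Return the minimum p-value
    with the member count and rank cutoff at which it occurs."""
    n_total = len(ranked_genes)
    n_group = sum(1 for g in ranked_genes if g in group)
    best = (1.0, 0, 0)
    seen = 0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in group:
            seen += 1
            p = hypergeom_tail(n_total, n_group, rank, seen)
            if p < best[0]:
                best = (p, seen, rank)
    return best


# Toy data (assumed): eight genes in rank order, three of them in one group.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
p_value, members, cutoff = iga(ranked, {"g1", "g2", "g7"})
print(p_value, members, cutoff)
```

In this toy list the minimum falls at the second group member: two of three members within the top two ranks is less likely by chance than all three within the top seven, so the late third member does not dilute the result.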
Graph-based iGA
Breitling et al., BMC Bioinformatics, 2004, 5:100
Graph-based iGA, step 1: build the network
Graph-based iGA, step 2: assign ranks to genes
Graph-based iGA, step 3: find local minima
p = 1/8 = 0.125
p = 2/8 = 0.25
p = 6/8 = 0.75
Graph-based iGA, step 4: extend subgraph from minima
p=1
p=0.014 p=0.018
p=0.125
Graph-based iGA, step 5: select p-value minimum
p=1
p=0.018
p=0.125
p=0.014
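The five steps can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the eight-gene network topology is invented, and the subgraph p-value is assumed to be C(r, m) / C(N, m) for a connected subgraph of m genes with maximum rank r among N genes. That assumption reproduces the values shown on the slides (1/8 = 0.125, 2/8 = 0.25 and 6/8 = 0.75 for single-gene minima; 0.018 and 0.014 for extended subgraphs).

```python
from math import comb


def subgraph_p(m, max_rank, n_total):
    """Chance that m genes picked at random all have rank <= max_rank."""
    return comb(max_rank, m) / comb(n_total, m)


def local_minima(graph, rank):
    """Step 3: genes ranked better (lower) than all of their neighbours."""
    return [v for v in graph if all(rank[v] < rank[u] for u in graph[v])]


def extend(graph, rank, seed):
    """Steps 4 and 5: grow a subgraph from `seed`, keep the p-value minimum."""
    n_total = len(rank)
    members = {seed}
    best_p, best_members = subgraph_p(1, rank[seed], n_total), {seed}
    while True:
        frontier = {u for v in members for u in graph[v]} - members
        if not frontier:
            break
        members.add(min(frontier, key=rank.get))  # best-ranked neighbour next
        max_rank = max(rank[v] for v in members)
        grew = True
        while grew:  # pull in connected genes that do not raise the max rank
            extra = {
                u for v in members for u in graph[v] if rank[u] <= max_rank
            } - members
            members |= extra
            grew = bool(extra)
        p = subgraph_p(len(members), max_rank, n_total)
        if p < best_p:
            best_p, best_members = p, set(members)
    return best_p, best_members


def giga(graph, rank):
    """Run the extension from every local minimum; merge identical subgraphs."""
    found = {}
    for seed in local_minima(graph, rank):
        p, members = extend(graph, rank, seed)
        found[frozenset(members)] = p
    return sorted(found.items(), key=lambda kv: kv[1])


# Steps 1 and 2 on invented data: gene "g<i>" has rank i, edges are hypothetical.
edges = [(1, 3), (2, 3), (2, 4), (4, 5), (4, 7), (6, 7), (6, 8)]
rank = {f"g{i}": i for i in range(1, 9)}
graph = {g: set() for g in rank}
for a, b in edges:
    graph[f"g{a}"].add(f"g{b}")
    graph[f"g{b}"].add(f"g{a}")

results = giga(graph, rank)
for members, p in results:
    print(sorted(members), round(p, 3))
```

On this toy network the local minima are g1, g2 and g6, and the best subgraph is {g1, g2, g3, g4} with p = 1/70, about 0.014.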
Advantages of GiGA
• fast, unbiased and comprehensive analysis
• assignment of statistical significance values to interpretation
• detection of significant changes even if data are too noisy to reliably detect changed genes
• statistically meaningful interpretation already without replication experiments
• detection of patterns even for small absolute changes
• flexible use of annotations + intuitive visualization
Example 1
Microarrays
Gene expression changes during the yeast diauxic shift
Yeast diauxic shift study
DeRisi et al. (1997) Science 278: 680-6
Yeast diauxic shift study
Time points: 0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h
UP (GO classes listed across the time course; several recur at more than one time point):
6144 - purine base metabolism
6099 - tricarboxylic acid cycle
3773 - heat shock protein activity
9277 - cell wall (sensu Fungi)
5749 - respiratory chain complex II (sensu Eukarya)
297 - spermine transporter activity
6950 - response to stress
6121 - oxidative phosphorylation, succinate to ubiquinone
5977 - glycogen metabolism
15846 - polyamine transport
8177 - succinate dehydrogenase (ubiquinone) activity
4373 - glycogen (starch) synthase activity
4129 - cytochrome c oxidase activity
6537 - glutamate biosynthesis
5353 - fructose transporter activity
7039 - vacuolar protein catabolism
5751 - respiratory chain complex IV (sensu Eukarya)
6097 - glyoxylate cycle
15578 - mannose transporter activity
5750 - respiratory chain complex III (sensu Eukarya)
9060 - aerobic respiration
8645 - hexose transport
GiGA results – diauxic shift
Down-regulated genes using Gene Ontology-based network
locus    gene description ("anchor gene")                        p-value   members  max. rank
YHL015W  ribosomal protein S20                                   5.87E-86  39       48
YMR217W  GMP synthase                                            3.38E-13  9        172
YDR144C  aspartyl protease|related to Yap3p                      4.06E-08  6        242
YNL065W  multidrug resistance transporter                        4.02E-05  3        141
YLR062C  -                                                       6.41E-05  4        367
YGL225W  May regulate Golgi function and glycosylation in Golgi  1.12E-04  4        422
YPR074C  transketolase 1                                         1.44E-04  4        449
total genes measured in network: 4087.
[Figure labels: small ribosomal subunit; large ribosomal subunit; nucleolar rRNA processing; translational elongation]
GiGA case study – diauxic shift
Up-regulated genes using metabolic network
locus    gene description                                            p-value   members  max. rank
YER065C  isocitrate lyase                                            4.96E-53  39       54
YGR088W  catalase T                                                  3.09E-10  11       106
YFR015C  glycogen synthase (UDP-glucose-starch glucosyltransferase)  2.08E-04  3        45
YJR073C  unsaturated phospholipid N-methyltransferase                3.85E-04  5        156
YDR001C  neutral trehalase                                           5.01E-04  3        60
YCR014C  DNA polymerase IV                                           5.44E-04  17       481
YIR038C  glutathione transferase                                     8.64E-04  5        183
total genes measured in network: 744.
[Figure labels: glyoxylate cycle; citrate (TCA) cycle; oxidative phosphorylation (complex V); respiratory chain complex III; respiratory chain complex II; respiratory chain complex IV]
Example 2
Metabolomics
Changes in metabolic profiles in drug-treated trypanosomes
GiGA applied to metabolomics data
• Challenge: No annotation available
• Solution: Build evidence network based on hypothetical reactions between observed masses (=mass differences)
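A minimal sketch of how such a network could be assembled from a list of observed masses. The reaction table is an illustrative subset of common monoisotopic mass shifts, and the tolerance, the function name, and all masses other than 257.1028 are assumptions made for the example; the transformation list actually used in the study is not given on the slides.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) of a few common transformations.
# Illustrative subset only; not the reaction list used in the talk.
REACTIONS = {
    "(de)hydrogenation, H2": 2.01565,
    "(de)methylation, CH2": 14.01565,
    "hydroxylation, O": 15.99491,
    "(de)phosphorylation, HPO3": 79.96633,
}


def mass_difference_edges(masses, reactions=REACTIONS, tol=0.002):
    """Link two observed masses when their difference matches a known
    transformation within `tol` Da (tolerance value is an assumption)."""
    edges = []
    for low, high in combinations(sorted(masses), 2):
        diff = high - low
        for name, shift in reactions.items():
            if abs(diff - shift) <= tol:
                edges.append((low, high, name))
    return edges


# 257.1028 is the mass named on the slides; the other values are invented.
observed = [257.1028, 271.1185, 177.1365, 300.0]
edges = mass_difference_edges(observed)
for low, high, name in edges:
    print(f"{low:.4f} -> {high:.4f}: {name}")
```

Each matching pair becomes an edge whose evidence node is the hypothetical reaction, so the result can be analysed with the same graph methods as an annotation-based network.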
Metabolite tree of mass 257.1028 (glycerylphosphorylcholine)
6 generations
Metabolite tree of mass 257.1028
4 generations
Metabolite tree of mass 257.1028
2 generations
Metabolite tree of mass 257.1028
colors indicate changes of metabolite signals compared to untreated samples after 60 min pentamidine (red = down, green = up)
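The trees above differ only in how many generations are drawn. A minimal sketch of that cut-off follows, assuming a "generation" means the number of reaction steps from the root mass in a breadth-first walk; this reading, the function name, and the toy adjacency list are assumptions and are not taken from the slides.

```python
from collections import deque


def metabolite_tree(adjacency, root, generations):
    """Breadth-first walk from `root`; return each reached node with its
    generation (number of steps from the root), up to `generations`."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == generations:
            continue  # do not expand beyond the requested generation
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


# Invented toy network; only the root label refers to a mass from the slides.
adjacency = {
    "257.1028": ["A", "B"],
    "A": ["257.1028", "C"],
    "B": ["257.1028"],
    "C": ["A", "D"],
    "D": ["C"],
}

for n in (1, 2, 3):
    print(n, sorted(metabolite_tree(adjacency, "257.1028", n).items()))
```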
GiGA metabolite trees for one experimental example
Choline tree found by GiGA (most significant subgraph, p < 10^-13)
extracted from
Summary
• post-genomic technologies produce “lists”
• neighborhood relationships yield “evidence networks” (graphs)
• lists + graphs = biological insights
• GiGA graph analysis highlights and connects relevant areas in the “evidence network”
Acknowledgements
• Pawel Herzyk – Sir Henry Wellcome Functional Genomics Facility
• Anna Amtmann & Patrick Armengaud – IBLS Molecular Plant Science group
• Mike Barrett – IBLS Parasitology Research group
• FGF academic users: Wilhelmina Behan, Simone Boldt, Anna Casburn-Jones, Gillian Douce, Paul Everest, Michael Farthing, Heather Johnston, Walter Kolch, Peter O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys Williams
Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
University of Glasgow, Scotland, UK
http://www.brc.dcs.gla.ac.uk/~rb106x