BITS - Genevestigator to easily access transcriptomics data
DESCRIPTION
These are the presentation slides of the BITS training session about 'Genevestigator'. Many thanks to Nebion for contributing these slides.
TRANSCRIPT
GENEVESTIGATOR TUTORIAL
VIB - Gent
12.04.2011
Goals
Understand what Genevestigator is and why it has been developed
Understand the function of the tools provided by the software
Learn how to use Genevestigator to find genes of interest
Content
Microarray technology
Concept of Genevestigator
Data curation
Tools:
– Meta-profile analysis
– Biomarker search
– RefGenes
– Clustering analysis
Microarray technology
Advantages:
– Genome wide
– Relatively cheap
– Standardized streamlined handling
– Use of an optimized system based on oligonucleotide sequences
– Possibility to store data in publicly available repositories
Disadvantages:
– Sequence must be known in advance
– Hybridization reaction
Workflow of a microarray experiment
Each pixel intensity is determined by the expression level of a gene in the specific sample hybridized on the array
Steps:
– Conditions selection and experiments
– RNA extraction, amplification and labelling
– Hybridization on chips
– Scanned raw image (DAT file)
– Raw data, probe level (CEL file)
– Quality control
– Normalization
– Normalized data (TXT file)
– Analysis
– Validation (Q-PCR)
– Submission to repository
Concept of Genevestigator
Thousands of microarray experiments exist world-wide
Tissue type 1, Tissue type 2, Tissue type 3, Tissue type 4, …, Tissue type 200
=> Summarize information from thousands of public experiments into easily interpretable results
Model of a summarized output
Concept of Genevestigator
Data repositories: meta-analysis?
Curation:
– Data quality control
– Expert annotation with systematic ontologies (anatomy, development, condition, genotype)
Genevestigator: meta-analysis!
Build a systematic database of gene expression information
1. Data Curation - Overview
Quality control all sample data
Collect raw data files and normalize data
Read and understand the experiment
Manually annotate experiments using structured vocabularies (ontologies)
Final goal of curation: translate experimental information into a computer-readable and "statistically usable" form
Quality control + normalization
Expert annotation with systematic ontologies: anatomy, development, condition, genotype
Curation: Quality control
Unprocessed probe intensity
RNA degradation plots
Probe-level analysis (RLE, NUSE)
Border element analysis
Array-array correlation plots
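One of the probe-level checks listed above, RLE (relative log expression), can be illustrated with a minimal sketch. It assumes log2 expression values per probeset are already available for every array; the function names, the toy values, and the 0.5 cut-off are hypothetical and are not taken from the Genevestigator curation pipeline.

```python
from statistics import median

def rle_per_array(log_expr):
    """Relative log expression: each value minus the probeset median across arrays.

    log_expr maps an array name to a list of log2 values (same probeset order).
    """
    arrays = list(log_expr)
    n_probesets = len(log_expr[arrays[0]])
    # Reference profile: median of each probeset over all arrays
    reference = [median(log_expr[a][i] for a in arrays) for i in range(n_probesets)]
    return {a: [log_expr[a][i] - reference[i] for i in range(n_probesets)]
            for a in arrays}

def flag_outlier_arrays(log_expr, max_abs_median=0.5):
    """Return arrays whose median RLE is far from zero (cut-off is an assumption)."""
    rle = rle_per_array(log_expr)
    return sorted(a for a, values in rle.items()
                  if abs(median(values)) > max_abs_median)

# Toy data: array_D is shifted upwards for every probeset
toy = {
    "array_A": [7.0, 9.1, 5.2, 11.0],
    "array_B": [7.2, 9.0, 5.0, 11.1],
    "array_C": [7.1, 8.9, 5.1, 10.9],
    "array_D": [8.6, 10.4, 6.5, 12.5],
}
flagged = flag_outlier_arrays(toy)
print(flagged)
```

In a good-quality array most probesets sit close to the reference, so the RLE values centre on zero; a globally shifted array such as array_D stands out.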
Curation: normalization models
Multi-array models
– e.g. dChip, RMA, gcRMA
– all arrays from an experiment are normalized simultaneously
– cannot easily be used to create large databases
– RMA and gcRMA use perfect-match information only (background estimation by statistical approaches)
Single array models
– e.g. MAS5
– normalize each array independently
– does not correct for biases between experiments
– MAS5 uses both perfect-match and mismatch probe information (mismatch is used to model background (biochemical approach))
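The practical difference between the two model families can be sketched as follows. This is a simplified illustration, not an implementation of MAS5 or RMA: the single-array function only scales an array to a target mean (the target of 1000 is an assumption), and the multi-array function is plain quantile normalization, which is one step of RMA. All names and values are hypothetical.

```python
from statistics import mean

def scale_single_array(values, target=1000.0):
    """Single-array style: scale one array to a target mean, using only that array."""
    factor = target / mean(values)
    return [v * factor for v in values]

def quantile_normalize(arrays):
    """Multi-array style: give every array the same distribution.

    The shared distribution is the mean of the sorted values across arrays.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result

a1 = [100.0, 400.0, 250.0]
a2 = [200.0, 900.0, 400.0]
a3 = [50.0, 60.0, 70.0]

scaled_a1 = scale_single_array(a1)           # depends on a1 only
qn_two = quantile_normalize([a1, a2])        # a1 normalized together with a2
qn_three = quantile_normalize([a1, a2, a3])  # adding a3 changes the result for a1

print(scaled_a1)
print(qn_two[0])
print(qn_three[0])
```

Adding a3 changes the normalized values of a1 in the multi-array case, which is why such models are hard to use for a database that keeps growing, while the single-array result for a1 never changes.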
Curation: Ontologies
Ontologies built for
– Anatomical parts
– Stages of development
– Perturbations (diseases, chemicals, etc.)
Ontologies
– Were compiled from various public ontology sources and own developments
– Are built using tree structures
Anatomy Ontology:
- Arabidopsis
- Rice
- Barley
(version 2008)
Development Ontology:
- Mouse
Curation: Meta-profiles
sample meta-data
expression data
[space] [time] [response] [response]
summarized results
Curation: Data content
As of December 2010: > 54’000 Affymetrix arrays
Total 1’742 54’786
World’s largest standardized, quality controlled, and manually annotated gene expression compendium for plants, animals, and microorganisms!
Genevestigator application
Database and analysis engine
Website with user support
Analysis tool for the user
Requirements
Browser
– Genevestigator works in Internet Explorer, Firefox, Safari, Opera, and Chrome
Java
– Sun Microsystems; minimum: Java 1.4.2 or higher
Computer
– 500 MB RAM or more
Toolsets
Analytical approach 1
genes -> which conditions?
– Anatomy [space]
– Development [time]
– Condition / Genotype [response]
Meta-Profile Analysis
1. Choose an organism
2. Enter the genes you wish to work with
Meta-Profile Analysis tools
View and interpret the results across:
– Anatomical categories (Anatomy tab)
– Developmental stages (Development tab)
– Chemicals, diseases, tumors, etc. (Conditions tab)
– Genetic modifications (Genotype tab)
– Tumors (Neoplasm tab, only for Human)
Note: Select by experiment or annotation
Meta-Profile Analysis: Anatomy tool
Looks at how genes are expressed in different tissues
Mean and standard deviation
Anatomy categories as a tree (ontology); expand / collapse
Number of arrays per category is indicated
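The summary shown by the Anatomy tool (mean, standard deviation, and number of arrays per category) can be sketched for a single gene as below. The sample annotations and values are invented for illustration, and the categories are kept flat; rolling results up the ontology tree is not modelled here.

```python
from collections import defaultdict
from statistics import mean, stdev

def meta_profile(samples):
    """Summarize one gene's expression per annotation category.

    samples is an iterable of (category, expression_value) pairs, one per array.
    """
    by_category = defaultdict(list)
    for category, value in samples:
        by_category[category].append(value)
    profile = {}
    for category, values in by_category.items():
        # A standard deviation needs at least two arrays; report 0.0 otherwise
        sd = stdev(values) if len(values) > 1 else 0.0
        profile[category] = {"n_arrays": len(values), "mean": mean(values), "sd": sd}
    return profile

# Hypothetical arrays annotated with an anatomy category
samples = [
    ("root", 8.0), ("root", 10.0),
    ("leaf", 3.0), ("leaf", 5.0), ("leaf", 4.0),
    ("flower", 6.5),
]
profile = meta_profile(samples)
for category in sorted(profile):
    print(category, profile[category])
```

Reporting the number of arrays next to each mean matters because a category backed by a single array, like "flower" here, carries much less weight than one backed by many.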
Meta-Profile Analysis: Neoplasm tool
Looks at how genes are expressed in different tumors
Clinical parameters of the tumors are available
Mean and standard deviation
Anatomy categories as a tree (ontology); expand / collapse
Number of arrays per category is indicated
Expression profile of NPY across different tumor types
Meta-Profile Analysis: Development tool
Looks at how genes are expressed during the life cycle of an organism
Example for barley
Example for mouse / rat
Meta-Profile Analysis: Conditions and Genotype tools
List (or tree) of various conditions
Spots indicate the responses of selected gene(s) to the list of conditions
Most upregulating conditions
Most downregulating conditions
Meta-Profile Analysis: Scanner tool
All arrays are represented on a single screen
Easily find and select experiments in which expression is particularly high (screen for peaks)
Magnifying glass and tooltip show the details of signals, arrays, and experiments.
Meta-Profile Analysis: Samples tool
All arrays are represented in a single plot, scroll down
Look at expression level and “absent / present” calls
Tooltips show the details of arrays and experiments.
Analytical approach 2
conditions -> which genes?
– Anatomy [space]
– Development [time]
– Conditions / Genotypes [response]
Biomarker search
1. Choose an organism
2. Choose conditions and run analysis
3. Save target genes for further analysis
Biomarker Search
Identify genes that exhibit specific expression characteristics
Anatomy
Development
Conditions / Genotype
Classical biomarker search
Most biomarker search approaches look for the genes that respond most strongly to a given condition
This condition may include multiple similar studies
How these genes respond to other conditions is unknown, because those conditions were not included in the analysis
Figure: genes 1 to 17 against conditions 1 to 17; responses outside the analyzed condition are unknown (??)
Biomarker validation in Genevestigator
Figure: genes 1 to 17 against conditions 1 to 17
Genevestigator lets you find out how specific these genes are (Meta-Profile Analysis -> Stimulus/Mutation tools)
Only a few genes respond exclusively to condition 9 (black arrows). All others are also sensitive to one (grey arrows) or more other conditions.
Biomarker Search in Genevestigator
Figure: genes 1 to 17 against conditions 1 to 17
The Genevestigator Biomarker Search tools identify genes that are specifically responsive to the chosen condition (they respond minimally to other conditions).
These genes are not necessarily the ones with the strongest response to the chosen condition.
The Genevestigator Biomarker Search tools usually find different target candidates than classical tools, which analyze only a subset of experiments.
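The difference between "strongest response" and "specific response" can be made concrete with a small sketch. The scoring rule used here (response to the target condition minus the largest absolute response to any other condition) is a hypothetical stand-in chosen for illustration; the slides do not describe the scoring that Genevestigator actually uses, and the gene names and log-ratios are invented.

```python
def strongest_responders(responses, target):
    """Classical ranking: order genes by their response to the target condition only."""
    return sorted(responses, key=lambda g: responses[g][target], reverse=True)

def specificity_score(profile, target):
    """Response to the target minus the largest absolute response elsewhere."""
    others = [abs(v) for c, v in profile.items() if c != target]
    return profile[target] - max(others, default=0.0)

def most_specific(responses, target):
    """Ranking that penalizes genes which also respond to other conditions."""
    return sorted(responses,
                  key=lambda g: specificity_score(responses[g], target),
                  reverse=True)

# Hypothetical log-ratios of three genes under three conditions
responses = {
    "gene 1": {"condition 8": 2.5, "condition 9": 4.0, "condition 10": 3.0},
    "gene 2": {"condition 8": 0.1, "condition 9": 2.5, "condition 10": -0.2},
    "gene 3": {"condition 8": 0.0, "condition 9": 0.3, "condition 10": 0.1},
}

by_strength = strongest_responders(responses, "condition 9")
by_specificity = most_specific(responses, "condition 9")
print(by_strength)
print(by_specificity)
```

In this toy data gene 1 has the strongest response to condition 9 but also reacts to conditions 8 and 10, so gene 2 ranks first once responses to other conditions are taken into account.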
Biomarker Search in Genevestigator
Imagine extending this to a much wider set of conditions…
– you may find other conditions to which the set of genes responds
Figure: genes 1 to 17 against conditions 1 to 75, marking the target condition and the other conditions to which the genes are responding
Biomarker Search: example
Search for genes that are associated with a set of conditions, e.g. how do abiotic stresses relate to hormonal responses?
hormonal responses | abiotic stresses
ABA (+) | salt (+), osmotic (+)
--- | salt (-), osmotic (-)
ABA (+) | salt (+), osmotic (+), cold (+)
MeJA (+) | salt (+), drought (+)
BL / H3BO3 (+) | anoxia (-), hypoxia (-)
ethylene (+) | hypoxia (-)
Biomarker Search in Genevestigator
Example: human genes responsive to Actinomycin-D
Figure labels: Actinomycin-D; Cell cycle inhibition; Echinomycin; Chemical: ARC; Sapphyrin; Propiconazole; Oncolytic herpes simplex virus; vMyb
Legend: target condition(s), co-inducing conditions
RefGenes
Goal: identify reference genes for use in qPCR.
Solution: search the Genevestigator database for genes that show constant expression in a certain category of arrays.
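The idea of searching for genes with constant expression in a chosen category of arrays can be sketched as a ranking by standard deviation. This is only an illustration under simple assumptions: the slides do not state which stability measure RefGenes uses, and the gene names and log2 values below are invented.

```python
from statistics import mean, stdev

def rank_reference_candidates(expression, top_n=2):
    """Rank genes by how constant their expression is across the selected arrays.

    expression maps a gene name to its log2 signal values in those arrays.
    Returns (gene, mean, standard deviation) for the top_n most stable genes.
    """
    stats = [(stdev(values), gene, mean(values))
             for gene, values in expression.items()]
    stats.sort()  # smallest standard deviation first
    return [(gene, round(avg, 2), round(sd, 3)) for sd, gene, avg in stats[:top_n]]

# Hypothetical log2 signals in four arrays of one tissue
expression = {
    "gene A": [10.0, 12.5, 9.0, 11.5],  # highly expressed but variable
    "gene B": [8.0, 8.1, 7.9, 8.0],     # stable
    "gene C": [6.0, 6.4, 5.6, 6.0],     # moderately stable
}
candidates = rank_reference_candidates(expression)
for gene, avg, sd in candidates:
    print(gene, avg, sd)
```

Restricting the input to arrays from the biological context of interest is the key step; a gene that is stable in one tissue may vary in another.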
RefGenes: validation experiment with mouse liver
Validation experiment on mouse liver
geNorm selection of the most stable reference genes within this experiment
Dataset: 197 arrays from mouse liver
Clustering Analysis
Goal: to identify groups of genes that have similar expression characteristics
Tools:
– Hierarchical clustering (with leaf ordering)
– Biclustering (BiMax algorithm)
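What a bicluster is can be shown with a brute-force sketch on a tiny matrix. The goal is the same as in BiMax (inclusion-maximal groups of genes that are all "on" in the same group of conditions after binarization), but the search below simply enumerates gene subsets and is not the divide-and-conquer BiMax algorithm. The threshold, names, and values are hypothetical.

```python
from itertools import combinations

def binarize(matrix, threshold):
    """Map each gene to the set of conditions where its value reaches the threshold."""
    return {g: {c for c, v in row.items() if v >= threshold}
            for g, row in matrix.items()}

def maximal_biclusters(binary, min_genes=2, min_conditions=2):
    """Enumerate inclusion-maximal all-ones biclusters by brute force.

    Only suitable for very small gene lists: the number of subsets grows
    exponentially with the number of genes.
    """
    genes = sorted(binary)
    found = set()
    for size in range(min_genes, len(genes) + 1):
        for subset in combinations(genes, size):
            shared = set.intersection(*(binary[g] for g in subset))
            if len(shared) < min_conditions:
                continue
            # Closure: add every gene that is "on" in all shared conditions
            members = frozenset(g for g in genes if shared <= binary[g])
            found.add((members, frozenset(shared)))
    return found

# Hypothetical response values of four genes under four conditions
matrix = {
    "gene 1": {"c1": 2.1, "c2": 1.8, "c3": 0.1, "c4": 0.0},
    "gene 2": {"c1": 1.9, "c2": 2.2, "c3": 0.2, "c4": 1.7},
    "gene 3": {"c1": 0.0, "c2": 0.1, "c3": 2.0, "c4": 1.9},
    "gene 4": {"c1": 0.1, "c2": 0.3, "c3": 1.6, "c4": 2.4},
}
biclusters = maximal_biclusters(binarize(matrix, threshold=1.5))
for members, conditions in sorted(biclusters, key=lambda b: sorted(b[0])):
    print(sorted(members), sorted(conditions))
```

Unlike hierarchical clustering, which groups genes by their profile over all conditions, a bicluster groups genes only over the subset of conditions where they behave alike.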
Biclustering
Search for biclusters in a list of 64 genes responsive to myocardial infarction
One of many possible biclusters
Development profile of these 7 genes
Advantages of using Genevestigator
Benefit from the normalized data from 54’000 arrays on 12 organisms
Extended and precise gene search according to:
- Anatomy
- Development
- Stimulus / Mutation
Find genes that might be interesting for further study
Gain further information about specific gene sets
Find appropriate reference genes for the conditions you study
Rapidly compare, validate and extend data
QUESTIONS?
Supplementary Slides
Select Genes
Problems with classical reference genes
Most groups use common housekeeping genes such as β-Actin or GAPDH to normalize qPCR data
Depending on the condition studied, these genes can themselves be regulated and are then unsuitable
Hypothesis: for each biological context, there is a subset of genes that are most suitable to normalize expression data from this context.
Summary
– Affymetrix GeneChip®
– Scan: Affymetrix GeneChip® scanned image (DAT file, scanned raw image)
– Raw data, probe level (CEL file)
– Quality control
– Normalization
– Normalized data (TXT file)
– Into repository
Each pixel intensity is determined by the expression level of a gene in the specific sample hybridized on the array