Gene Expression Analysis Using Microarrays
Anne R. Haake, Ph.D.
How do we relate gene identity to cell physiology, disease & drug discovery?
Functional Genomics
=“development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics”
Gene Expression Analysis
• What is gene expression?
• What can we learn from expression analysis?
• How is the analysis accomplished?
• What are the challenges for bioinformatics?
• Individual cells in an organism have the same genes (DNA) but….
• It is the expression of thousands of genes and their products (RNA, proteins), functioning in a complicated and orchestrated way, that makes that organism what it is.
Differential Gene Expression
A Few Examples:
• Cell type specific -e.g. skin cell vs. brain cell
• Developmental stage -e.g. embryonic skin cell vs. adult skin cell
• Disease state -e.g. normal skin cell vs. skin tumor cell
• Environment-specific -e.g. skin cell untreated vs. treated (drugs, toxins)
What can we learn by analyzing complex patterns of gene expression?
• Classifications: for diagnosis, prediction…
Cell-type, stage-specific, disease-related, treatment-related patterns of gene expression?
• Gene Networks/Pathways:
Functional roles of genes in cellular processes?
Gene regulation and gene interactions
Gene Expression Analysis
Need efficient ways to study these complex patterns.
1) Techniques of Biochemistry/Molecular Biology: resolution of the patterns = expression data (RNA or protein)
2) Management of complex data sets
3) Mining of the data to gain useful information
Gene Expression Analysis
High-Throughput Techniques
• Microarray or Gene Chip
= cDNA arrays or oligo arrays (Affymetrix)
• Filter Arrays
• Differential Display
• SAGE
Gene Chip technology
• DNA microarrays = hundreds to thousands of different DNA sequences spotted onto glass microscope slide
• Compare binding (base-pairing) of two different sets of expressed gene sequences to the template DNA microarray
• Allows simultaneous analysis of thousands of genes: Is the gene expressed? At what level?
*expression levels are relative
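Because expression levels are relative, each spot is commonly summarized as a ratio of the sample signal to the reference signal. The sketch below assumes a log2 ratio and uses made-up intensity values; the slides do not specify a formula, so the function name and data are illustrative only.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """Relative expression of one spot: log2(sample / reference)."""
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)

# Hypothetical spot intensities (sample, reference) for three genes.
spots = {
    "geneA": (800.0, 200.0),  # higher in sample
    "geneB": (150.0, 600.0),  # lower in sample
    "geneC": (500.0, 500.0),  # unchanged
}
ratios = {gene: log2_ratio(s, r) for gene, (s, r) in spots.items()}
```

With a log scale, a 4-fold increase and a 4-fold decrease have the same magnitude and opposite sign, which makes up- and down-regulation easier to compare.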
The Full Yeast Genome on a Chip
• 6116 Yeast Genes
• 96 Intergenic regions
• + lots of control samples
– Primers purchased from Research Genetics
• Total spots printed: 707,520
• Total Arrays: 110
• Actual Time to print: 52 hours
• Credits: Dr. Patrick O. Brown laboratory
Outcomes of Microarray Analysis
• Size and complexity of the problem
– Example: 20,000 genes from 10 samples under 20 different conditions = 4,000,000 pieces of data
• Challenges for Bioinformatics
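The arithmetic behind the example can be checked directly. The storage estimate is an added assumption (one 8-byte float per measurement), not a figure from the slides.

```python
genes, samples, conditions = 20_000, 10, 20

# One measurement per gene, per sample, per condition.
measurements = genes * samples * conditions

# Assumption: each measurement is stored as one 8-byte float.
bytes_as_float64 = measurements * 8
```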
Outcomes of Microarray Analysis
• Large, complex data sets
• Wide availability of technology leads to a large number of distributed databases
Current state: data scattered among many independent sites (accessible via Internet) or not publicly available at all.
Current Problems Facing Bioinformatics
• Standardization & Quality Control In the Experiments (data quality at several levels)
• Management of the Data
-Standardization of the databases
-Public access to the databases
• Information from the Data
-Need for data mining algorithms customized for gene expression analysis
Microarray Databases
• Need public repository with standardized annotation
Issues:
– difficulty in describing expression experiments; remember that measurements are relative (complicates comparisons)
– Structure of the database itself
– Internet-based tools for searching and using semantic context to allow comparisons
Public Microarray Repositories
4 Major Efforts:
GeneX at US National Center for Genome Resources http://www.ncgr.org/research/genex/
ArrayExpress at European Bioinformatics Institute
http://www.ebi.ac.uk/arrayexpress/
Public Repositories
• Stanford University Database
http://genome-www4.stanford.edu/MicroArray/SMD/index.html
Public Repositories
Gene Expression Omnibus at US National Center for Biotechnology Information
Example at: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM39
Mining of the Expression Databases
• A gene expression pattern derived from a single microarray experiment is simply a snapshot (one experimental sample vs reference)
• Usually want to understand a process or changes in expression over a collection of samples = gene expression profile
Example?
Mining of the Expression Databases
General Approaches
• Raw data from multiple experiments converted to a gene expression matrix
- Rows: Different genes
- Columns: Different samples
- Numerical values encoded by color
(red=positive green=negative blue=n.a.)
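The conversion described above can be sketched as follows. The experiment data, the function names, and the choice of "black" for a value of exactly zero are assumptions for illustration; the slide only defines red, green, and blue.

```python
# Hypothetical relative expression values: one dict per experiment (sample).
experiments = {
    "sample1": {"geneA": 1.2, "geneB": -0.8, "geneC": 0.1},
    "sample2": {"geneA": 0.9, "geneB": -1.1},  # geneC not measured
    "sample3": {"geneA": -0.3, "geneB": 0.4, "geneC": 0.0},
}

def build_matrix(experiments):
    """Return (genes, samples, matrix) with rows = genes, columns = samples."""
    samples = sorted(experiments)
    genes = sorted({g for values in experiments.values() for g in values})
    # A missing measurement is stored as None (n.a.).
    matrix = [[experiments[s].get(g) for s in samples] for g in genes]
    return genes, samples, matrix

def color(value):
    """Map a value to the slide's color code (zero -> black is an assumption)."""
    if value is None:
        return "blue"
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "black"

genes, samples, matrix = build_matrix(experiments)
colors = [[color(v) for v in row] for row in matrix]
```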
Typical approach: look for similarities (or differences) in patterns
e.g. Compare rows to find evidence for co-regulation of genes
1) Need ways to measure similarity (distance) among the objects being compared
2) Then, group together objects (genes or samples) with similar properties.
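Two common choices for step 1 are Euclidean distance and Pearson correlation between rows of the matrix. The slides do not name a specific measure, so both functions below are illustrative, and the gene profiles are made up.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Similarity of profile shape, from -1 (opposite) to +1 (same trend)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical profiles across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, larger amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend

corr_ab = pearson_correlation(gene_a, gene_b)
corr_ac = pearson_correlation(gene_a, gene_c)
dist_ab = euclidean_distance(gene_a, gene_b)
```

Correlation treats gene_a and gene_b as identical in shape even though their magnitudes differ, while Euclidean distance keeps them apart; which behavior is wanted depends on the biological question.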
Cluster Analysis
• Partitions biological samples into groups based on their statistical behavior.
- Unsupervised Analysis
- Supervised Analysis: classification rules
Eisen et al.
http://www.pnas.org/cgi/content/full/95/25/14863
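As one illustration of unsupervised analysis, the sketch below merges genes bottom-up using average linkage with a distance of 1 minus the Pearson correlation. It is a toy version written for this example, not the software from the cited paper, and the profiles are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict of gene name -> list of expression values.
    Cluster distance is the mean of (1 - correlation) over all
    cross-cluster gene pairs.
    """
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        candidates = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            candidates,
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical profiles: g1/g2 rise over time, g3/g4 fall.
profiles = {
    "g1": [0.1, 0.5, 1.0, 2.0],
    "g2": [0.2, 0.6, 1.1, 1.9],
    "g3": [2.0, 1.2, 0.4, 0.1],
    "g4": [1.8, 1.0, 0.5, 0.2],
}
clusters = average_linkage(profiles, n_clusters=2)
```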
Success Story: Gene Clustering Approach
• Yeast genome
– Complete set of genes used to study diauxic shift time course
– Cluster analysis of data identified group of genes with similar expression profiles
– Upstream regulatory sites of these genes compared to identify transcription factor binding sites
(see Brazma & Vilo reference)
Example: Sample Clustering
Classification of cancers
– Comparing 2 acute leukemias (AML and ALL)
Biological/Clinical Problems:
• Previously, no single reliable test to distinguish
• Differ greatly in clinical course & response to treatments.
http://waldo.wi.mit.edu/MPR/figures_ALL_AML.html
Analytic Approach:
1) Class discovery = classification by clustering of microarray data using tumors of known type
Found 1100 of 6817 genes correlated with class distinction
2) Formation of a class predictor = 50 most informative genes
Class prediction for unknown tumors
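One way such a predictor can be built is to rank genes by a signal-to-noise score and let the top genes cast weighted votes. The slides do not give the scoring or voting rule, so the formulas, function names, and data below are assumptions for illustration only.

```python
import statistics

def signal_to_noise(values_1, values_2):
    """(mean1 - mean2) / (sd1 + sd2); assumes the sds are not both zero."""
    m1, m2 = statistics.mean(values_1), statistics.mean(values_2)
    s1, s2 = statistics.stdev(values_1), statistics.stdev(values_2)
    return (m1 - m2) / (s1 + s2)

def train_predictor(expression, labels, class_1, class_2, n_genes):
    """Keep the n_genes with the largest absolute signal-to-noise score.

    expression: gene -> list of values, one per training sample.
    Returns a list of (gene, weight, midpoint) tuples.
    """
    model = []
    for gene, values in expression.items():
        v1 = [v for v, lab in zip(values, labels) if lab == class_1]
        v2 = [v for v, lab in zip(values, labels) if lab == class_2]
        weight = signal_to_noise(v1, v2)
        midpoint = (statistics.mean(v1) + statistics.mean(v2)) / 2
        model.append((gene, weight, midpoint))
    model.sort(key=lambda item: abs(item[1]), reverse=True)
    return model[:n_genes]

def predict(model, sample, class_1, class_2):
    """Each gene votes weight * (value - midpoint); positive total = class_1."""
    total = sum(w * (sample[gene] - mid) for gene, w, mid in model)
    return class_1 if total > 0 else class_2

# Hypothetical training data: 3 genes measured in 6 tumors of known type.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "geneX": [5.0, 5.5, 6.0, 1.0, 1.5, 2.0],  # high in ALL
    "geneY": [1.0, 2.0, 1.5, 6.0, 5.0, 5.5],  # high in AML
    "geneZ": [3.0, 3.5, 2.5, 3.2, 2.8, 3.1],  # uninformative
}
model = train_predictor(expression, labels, "ALL", "AML", n_genes=2)
unknown = {"geneX": 5.2, "geneY": 1.4, "geneZ": 3.0}
prediction = predict(model, unknown, "ALL", "AML")
```

Here n_genes=2 stands in for the 50 most informative genes on the slide; the uninformative gene is dropped before any vote is cast.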
Analytic Approaches
Limitation of cluster analysis: similarity in expression pattern suggests co-regulation but doesn’t reveal cause-effect relationships
• Bayesian Networks
– Represent the dependence structure between multiple interacting quantities (e.g. expression levels of genes)
– gene interactions & models of causal influence
• Others? many
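To make "dependence structure" concrete, the sketch below encodes a tiny hand-built network in which a regulator R influences two target genes, T1 and T2. The structure and every probability are invented for illustration; learning a network from expression data is a much harder problem and is not shown.

```python
from itertools import product

# Hypothetical network: regulator R -> target T1, regulator R -> target T2.
p_r = {True: 0.3, False: 0.7}           # P(R = state)
p_t1_given_r = {True: 0.9, False: 0.1}  # P(T1 = on | R = state)
p_t2_given_r = {True: 0.8, False: 0.2}  # P(T2 = on | R = state)

def bernoulli(p_on, state):
    """Probability of an on/off state given the probability of 'on'."""
    return p_on if state else 1.0 - p_on

def joint(r, t1, t2):
    """The network factorizes: P(R, T1, T2) = P(R) P(T1 | R) P(T2 | R)."""
    return (
        p_r[r]
        * bernoulli(p_t1_given_r[r], t1)
        * bernoulli(p_t2_given_r[r], t2)
    )

def probability(**observed):
    """Sum the joint over all states consistent with the observed values."""
    total = 0.0
    for r, t1, t2 in product([True, False], repeat=3):
        state = {"r": r, "t1": t1, "t2": t2}
        if all(state[name] == value for name, value in observed.items()):
            total += joint(r, t1, t2)
    return total

# Observing T1 "on" raises the belief that the regulator R is "on".
p_r_given_t1 = probability(r=True, t1=True) / probability(t1=True)
```

Unlike a cluster, which only says that T1 and T2 behave alike, the network states why: both depend on R, and the conditional tables quantify that influence.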
Check the Web: Free Software Available
Some useful links:
• Expression Profiler
http://ep.ebi.ac.uk/
• GeneX (NCGR)
http://genex.sourceforge.net/
www.ncgr.org/research/genex/other_tools.html
• http://www.kdnuggets.com/software/suites.html
Additional References:
• Ekins, R. and Chu, F.W.: Microarrays: their origins and applications. Trends in Biotechnology, 17: 217-218, 1999.
• Brazma, A. et al.: One-stop shop for microarray data. Nature, 403: 699-700, 2000.
• Brazma, A. and Vilo, J.: Minireview: Gene expression data analysis. FEBS Letters, 480: 17-24, 2000.