Introduction to DNA Microarrays
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and Neurology and the Center for Study of
Biological Complexity
225-4054
Biological Regulation: “You are what you express”
• Levels of regulation
• Methods of measurement
• Concept of genomics
Regulation of Gene Expression
• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!
Regulation of Gene Expression
• Genes are expressed when they are transcribed into RNA
• Amount of mRNA indicates gene activity
• Some genes are expressed in all tissues -- but are still regulated!
• Some genes are expressed selectively depending on tissue, disease, environment
• Dynamic regulation of gene expression allows long-term responses to environment
[Figure: diagram relating Acute Drug Use, Chronic Drug Use and Compulsive Drug Use (“Addiction”). Labels: mesolimbic dopamine? other; reinforcement; intoxication; tolerance; dependence; sensitization; altered signaling; gene expression; ?synaptic remodeling; persistent gene expression]
Progress in Studies on Gene Regulation
[Timeline, 1960 to 2000, in order of appearance:]
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive hybridization, PCR, differential display, MALDI/TOF MS
• Genome sequencing
• DNA/protein microarrays
Nucleic Acid Hybridization: How It Works
Primer on Nucleic Acid Hybridization
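• Hybridization rate depends on time, the concentration of nucleic acids, and the reassociation constant for the nucleic acid:
$C/C_0 = 1/(1 + k C_0 t)$
where $C$ is the concentration of single-stranded nucleic acid remaining at time $t$, $C_0$ is the initial concentration, and $k$ is the reassociation rate constant.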
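A minimal sketch of the reassociation equation above, assuming ideal second-order kinetics. The function names and the numeric values of k and C0 are hypothetical, chosen only for illustration.

```python
def fraction_single_stranded(c0, k, t):
    """Return C/C0, the fraction still single-stranded after time t.

    c0: initial nucleic acid concentration
    k:  reassociation rate constant (units must match c0 and t)
    t:  elapsed hybridization time
    """
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k):
    """Return the C0*t value at which half the strands have reassociated.

    Setting C/C0 = 1/2 in the equation gives k * C0 * t = 1, so C0t(1/2) = 1/k.
    """
    return 1.0 / k


if __name__ == "__main__":
    k = 2.0      # hypothetical rate constant
    c0 = 1e-3    # hypothetical starting concentration
    for t in (0, 100, 500, 5000):
        print(t, round(fraction_single_stranded(c0, k, t), 3))
```

Because only the product C0·t enters the equation, doubling the concentration has the same effect as doubling the hybridization time.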
Biological Networks
Types of Biological Networks
Gene Regulation Network
Examining Biological Networks: Experimental Design
Examining Biological Networks
A Bit of History
~1992-1996: Oligo arrays developed by Fodor, Stryer, Lockhart, others at Stanford/Affymetrix and Southern in Great Britain
~1994-1995: cDNA arrays usually attributed to Pat Brown and Dari Shalon at Stanford who first used a robot to print the arrays. In 1994, Shalon started Synteni which was bought by Incyte in 1998.
However, in 1982 Augenlicht and Kobrin proposed a DNA array (Cancer Research), and in 1984 they made a 4000-element array to interrogate human cancer cells.
(Rejected by Science, Nature and the NIH)
High Density DNA Microarrays
Expression Profiling: A Non-biased, Genomic Approach to Understanding Complex CNS Disease
Candidate Gene Studies
Molecular Triangulation:
Genomics, Genetics and Pharmacology
Bioinformatics:
• Genetical genomics
• Functional Grouping
• Literature Networks
• Protein Interactions
• Promoter Motif Grouping
Utility of Expression Profiling
• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: hierarchical clustering of PFC, HIP, VTA and NAC expression patterns; panels: AvgDiff, S-score; color scale: relative change from -2 to +2]
Experimental Design with DNA Microarrays
Sources of Variance in Microarray Experiments
• Biological
– Animal-animal differences (intra/inter cage, supplier)
– Genotype
– Circadian rhythms
– Stress
• Technical
– Sample treatment/harvesting (dissections, injections)
– Target preparation (enzyme lots, mRNA quality)
– Lot-to-lot chip variation
– Chip processing (scanning order)
• Environmental
– Temperature
– Handling
– Noise/odors
High Density DNA Microarrays
Synthesis and Analysis of 2-color Spotted cDNA Arrays: “Brown Chips”
Comparative Hybridization with Spotted cDNA Microarrays
Synthesis of High Density Oligonucleotide Arrays by Photolithography/Photochemistry
GeneChip Features
• Parallel analysis of >30K human, rat or mouse genes/EST clusters with 15-20 oligos (25-mer) per gene/EST
• Entire genome analysis (human, yeast, mouse)
• 3-4 orders of magnitude dynamic range (1-10,000 copies/cell)
• Quantitative for changes >25% ??
• SNP analysis
Oligonucleotide Array Analysis
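[Figure: target preparation and detection workflow. Total RNA (poly-A tail) is primed with Oligo(dT)-T7 and converted to dsDNA carrying a T7 site (RTase/Pol II); T7 pol with CTP-biotin produces Biotin-cRNA; then Hybridization, Streptavidin-phycoerythrin staining, and Scanning of PM and MM probes]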
Stepwise Analysis of Microarray Data
• Low-level analysis -- image analysis, expression quantitation
• Primary analysis -- is there a change in expression?
• Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?
Affymetrix Arrays: Image Analysis
[Figure: scanned array image (“.DAT” file) and the cell intensity file derived from it (“.CEL” file)]
Affymetrix Arrays: PM-MM Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
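A minimal sketch of the PM-MM difference calculation for one probe set, assuming each perfect-match (PM) intensity has a paired mismatch (MM) intensity. The function names and intensities are hypothetical, and a plain mean is used here; the AvgDiff method listed elsewhere in this deck uses a trimmed mean with outlier exclusion.

```python
from statistics import mean


def pm_mm_differences(pm, mm):
    """Return the PM - MM difference for each probe pair in a probe set."""
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs exactly one paired MM probe")
    return [p - m for p, m in zip(pm, mm)]


def average_difference(pm, mm):
    """Plain (untrimmed) mean of the PM - MM differences."""
    return mean(pm_mm_differences(pm, mm))


if __name__ == "__main__":
    # Hypothetical intensities for a four-pair probe set.
    pm = [1200, 950, 1430, 800]
    mm = [300, 410, 380, 390]
    print(pm_mm_differences(pm, mm))
    print(average_difference(pm, mm))
```

Subtracting MM removes the signal attributed to non-specific hybridization, so a probe set with PM close to MM ends up near zero.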
Variability in Ln(FC)
[Figure: scatter plots with axis range -4 to 4; axis labels: ln(PFC1AS/VTA1AS), Ln(FC1), Ln(FC2); panels: ln(FoldChange), S-score; R = 0.71]
Probe Level Analysis Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al. 2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
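A minimal sketch of a one-step Tukey bi-weight average, the kind of robust mean named in the MAS 5 entry above. The tuning constants c = 5 and eps = 0.0001 and the example values are assumptions for illustration; this is not the full MAS 5 pipeline, which also adjusts MM values and works on log-scale data.

```python
from statistics import mean, median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight estimate of the center of `values`.

    Points far from the median (relative to the median absolute
    deviation) get small or zero weight, so outlier probes have
    little influence on the summary.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)   # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical log-scale probe values with one outlier (15.0).
    probes = [8.0, 8.2, 7.9, 8.1, 8.0, 15.0]
    print("plain mean:", round(mean(probes), 3))
    print("bi-weight: ", round(tukey_biweight(probes), 3))
```

In this example the outlier receives zero weight, so the bi-weight estimate stays close to the bulk of the probes while the plain mean is pulled upward.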
“Lowess” normalization, Pin-specific Profiles
After Print-tip Normalization
Slide Normalization: Pieces and Pins
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
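A minimal sketch of intensity-dependent (“lowess”-style) normalization for two-color data, applied either to the whole slide or separately per print tip. It assumes M = log2(R/G) and A = (1/2)·log2(R·G), uses a simple local linear fit with tricube weights and no robustness iterations, and all function names and the `frac` default are hypothetical choices for illustration.

```python
import math


def lowess(x, y, frac=0.4):
    """Local linear fit with tricube weights; returns fitted y at each x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))   # neighbours used per fit
    fitted = []
    for xi in x:
        dists = [abs(xj - xi) for xj in x]
        h = sorted(dists)[k - 1] or 1e-12      # bandwidth: k-th nearest distance
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dists]
        sw = sum(w)
        xm = sum(wi * xj for wi, xj in zip(w, x)) / sw
        ym = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - xm) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - xm) * (yj - ym) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (xi - xm))
    return fitted


def normalize_ma(red, green, frac=0.4):
    """Subtract the intensity-dependent trend from the log ratios M."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


def normalize_by_pin(red, green, pins, frac=0.4):
    """Apply normalize_ma separately to the spots printed by each pin."""
    out = [0.0] * len(red)
    for pin in set(pins):
        idx = [i for i, p in enumerate(pins) if p == pin]
        norm = normalize_ma([red[i] for i in idx], [green[i] for i in idx], frac)
        for i, v in zip(idx, norm):
            out[i] = v
    return out
```

Fitting each pin separately removes pin-specific curvature that a single whole-slide fit would average over, at the cost of fewer points per fit.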
Normalization Confounds: Non-linearity
[Figure panels: Normal vs. Normal; Normal vs. Tumor]
Statistical Analysis of Microarrays: “Not Your Father’s Oldsmobile”
Secondary Analysis: Expression Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
Clustering Methods
• Distance measurement -- Euclidean most frequently used ($d^2 = \sum_i (x_i - y_i)^2$)
• Clustering techniques
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical -- single vs. complete vs. average linkage
– K-means -- have to estimate “k” initially
– SOM -- self-organizing maps
– Principal components analysis
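A minimal sketch of the distance measures and linkage rules named in the list above. The function names are hypothetical; the correlation-based distance (1 minus Pearson r) is included as a common alternative to Euclidean distance, not as something the slide prescribes.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def linkage_distance(cluster_a, cluster_b, method="average"):
    """Distance between two clusters under single, complete or average linkage."""
    d = [euclidean(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # closest pair of members
        return min(d)
    if method == "complete":    # farthest pair of members
        return max(d)
    if method == "average":     # mean over all pairs
        return sum(d) / len(d)
    raise ValueError("method must be 'single', 'complete' or 'average'")
```

Euclidean distance is sensitive to overall expression magnitude, whereas the correlation distance compares only the shape of two profiles, so the choice changes which genes end up clustered together.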
K-means vs. Hierarchical Clustering
• K-means: select the number of groups, divide genes randomly into those groups, and calculate inter- and intra-group distances. Move genes between groups until inter-group differences are maximized and intra-group differences are minimized.
• Hierarchical: calculate all pairwise distances (correlations) and order genes accordingly.
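A minimal sketch of the K-means procedure described above, assuming squared Euclidean distance and the standard reassign-to-nearest-centroid iteration. The function name, the `seed` and `max_iter` parameters, and the example profiles are hypothetical.

```python
import random


def kmeans(profiles, k, seed=0, max_iter=100):
    """Cluster expression profiles into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    n = len(profiles)
    # Random initial assignment, arranged so every group starts non-empty.
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        # Recompute each group's centroid (mean profile of its members).
        centroids = []
        for g in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == g]
            if not members:                 # re-seed an emptied group
                members = [rng.choice(profiles)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Move each gene to the group with the nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda g: sum((a - b) ** 2 for a, b in zip(p, centroids[g])))
            for p in profiles
        ]
        if new_labels == labels:            # no gene moved: converged
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical two-condition profiles forming two obvious groups.
    data = [[0, 0.1], [0.2, 0], [0.1, 0.1], [5, 5.1], [5.2, 4.9], [4.9, 5]]
    print(kmeans(data, k=2)[0])
```

The result can depend on the random starting assignment, which is one reason k has to be chosen, and often varied, by the analyst.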
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: hierarchical clustering of PFC, HIP, VTA and NAC expression patterns; panels: AvgDiff, S-score; color scale: relative change from -2 to +2]
Expression Profiling:
“It is possible that the expression profile could serve as a universal phenotype … Using a comprehensive database of reference profiles, the pathway(s) perturbed by an uncharacterized mutation would be ascertained by simply asking which expression patterns in the database its profile most strongly resembles … it should be equally effective at determining consequences of pharmaceutical treatments and disease states”
Hughes et al. Cell 102:109-126 (2000)
Use of Expression Profile “Compendium” to Characterize Gene or Drug Function
Hughes et al. Cell 102:109-126 (2000)
Key features:
• established error model
• profiled large number of mutants/drugs under highly controlled conditions
• statistical treatment of expression patterns
• verified array results with biochemical/phenotypic assays
Correlation in Expression Profiles of Drugs/Genes Affecting Same Pathways
[Figure: pairwise comparisons of expression profiles. Panels: cup5 and vma8, components of H+/ATPase complex; unrelated gene mutants; HMG CoA-reductase mutant vs. lovastatin, an inhibitor of HMG2. Red symbols = significant change (p<0.05) in both treatments.]
Hughes et al. Cell 102:109-126 (2000)
Assigning Function to Uncharacterized Genes by Expression Profiles
Hughes et al. Cell 102:109-126 (2000)
Tertiary Analysis: Connecting Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi) Functional Group Analysis: GenMAPP
Expression Analysis Systematic Explorer -- EASE
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
EASE -- Options in Analysis
Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene
www.PubGen.org
Expression Networks
[Figure: network diagram linking Expression Profiling, Pharmacology, Genetics, Complex Trait, Prot-Prot Interactions, Ontology, HomoloGene, and BioMed Lit Relations]
Quaternary Analysis: Profiles to Physiology
Analysis Stages for Oligonucleotide Microarrays
• Normalization
– Description: Equalizes overall signal across arrays to be compared; ensures linearity of response across abundance classes
– Examples of methods: whole chip (26), quantile (27)
• Probe reduction
– Description: Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hyper-variable expression levels.
– Examples of methods: weighted average (MAS 4) (29), Tukey bi-weight (MAS 5) (30), model-based (MBEI) (31), log scale linear additive (RMA) (32), position-dependent stacking energy modeling (PDNN) (33)
• Comparative
– Description: Compares expression of a gene across two or more arrays to determine significant changes in expression
– Examples of methods: t-test, rank order (MAS 5) (30), permutation (SAM) (46, 47), S-score (48)
• Multivariate studies
– Description: Identifies significant correlations in expression data across experiments/conditions
– Examples of methods: hierarchical clustering, k-means clustering, self-organizing maps, principal components analysis & many more (34, 49)
• Biological overlay
– Description: Identifies functions for given genes, clusters of genes; hypothesis generation
– Examples of methods: multiple database access (Source) (50), PubMed correlations (PubGene) (51), Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
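A minimal sketch of the two normalization approaches named in the Normalization stage above: whole-chip scaling and quantile normalization. The function names and the tiny example arrays are hypothetical; the scaling here uses a plain mean, and ties in quantile normalization are broken arbitrarily, which real implementations may handle differently.

```python
def scale_to_target(array, target):
    """Whole-chip scaling: multiply so the array's mean equals `target`."""
    factor = target / (sum(array) / len(array))
    return [v * factor for v in array]


def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of intensities per chip.
    Each value is replaced by the mean, across chips, of the values
    holding the same rank.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])   # indices by rank
        norm = [0.0] * n
        for rank, i in enumerate(order):
            norm[i] = rank_means[rank]
        result.append(norm)
    return result


if __name__ == "__main__":
    chips = [[5, 2, 3], [4, 1, 6]]   # hypothetical intensities, 3 genes x 2 chips
    print(quantile_normalize(chips))
    print(scale_to_target([1, 2, 3], target=4))
```

Scaling corrects only an overall brightness difference between chips, while quantile normalization also removes non-linear differences in the shape of the intensity distribution.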
Bioinformatics Resources for Microarray Experiments
• SOURCE
– Description: Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation
– Link: http://source.stanford.edu/cgi-bin/sourceSearch
• GeneLynx
– Description: Human, mouse gene compilation; multiple database links regarding gene/protein structure and function
– Link: http://www.genelynx.org/
• DAVID/EASE
– Description: Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE)
– Link: http://apps1.niaid.nih.gov/David/upload.asp
• GenMAPP/MAPPFinder
– Description: Superimposes array data on biological pathways; statistical ranking of functional groups
– Link: http://www.genmapp.org/
• FatiGO
– Description: Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation
– Link: http://fatigo.bioinfo.cnio.es/
• PubGene
– Description: Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available
– Link: http://www.pubgene.org/
• MEME
– Description: Searches promoter regions of genes in list/cluster for conserved motifs
– Link: http://meme.sdsc.edu/meme/website/intro.html
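A minimal sketch of the kind of over-representation test that gene-list mining tools such as those listed above perform for a functional category. It assumes a plain one-sided hypergeometric test; the individual tools may use different or more conservative statistics, and the function name and counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(list_hits, list_size, category_size, population_size):
    """P(at least `list_hits` category genes in a random list of `list_size`).

    population_size: genes on the array
    category_size:   array genes annotated to the category (e.g. a GO term)
    list_size:       genes in the list of interest (e.g. a cluster)
    list_hits:       list genes annotated to the category
    """
    total = comb(population_size, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, x) * comb(population_size - category_size, list_size - x)
        for x in range(list_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 12,000 genes on the array, 150 in the category,
    # 200 genes in the cluster, 12 of them in the category.
    print(overrepresentation_pvalue(12, 200, 150, 12000))
```

When many categories are tested at once, the resulting p-values need a multiple-testing correction before any single category is interpreted.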