Preprocessing of cDNA microarray data, Lecture 19, Statistics 246, April 1, 2004
![Page 1: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/1.jpg)
Preprocessing of cDNA microarray data
Lecture 19, Statistics 246,
April 1, 2004
![Page 2: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/2.jpg)
Was the experiment a success?
What analysis tools should be used?
Are there any specific problems?
Begin by looking at the data
![Page 3: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/3.jpg)
Red/Green overlay images
Good: low background, lots of differential expression (d.e.). Bad: high background, ghost spots, little d.e.
Co-registration and overlay offer a quick visualization, revealing information on color balance, uniformity of hybridization, spot uniformity, background, and artifacts such as dust or scratches.
![Page 4: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/4.jpg)
Always log, always rotate
log2 R vs log2 G; M = log2(R/G) vs A = log2 √(RG)
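The log-and-rotate step can be sketched in a few lines. This is a minimal illustration, assuming the red and green intensities are positive numbers held in plain Python lists; the function name `ma_transform` is hypothetical and not taken from any package mentioned in the lecture.

```python
import math

def ma_transform(red, green):
    """Return (M, A) lists for paired red/green spot intensities (all > 0)."""
    m_values, a_values = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_values.append(log_r - log_g)        # M = log2(R/G)
        a_values.append((log_r + log_g) / 2)  # A = log2(sqrt(R*G))
    return m_values, a_values

# Three illustrative spots: red brighter, equal, green brighter.
M, A = ma_transform([1024.0, 256.0, 64.0], [256.0, 256.0, 256.0])
```

Plotting M against A is the 45-degree rotation (with rescaling) of the log2 R vs log2 G scatterplot, so intensity-dependent dye bias shows up as curvature around M = 0.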
![Page 5: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/5.jpg)
Signal/Noise = log2(spot intensity/background intensity)
Histograms
![Page 6: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/6.jpg)
Boxplots of log2R/G
Liver samples from 16 mice: 8 WT, 8 ApoAI KO.
![Page 7: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/7.jpg)
Spatial plots: background from the two slides
![Page 8: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/8.jpg)
Highlighting extreme log ratios
Top (black) and bottom (green) 5% of log ratios
![Page 9: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/9.jpg)
Boxplots and highlighting
Clear example of spatial bias (here high is red, low green)
Print-tip groups
Plot axes: log-ratios vs. pin group #
![Page 10: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/10.jpg)
Pin group (sub-array) effects
Boxplots of log ratios by pin group. Lowess lines through points from pin groups.
![Page 11: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/11.jpg)
Plate effects
![Page 12: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/12.jpg)
KO #8
Probes: ~6,000 cDNAs, including 200 related to lipid metabolism. Arranged in a 4x4 array of 19x21 sub-arrays.
![Page 13: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/13.jpg)
Time of printing effects
Green channel intensities (log2 G) plotted against spot number. Printing took place over 4.5 days. The previous slide depicts a slide from this print run.
![Page 14: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/14.jpg)
Normalization
Why? To correct for systematic differences between samples on the same slide, or between slides, which do not represent true biological variation between samples.
How do we know it is necessary? By examining self-self hybridizations, where no true differential expression is occurring.
We find dye biases which vary with overall spot intensity, location on the array, plate origin, pins, scanning parameters, etc.
![Page 15: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/15.jpg)
Self-self hybridizations
False color overlay; boxplots within pin-groups; scatter (MA-)plots
![Page 16: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/16.jpg)
From the NCI60 data set (Stanford web site)
A series of non self-self hybridizations
![Page 17: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/17.jpg)
Early Ngai lab, UC Berkeley
![Page 18: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/18.jpg)
Early Goodman lab, UC Berkeley
![Page 19: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/19.jpg)
From the Ernest Gallo Clinic & Research Center
![Page 20: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/20.jpg)
Early PMCRI, Melbourne Australia
![Page 21: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/21.jpg)
Normalization: methods
a) Normalization based on a global adjustment:
log2 R/G -> log2 R/G - c = log2 R/(kG).
Choices for k or c = log2 k are: c = median or mean of log ratios for a particular gene set (e.g. housekeeping genes); or total intensity normalization, where k = ∑Ri / ∑Gi.
b) Intensity-dependent normalization. Here we run a line through the middle of the MA plot, shifting the M value of the pair (A, M) by c = c(A), i.e.
log2 R/G -> log2 R/G - c(A) = log2 R/(k(A)G).
One estimate of c(A) is made using the LOWESS function of Cleveland (1979): LOcally WEighted Scatterplot Smoothing.
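The two global choices in (a) can be sketched as follows. This is only an illustration, assuming positive intensities in plain lists and using all spots as the gene set; the function names are hypothetical.

```python
import math
from statistics import median

def normalize_global_median(red, green):
    """Subtract c = median of the log ratios from every log ratio."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    c = median(m)
    return [mi - c for mi in m], c

def normalize_total_intensity(red, green):
    """Rescale green by k = sum(R) / sum(G) before taking log ratios."""
    k = sum(red) / sum(green)
    return [math.log2(r / (k * g)) for r, g in zip(red, green)], k

# Illustrative data: a constant 4-fold dye imbalance, one spot truly up.
red = [400.0, 800.0, 1600.0, 800.0]
green = [100.0, 200.0, 400.0, 50.0]
m_median, c = normalize_global_median(red, green)
m_total, k = normalize_total_intensity(red, green)
```

In this toy data the median adjustment recovers the single changed spot, while the total-intensity constant k is pulled upward by that same spot, which is one reason the choice of gene set matters.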
![Page 22: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/22.jpg)
Normalization: methods
c) Within print-tip group normalization. In addition to intensity-dependent variation in log ratios, spatial bias can also be a significant source of systematic error. Most normalization methods do not correct for spatial effects produced by hybridization artifacts or print-tip or plate effects during the construction of the microarrays.
It is possible to correct for both print-tip and intensity-dependent bias by performing LOWESS fits to the data within print-tip groups, i.e.
log2 R/G -> log2 R/G - ci(A) = log2 R/(ki(A)G),
where ci(A) is the LOWESS fit to the MA-plot for the ith grid only.
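A rough sketch of the within-group fit follows. It uses a simplified local linear smoother with tricube weights and omits the robustness iterations of Cleveland's full LOWESS, so it is a stand-in rather than the procedure itself; the function names and the `span` parameter are assumptions made for illustration. Global intensity-dependent normalization is the special case in which every spot carries the same group label.

```python
def lowess_fit(x, y, span=0.4):
    """Locally weighted linear fit evaluated at each x (tricube weights).

    Simplified: no robustness iterations, unlike the full LOWESS procedure.
    """
    n = len(x)
    k = max(2, min(n, int(round(span * n))))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12             # bandwidth: k-th smallest distance
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_by_print_tip(m, a, group, span=0.4):
    """Subtract a separate fitted curve c_i(A) from M within each group i."""
    normalized = list(m)
    for g in set(group):
        idx = [j for j, gj in enumerate(group) if gj == g]
        curve = lowess_fit([a[j] for j in idx], [m[j] for j in idx], span)
        for j, cj in zip(idx, curve):
            normalized[j] = m[j] - cj
    return normalized

# Two illustrative print-tip groups with different intensity-dependent trends.
a = [6.0 + j for j in range(8)] * 2
group = [0] * 8 + [1] * 8
m = [0.5 + 0.1 * aj for aj in a[:8]] + [-1.0 - 0.2 * aj for aj in a[8:]]
m_norm = normalize_by_print_tip(m, a, group, span=0.75)
```

Because each group here follows an exact straight line, the fitted curves reproduce the trends and the normalized values are zero up to rounding; real data leave residual scatter around zero.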
![Page 23: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/23.jpg)
Which spots to use for normalization?
The LOWESS lines can be run through many different sets of points, and each strategy has its own implicit set of assumptions justifying its applicability.
For example, we can justify the use of a global LOWESS approach by supposing that, when stratified by mRNA abundance, a) only a minority of genes are expected to be differentially expressed, or b) any differential expression is as likely to be up-regulation as down-regulation.
Pin-group LOWESS requires stronger assumptions: that one of the above applies within each pin-group.
The use of other sets of genes, e.g. control or housekeeping genes, involves similar assumptions.
![Page 24: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/24.jpg)
Use of control spots
M = log R/G = log R - log G, A = (log R + log G)/2
Positive controls (spotted in varying concentrations)
Negative controls
blanks
Lowess curve
![Page 25: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/25.jpg)
Global scale, global lowess, pin-group lowess; spatial plot after, smooth histograms of M after
![Page 26: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/26.jpg)
MSP titration series (Microarray Sample Pool)
Control set to aid intensity-dependent normalization
Different concentrations
Spotted evenly spread across the slide
Pool the whole library
![Page 27: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/27.jpg)
MSP normalization compared to other methods
Yellow: GAPDH, tubulin
Light blue: MSP pool / titration
Orange: Schadt-Wong rank invariant set
Red line: lowess smooth
![Page 28: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/28.jpg)
Composite normalization
Before and after composite normalization
- MSP lowess curve
- Global lowess curve
- Composite lowess curve
(Other colours: control spots)
ci(A) = α_A g(A) + (1 - α_A) fi(A), where the weight α_A lies between 0 and 1 and depends on the intensity A.
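The blend of two fitted curves can be sketched as below. The sketch reads the weight in the formula as an intensity-dependent proportion between 0 and 1, here taken to be the share of spots with intensity below A; that reading is an assumption, as are the function name and the use of plain callables `g_fit` and `f_fit` for the two curves (which curve plays which role is left to the caller).

```python
from bisect import bisect_left

def composite_offset(a, all_a_sorted, g_fit, f_fit):
    """Return alpha * g(a) + (1 - alpha) * f(a) for one intensity value a.

    alpha is assumed to be the proportion of spots with intensity below a.
    """
    alpha = bisect_left(all_a_sorted, a) / len(all_a_sorted)
    return alpha * g_fit(a) + (1 - alpha) * f_fit(a)

# Illustrative constant curves standing in for two fitted lowess curves.
intensities = sorted([6.0, 8.0, 10.0, 12.0])
g_curve = lambda a: 0.2
f_curve = lambda a: -0.2

low = composite_offset(6.0, intensities, g_curve, f_curve)
mid = composite_offset(10.0, intensities, g_curve, f_curve)
high = composite_offset(13.0, intensities, g_curve, f_curve)
```

With this weighting the composite curve follows `f_fit` at low intensities and moves toward `g_fit` as intensity increases.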
![Page 29: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/29.jpg)
Comparison of Normalization Schemes (courtesy of Jason Goncalves)
No consensus on best segmentation or normalization method.
Scheme was applied to assess the common normalization methods.
Based on reciprocal labeling experiment data for a series of 140 replicate experiments on two different arrays, each with 19,200 spots.
![Page 30: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/30.jpg)
DESIGN OF RECIPROCAL LABELING EXPERIMENT
Replicate experiment in which we assess the same mRNA pools but invert the fluors used.
The replicates are independent experiments and are scanned, quantified and normalized as usual
![Page 31: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/31.jpg)
The following relationship would be observed for reciprocal microarray experiments in which the slides are free of defects and the normalization scheme performed ideally:
$$\log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.1}}_{\mathrm{Gene\,A}}\right) = -\log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.2}}_{\mathrm{Gene\,A}}\right)$$
We can measure, using real data sets, how well each microarray normalization scheme approaches this ideal.
![Page 32: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/32.jpg)
Deviation metric to assess normalization schemes
$$\mathrm{Deviation}_{\mathrm{Spot}} = \log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.1}}_{\mathrm{Gene\,A}}\right) + \log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.2}}_{\mathrm{Gene\,A}}\right)$$
$$\mathrm{Deviation}_{\mathrm{ArrayAverage}} = \frac{1}{n}\sum_{N=1}^{n}\left[\log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.1}}_{\mathrm{Gene\,N}}\right) + \log_2\left(\mathrm{Ratio}^{\mathrm{Ch1/Ch2},\ \mathrm{Exp.2}}_{\mathrm{Gene\,N}}\right)\right]$$
We now use the mean array average deviation to compare the normalization methods. Note that this comparison addresses only variance (precision) and not bias (accuracy) aspects of normalization.
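The deviation metric can be sketched directly from its definition: for each spot, add the log2 ratios from the two reciprocally labeled experiments, then average over the array. The function names and the toy ratios are assumptions for illustration, and the sum is taken as written, without absolute values.

```python
import math

def spot_deviations(ratios_exp1, ratios_exp2):
    """Per-spot deviation: log2(ratio in Exp.1) + log2(ratio in Exp.2)."""
    return [math.log2(r1) + math.log2(r2)
            for r1, r2 in zip(ratios_exp1, ratios_exp2)]

def array_average_deviation(ratios_exp1, ratios_exp2):
    """Mean of the per-spot deviations over all n spots on the array."""
    deviations = spot_deviations(ratios_exp1, ratios_exp2)
    return sum(deviations) / len(deviations)

# Normalized Ch1/Ch2 ratios for three spots in a dye-swapped pair.
exp1 = [2.0, 4.0, 0.5]
exp2 = [0.5, 0.5, 2.0]
per_spot = spot_deviations(exp1, exp2)
array_avg = array_average_deviation(exp1, exp2)
```

A spot whose ratio inverts exactly under the dye swap contributes zero; the second spot (4 versus 1/2) contributes a deviation of 1.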
![Page 33: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/33.jpg)
***
![Page 34: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/34.jpg)
Scale normalization: between slides
Boxplots of log ratios from 3 replicate self-self hybridizations.
Left panel: before normalization.
Middle panel: after within print-tip group normalization.
Right panel: after a further between-slide scale normalization.
![Page 35: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/35.jpg)
The “NCI 60” experiments (no bg)
Some scale normalization seems desirable
![Page 36: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/36.jpg)
Scale normalization: another data set
Plot axis: log-ratios.
Only small differences in spread apparent. No action required.
![Page 37: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/37.jpg)
One way of taking scale into account
Assumption: All slides have the same spread in M.
True log ratio is $\mu_{ij}$, where i represents different slides and j represents different spots.
Observed is $M_{ij}$, where
$$M_{ij} = a_i\,\mu_{ij}$$
Robust estimate of $a_i$ is
$$\hat{a}_i = \frac{\mathrm{MAD}_i}{\sqrt[I]{\prod_{i=1}^{I}\mathrm{MAD}_i}}, \qquad \mathrm{MAD}_i = \mathrm{median}_j\left\{\,\left|M_{ij} - \mathrm{median}_j(M_{ij})\right|\,\right\}$$
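The scale estimate can be sketched as each slide's MAD divided by the geometric mean of all the MADs, followed by division of each slide's log ratios by its factor. The function names are hypothetical, and the sketch assumes every slide has a non-zero MAD.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

def scale_factors(slides):
    """a_i = MAD_i divided by the geometric mean of MAD_1 .. MAD_I."""
    mads = [mad(m) for m in slides]
    geo_mean = math.exp(sum(math.log(v) for v in mads) / len(mads))
    return [v / geo_mean for v in mads]

def scale_normalize(slides):
    """Divide each slide's log ratios by its estimated scale factor."""
    factors = scale_factors(slides)
    return [[v / a for v in m] for m, a in zip(slides, factors)]

# Two illustrative slides; the second has four times the spread of the first.
slides = [[-1.0, 0.0, 1.0, 2.0, -2.0], [-4.0, 0.0, 4.0, 8.0, -8.0]]
factors = scale_factors(slides)
rescaled = scale_normalize(slides)
```

After rescaling, both slides have the same MAD, and the product of the factors is 1, so the overall scale of the log ratios is preserved.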
![Page 38: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/38.jpg)
A slightly harder normalization problem
Global lowess doesn’t do the trick here.
![Page 39: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/39.jpg)
Print-tip-group normalization helps
![Page 40: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/40.jpg)
But not completely
There is still a lot of scatter in the middle in a WT vs KO comparison.
![Page 41: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/41.jpg)
Effects of previous normalisation
Before normalisation; after print-tip-group normalization
![Page 42: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/42.jpg)
Within print-tip-group box plots of M after print-tip-group normalization
![Page 43: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/43.jpg)
Taking scale into account, cont.
Assumption: All print-tip-groups have the same spread in M.
True log ratio is $\mu_{ij}$, where i represents different print-tip-groups and j represents different spots.
Observed is $M_{ij}$, where
$$M_{ij} = a_i\,\mu_{ij}$$
Robust estimate of $a_i$ is
$$\hat{a}_i = \frac{\mathrm{MAD}_i}{\sqrt[I]{\prod_{i=1}^{I}\mathrm{MAD}_i}}, \qquad \mathrm{MAD}_i = \mathrm{median}_j\left\{\,\left|M_{ij} - \mathrm{median}_j(M_{ij})\right|\,\right\}$$
![Page 44: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/44.jpg)
Effect of location & scale normalization
Clearly care is needed in making decisions like this one.
![Page 45: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/45.jpg)
A comparison of three MA-plots
Unnormalized; print-tip normalization; print-tip & scale normalization
![Page 46: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/46.jpg)
The same idea on another data set
After print-tip location and scale normalization.
Plot axes: log-ratios vs. print-tip groups
![Page 47: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/47.jpg)
Follow-up experiment
On each slide, half the spots (8) are differentially expressed, the other half are not.
![Page 48: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/48.jpg)
Paired-slides: dye-swap
Slide 1, M = log2 (R/G) - c
Slide 2, M’ = log2 (R’/G’) - c’
Combine by subtracting the normalized log-ratios:
[ (log2 (R/G) - c) - (log2 (R’/G’) - c’) ] / 2
= [ log2 (R/G) + log2 (G’/R’) ] / 2
= [ log2 (RG’/GR’) ] / 2
provided c = c’.
Assumption: the normalization functions are the same for the two slides.
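A small sketch of this self-normalization: when the two slides share the same normalization offset, it cancels in the difference, so the combined log ratio can be computed from the raw intensities alone. The function name and the numbers are illustrative assumptions.

```python
import math

def dye_swap_combine(red, green, red_swap, green_swap):
    """Combined log ratio (M - M')/2 = log2(R*G' / (G*R')) / 2 per spot."""
    return [0.5 * math.log2((r * gs) / (g * rs))
            for r, g, rs, gs in zip(red, green, red_swap, green_swap)]

# One spot with a true 4-fold change and a 2-fold red dye bias on both slides:
# slide 1 shows R/G = 8; the dye-swapped slide 2 shows R'/G' = 1/2.
combined = dye_swap_combine([800.0], [100.0], [50.0], [100.0])
```

The result is 2 = log2(4): the shared dye bias drops out without estimating c. If the two slides have different biases, the difference (c - c')/2 remains in the combined value, which is why the assumption needs checking.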
![Page 49: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/49.jpg)
Checking the assumption
MA plot for slides 1 and 2: it isn’t always like this.
![Page 50: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/50.jpg)
Result of self-normalization
(M - M’)/2 vs. (A + A’)/2
![Page 51: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/51.jpg)
Summary of normalization
- Reduces systematic (not random) effects
- Makes it possible to compare several arrays
- Use logratios (MA-plots)
- Lowess normalization (dye bias)
- MSP titration series – composite normalization
- Pin-group location normalization
- Pin-group scale normalization
- Between slide scale normalization
- More? Use controls!
- Normalization introduces more variability
- Outliers (bad spots) are handled with replication
![Page 52: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/52.jpg)
What is missing?
Principally, a discussion of data quality issues. Most image analysis programs collect a wide range of measurements associated with each spot: morphological measures such as area and perimeter (in pixels); uniformity measures such as the SD of foreground and background intensities in each channel, and of ratios of intensities (with and without background) across the pixels in a spot; and spot brightness indicators such as the ratio of spot foreground to spot background, and the fraction of pixels in the foreground with intensity greater than background intensity (or a given multiple thereof). From these, further derived measures can be calculated, such as coefficients of variation, and so on.
How should we make use of the various quality indicators? Most programs include procedures for flagging spots on the basis of one or more indicators, and users typically omit flagged spots from their primary analyses. “Data filtering” of this kind clearly improves the appearance of the data, but... can we do more? That is a longer story, for another time.
![Page 53: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/53.jpg)
Acknowledgments
Jean Yee Hwa Yang (UCB)
Sandrine Dudoit (UCB)
Natalie Thorne (WEHI)
Ingrid Lönnstedt (Uppsala)
Henrik Bengtsson (Lund)
Jason Goncalves (Iobion)
Matt Callow (LLNL)
Percy Luu (UCB)
John Ngai (UCB)
Vivian Peng (UCB)
Dave Lin (Cornell)
![Page 54: Preprocessing of cDNA microarray data Lecture 19, Statistics 246, April 1, 2004](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649db95503460f94aa9a5a/html5/thumbnails/54.jpg)
Reference: Yang et al (2002) Nucleic Acids Research 30, e15.
Some web sites:
Technical reports, talks, software etc.: http://www.stat.berkeley.edu/users/terry/zarray/Html/
Statistical software R (“GNU’s S”): http://www.R-project.org/
Packages within R environment:
-- SMA (statistics for microarray analysis): http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
-- Spot: http://www.cmis.csiro.au/iap/spot.htm