![Page 1: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/1.jpg)
Probe-Level Data Normalisation: RMA and GC-RMA
Sam Robson
Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies.
![Page 2: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/2.jpg)
References
“Summaries of Affymetrix Genechip Probe Level Data”, Irizarry et al., Nucleic Acids Research, 2003, Vol. 31, No. 4.
“Exploration, Normalization and Summaries of High Density Oligonucleotide Array Probe Level Data”, Irizarry et al., …
“A Model Based Background Adjustment for Oligonucleotide Expression Arrays”, Wu, Irizarry et al., Johns Hopkins University, Dept. of Biostatistics
![Page 3: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/3.jpg)
Affymetrix Genechips
Each gene represented by 11-20 ‘probe pairs’.
Probe pairs are 3’ biased.
‘Probe Pair’ consists of Perfect Match (PM) and MisMatch (MM) probes.
MM has altered middle (13th) base. Designed to measure non-specific binding (NSB).
![Page 4: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/4.jpg)
Genechip Scanning
RNA sample prepared, labelled and hybridised to chip.
Chip fluorescently scanned. Gives a raw pixelated image - .DAT file.
Grid used to separate pixels related to individual probes.
Pixel intensities averaged to give single intensity for each probe - .CEL file.
Probe level intensities combined for each probe set to give single intensity value for each gene.
![Page 5: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/5.jpg)
Affymetrix MicroArray Suite (MAS) v4.0
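Uses MM probes to correct for NSB.
MAS4.0 used simple Average Difference method:
$$\mathrm{AvDiff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$$
A is the subset of probes where $d_j = PM_j - MM_j$ is within 3 SDs of the average of $d_{(2)}, \ldots, d_{(J-1)}$ (the ordered differences with the smallest and largest removed).
Excludes outliers, but not a robust averaging method.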
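A minimal Python sketch of the Average Difference calculation for one probe set may help make the trimming rule concrete. The function name `avdiff` and the intensities are invented for illustration, and taking the SD over the same trimmed differences as the average is an assumption, not something stated on the slide.

```python
from statistics import mean, stdev

def avdiff(pm, mm):
    """Average Difference for one probe set (illustrative sketch only)."""
    d = [p - m for p, m in zip(pm, mm)]
    # Reference values: all differences except the smallest and the largest.
    trimmed = sorted(d)[1:-1]
    centre = mean(trimmed)
    spread = stdev(trimmed)  # assumption: SD taken over the trimmed values
    # A = probes whose difference lies within 3 SDs of the trimmed average.
    kept = [x for x in d if abs(x - centre) <= 3 * spread]
    return sum(kept) / len(kept)

# Hypothetical intensities; the last probe pair is an obvious outlier.
pm = [120, 135, 128, 140, 131, 900]
mm = [100, 110, 105, 118, 109, 100]
print(avdiff(pm, mm))
```

The outlying pair is excluded from A, but the remaining probes are still combined with a plain mean, which is why the method is not robust.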
![Page 6: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/6.jpg)
Affymetrix MicroArray Suite (MAS) v5.0
Current method employed by Affymetrix.
Weighted mean using one-step Tukey Biweight Estimate:
$$\mathrm{signal} = \log^{-1}\left(\text{Tukey Biweight}\{\log(PM_j - CT_j)\}\right)$$
$CT_j$ is a quantity derived from $MM_j$ never larger than $PM_j$.
Weights each probe intensity based on its distance from the median.
Robust average (insensitive to small changes from any assumptions made).
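The one-step Tukey biweight can be sketched in a few lines of Python. This is an illustration, not Affymetrix code: the function name, the tuning constant `c = 5`, the small `eps` guard and the example values are assumptions chosen for the sketch.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    m = median(values)
    # Median absolute deviation sets the scale used to judge "far from centre".
    mad = median(abs(v - m) for v in values)
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight: 1 at the median, falling to 0 once |u| >= 1.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        num += w * v
        den += w
    return num / den

# Hypothetical log2(PM - CT) values for one probe set; 12.0 is an outlier.
log_values = [7.0, 7.2, 6.9, 7.1, 12.0]
estimate = tukey_biweight(log_values)
print(estimate, 2 ** estimate)  # log-scale estimate and the anti-logged signal
```

Here the outlying probe receives zero weight, so the estimate stays within the range of the four consistent probes.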
![Page 7: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/7.jpg)
Tukey Biweight
![Page 8: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/8.jpg)
Problems with Mis-Match data
MM intensity levels are greater than PM intensity levels in ~1/3 of all probes.
Suggests that MM probes measure actual signal, and not just NSB.
Subtracting MM from PM then gives negative signal values for these probes.
Subtracting MM data will result in loss of interesting signal in many probes.
Several methods have been proposed using only PM data.
![Page 9: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/9.jpg)
Problems with Mis-Match data
![Page 10: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/10.jpg)
Problems with MAS5.0
Loss of probe-level information.
Background estimate may cause noise at low intensity levels due to subtraction of MM data.
![Page 11: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/11.jpg)
Robust Multiarray Average (RMA)
Subtraction of MM data corrects for NSB, but introduces noise.
Want a method that gives positive intensity values.
Normalising at probe level avoids the loss of information.
![Page 12: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/12.jpg)
Robust Multiarray Average (RMA)
1) Background correction.
2) Normalization (across arrays).
3) Probe level intensity calculation.
4) Probe set summarization.
![Page 13: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/13.jpg)
Robust Multiarray Average (RMA)
PM data is combination of background and signal.
Assume strictly positive distribution for signal. Then background corrected signal is also positively distributed:
$$B(PM_{ijn}) = E(S_{ijn} \mid PM_{ijn})$$
(array $i$, probe $j$, probe set $n$).
Background correction performed on each array separately.
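The background-corrected value is the expected signal given the observed PM intensity. As a sketch, assume (following the model described by Irizarry et al.) that the signal is exponential with rate `alpha` and the background is normal with mean `mu` and SD `sigma`; the conditional expectation then has a closed form. The parameter values below are invented, and how they are estimated from each array is not shown.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rma_background(pm, mu, sigma, alpha):
    """E(signal | PM) for exponential signal plus normal background (sketch)."""
    a = pm - mu - sigma ** 2 * alpha
    b = sigma
    num = norm_pdf(a / b) - norm_pdf((pm - a) / b)
    den = norm_cdf(a / b) + norm_cdf((pm - a) / b) - 1.0
    return a + b * num / den

# Hypothetical per-array parameters and observed PM intensities.
observed = [50.0, 100.0, 200.0, 1000.0]
corrected = [rma_background(o, mu=100.0, sigma=20.0, alpha=0.01) for o in observed]
print(corrected)
```

Even a PM value below the assumed background mean maps to a small positive signal, which is the property the slide asks for.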
![Page 14: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/14.jpg)
Robust Multiarray Average (RMA)
1) Background correction.
2) Normalization (across arrays).
3) Probe level intensity calculation.
4) Probe set summarization.
![Page 15: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/15.jpg)
Robust Multiarray Average (RMA)
Normalises across all arrays to make all distributions the same.
‘Quantile Normalization’ used to correct for array biases.
Compares expression levels between arrays for various quantiles.
Can view this on quantile-quantile plot.
Protects against outliers.
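Quantile normalisation can be sketched as follows, assuming each array is a list of probe intensities of equal length. The function name and the data are illustrative, and ties are broken by position here, which is a simplification.

```python
def quantile_normalise(arrays):
    """Give every array the same distribution (illustrative sketch).

    arrays: list of equal-length lists, one list of intensities per chip.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalised = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference quantile
        normalised.append(out)
    return normalised

# Three hypothetical chips with three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
result = quantile_normalise(chips)
print(result)
```

After this step every chip contains exactly the same set of values; only the assignment of values to probes differs, so a quantile-quantile plot of any two chips lies on the diagonal.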
![Page 16: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/16.jpg)
Robust Multiarray Average (RMA)
1) Background correction.
2) Normalization (across arrays).
3) Probe level intensity calculation.
4) Probe set summarization.
![Page 17: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/17.jpg)
Robust Multiarray Average (RMA)
Linear model.
Uses background corrected, normalised, log transformed probe intensities ($Y_{ijn}$):
$$Y_{ijn} = \mu_{in} + \alpha_{jn} + \varepsilon_{ijn}$$
$\mu_{in}$ = Log scale expression level (RMA measure).
$\alpha_{jn}$ = Probe affinity effect.
$\varepsilon_{ijn}$ = Independent identically distributed error term (with mean 0).
![Page 18: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/18.jpg)
Robust Multiarray Average (RMA)
1) Background correction.
2) Normalization (across arrays).
3) Probe level intensity calculation.
4) Probe set summarization.
![Page 19: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/19.jpg)
Robust Multiarray Average (RMA)
Combine intensity values from the probes in the probe set to get a single intensity value for each gene.
Uses ‘Median Polishing’.
Each chip normalised to its median.
Each gene normalised to its median.
Repeated until medians converge.
Maximum of 5 iterations to prevent infinite loops.
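The median polish steps can be sketched in Python for a single probe set. The sketch assumes the input is a matrix of log intensities with one row per probe and one column per chip, so the row and column sweeps correspond to the two normalisation steps listed above. Names, tolerance and data are illustrative.

```python
from statistics import median

def median_polish(y, max_iter=5, tol=1e-9):
    """Median polish of a probes x chips matrix of log intensities (sketch)."""
    resid = [list(row) for row in y]
    n_probes, n_chips = len(resid), len(resid[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    chip_eff = [0.0] * n_chips
    for _ in range(max_iter):
        # Sweep out the median of every row (probe).
        row_med = [median(row) for row in resid]
        for p in range(n_probes):
            probe_eff[p] += row_med[p]
            resid[p] = [v - row_med[p] for v in resid[p]]
        shift = median(chip_eff)
        overall += shift
        chip_eff = [c - shift for c in chip_eff]
        # Sweep out the median of every column (chip).
        col_med = [median(resid[p][a] for p in range(n_probes))
                   for a in range(n_chips)]
        for a in range(n_chips):
            chip_eff[a] += col_med[a]
            for p in range(n_probes):
                resid[p][a] -= col_med[a]
        shift = median(probe_eff)
        overall += shift
        probe_eff = [r - shift for r in probe_eff]
        # Stop once neither sweep changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, probe_eff, chip_eff, resid

# Hypothetical probe set: 3 probes (rows) measured on 3 chips (columns).
y = [[8, 9, 10], [9, 10, 11], [7, 8, 9]]
overall, probe_eff, chip_eff, resid = median_polish(y)
expression = [overall + c for c in chip_eff]  # one summary value per chip
print(expression)
```

The per-chip summary is the overall level plus the chip effect, which plays the role of the log scale expression level for this probe set.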
![Page 20: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/20.jpg)
Robust Multiarray Average (RMA)
Pre-Normalisation
![Page 21: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/21.jpg)
Robust Multiarray Average (RMA)
Post-Normalisation
![Page 22: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/22.jpg)
GC-RMA
Corrects for background noise as well as NSB.
Probe affinity calculated using position-dependent base effects.
MM data adjusted based on probe affinity, then subtracted from PM.
Does not lose MM data.
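To illustrate what a position-dependent base effect means, the sketch below sums one contribution per base along a 25-mer probe. The weights in `toy_base_effect` are entirely made up for illustration; GC-RMA estimates its base effects from data, and neither those values nor the MM adjustment step are reproduced here.

```python
def probe_affinity(sequence, base_effect):
    """Sum a per-base, per-position contribution along the probe (sketch)."""
    return sum(base_effect(base, k) for k, base in enumerate(sequence, start=1))

def toy_base_effect(base, k):
    # Hypothetical weights: G/C contribute positively, A/T negatively,
    # and bases near the middle of the probe (position 13) count the most.
    strength = {"A": -0.1, "T": -0.2, "G": 0.1, "C": 0.2}[base]
    centre_weight = 1.0 - abs(k - 13) / 13.0
    return strength * centre_weight

pm_probe = "ACGTACGTACGTCCGTACGTACGTA"         # hypothetical 25-mer, C at position 13
mm_probe = pm_probe[:12] + "G" + pm_probe[13:]  # same probe, 13th base altered
print(probe_affinity(pm_probe, toy_base_effect))
print(probe_affinity(mm_probe, toy_base_effect))
```

Because the MM probe differs at the middle base, its affinity differs from that of its PM partner, which is why the MM intensity is adjusted before it is subtracted.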
![Page 23: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/23.jpg)
Advantages of RMA/GC-RMA
Gives fewer false positives than MAS5.0.
See less variance at lower expression levels than MAS5.0.
Provides more consistent fold change estimates.
Exclusion of MM data in RMA reduces noise, but loses information.
Inclusion of adjusted MM data in GC-RMA reduces noise, and retains MM data.
![Page 24: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/24.jpg)
Disadvantages of RMA/GC-RMA
May hide real changes, especially at low expression levels (false negatives).
Makes quality control after normalisation difficult.
Normalisation assumes equal distribution which may hide biological changes.
![Page 25: Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e995503460f94b9c4fd/html5/thumbnails/25.jpg)
Conclusions
RMA is more precise than MAS5.0, but may result in false negatives at low expression levels.
Useful for fold change analysis, but not for studying statistical significance.
Makes quality control difficult.
Ideal solution – Use standard MAS5.0 techniques for quality control. Then go back and perform probe level normalisation on quality controlled genes.