
Two Color Microarrays

EPP 245: Statistical Analysis of Laboratory Data

November 15, 2007


Two-Color Arrays

• Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.

• If a spot is too large, for example, both signals will be too big by roughly the same factor, and the ratio (equivalently, the difference of the log intensities) will eliminate that source of variability
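A small simulation shows why the ratio helps. It assumes a purely multiplicative spot effect shared by both channels (the hypothetical vector `spot`); real spot effects are only approximately like this, so in practice the cancellation is partial.

```r
# Hypothetical true expression levels of 5 genes in the two samples
red_true   <- c(200, 1500, 80, 640, 3000)
green_true <- c(100, 1500, 160, 320, 750)

# Assumed multiplicative spot effect, the same for both channels
# (e.g. a spot that is too large inflates both signals)
set.seed(1)
spot <- exp(rnorm(5, sd = 0.5))
red_obs   <- red_true * spot
green_obs <- green_true * spot

# Each channel on its own carries the spot effect ...
log2(red_obs) - log2(red_true)

# ... but the log ratio does not: the spot factor cancels
m_obs  <- log2(red_obs / green_obs)
m_true <- log2(red_true / green_true)
all.equal(m_obs, m_true)
# [1] TRUE
```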



Dyes

• The most common dye pair is Cy3 (green) and Cy5 (red), which absorb maximally at approximately 550 nm and 649 nm and emit at approximately 570 nm and 670 nm, respectively (green light ~ 550 nm, red light ~ 700 nm)

• The dyes are excited with lasers at 532 nm (Cy3 green) and 635 nm (Cy5 red)

• The emissions are read through filters by a light detector (a photomultiplier tube in scanners such as the Axon GenePix; some imagers use a CCD camera)



File Format

• A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results: http://www.axon.com/gn_GenePix_File_Formats.html

• This contains 29 rows of headers followed by 43 columns of data (in our example files)

• For full analysis one may also need a .gal file that describes the layout of the arrays

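The number of header records may differ between files and software versions, so hard-coding "skip 29 lines" is fragile. A minimal sketch of a reader, assuming only that the column-header row is the first line beginning with "Block"; `read_gpr` is a hypothetical helper, and for real work Bioconductor's limma package offers `read.maimages(..., source = "genepix")`.

```r
# Hypothetical helper: read the spot table of a GenePix .gpr file.
# Assumption: the column-header row is the first line starting with
# "Block" (quoted or not); all lines above it are header records.
read_gpr <- function(path) {
  lines <- readLines(path)
  start <- grep('^"?Block"?\t', lines)[1]
  if (is.na(start)) stop("no \"Block\" header row found in ", path)
  read.delim(path, skip = start - 1, check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Usage on a tiny synthetic file (not real GenePix output)
tmp <- tempfile(fileext = ".gpr")
writeLines(c('ATF\t1.0',
             '2\t5',
             '"Comment=synthetic example"',
             '"Comment=only five columns"',
             '"Block"\t"Column"\t"Row"\t"F635 Median"\t"F532 Median"',
             '1\t1\t1\t1200\t800',
             '1\t2\t1\t300\t600'), tmp)
gpr <- read_gpr(tmp)
names(gpr)
```

`check.names = FALSE` keeps the original column names such as "F635 Median" instead of converting them to "F635.Median".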


"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia."

"F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat."

"F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat."

"Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)"

"F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)"

"F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
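The analysis choices listed below map directly onto these columns. A sketch using a small made-up data frame with the .gpr column names (the values are invented) computes the log ratio M = log2(R/G) and the average log intensity A = (log2 R + log2 G)/2, with and without subtracting the median background ourselves.

```r
# Hypothetical spot values under the .gpr column names (invented numbers)
gpr <- data.frame(
  "F635 Median" = c(1200, 300, 90),
  "B635 Median" = c(100, 100, 110),
  "F532 Median" = c(800, 600, 95),
  "B532 Median" = c(80, 90, 85),
  check.names = FALSE
)

# Median foreground, no background correction (red = 635 nm, green = 532 nm)
R <- gpr[["F635 Median"]]
G <- gpr[["F532 Median"]]
M <- log2(R / G)                  # log ratio
A <- (log2(R) + log2(G)) / 2      # average log intensity

# Median foreground minus median background
Rb <- gpr[["F635 Median"]] - gpr[["B635 Median"]]
Gb <- gpr[["F532 Median"]] - gpr[["B532 Median"]]
Mb <- suppressWarnings(log2(Rb / Gb))
Mb                                # spot 3 has Rb = -20, so its log ratio is NaN
```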

Analysis Choices

• Mean or median foreground intensity

• Background corrected or not

• Log transform (base 2, e, or 10) or glog transform

• Log is compatible only with no background correction, since background-corrected intensities can be zero or negative

• Glog is best with background correction
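A sketch of the glog transform and why it pairs with background correction. Several scalings of glog appear in the literature; the one below, with the factor 1/2 so that glog(y) approaches log(y) for large y, is an assumption, as is the value of the tuning constant `lambda`.

```r
# Generalized log: glog(y) = log((y + sqrt(y^2 + lambda)) / 2).
# For y much larger than sqrt(lambda) it is essentially log(y), but it stays
# finite for zero and negative y. lambda is a tuning constant; the value
# used below is arbitrary, chosen only for illustration.
glog <- function(y, lambda) log((y + sqrt(y^2 + lambda)) / 2)

# Hypothetical background-corrected intensities, some of them non-positive
yb <- c(-20, 0, 10, 200, 5000)
suppressWarnings(log(yb))     # NaN and -Inf for the first two values
glog(yb, lambda = 1e4)        # finite for every value

# At high intensity the two transforms agree closely
glog(1e6, lambda = 1e4) - log(1e6)
```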



Array normalization

• Array normalization is meant to increase the precision of comparisons by adjusting for variations that cover entire arrays

• Without normalization, the analysis would be valid, but possibly less sensitive

• However, a poor normalization method can be worse than none at all.



Possible normalization methods

• We can equalize the mean or median intensity by adding or multiplying a correction term

• We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or by quantile normalization

• We can normalize for other things such as print tips
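Quantile normalization, one of the intensity-based options above, can be sketched in a few lines: every array is given the same distribution, the average of the sorted columns. This is a simplified version (`quantile_normalize` is a hypothetical name, and ties are broken by order of appearance rather than averaged); Bioconductor's preprocessCore package provides `normalize.quantiles()` for real use.

```r
# Sketch of quantile normalization: replace each value by the average,
# across arrays, of the values with the same rank.
quantile_normalize <- function(mat) {
  mat <- as.matrix(mat)
  sorted <- apply(mat, 2, sort)                  # each column sorted ascending
  target <- rowMeans(sorted)                     # reference distribution
  ranks  <- apply(mat, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])  # map ranks back to target
  dimnames(out) <- dimnames(mat)
  out
}

# The 3-gene example matrix used in the following slides
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
quantile_normalize(normex)   # every column becomes c(743.75, 100, 70)
```

On this tiny example every gene comes out identical on all four arrays, an extreme case of a normalization removing the very differences being tested.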



Example for Normalization

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1       1100      900      425      550
Gene 2        110       95       85      110
Gene 3         80       65       55       80



> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))

> anova(lm(normex[1,] ~ group))
Analysis of Variance Table

Response: normex[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 262656  262656  18.888 0.04908 *
Residuals  2  27812   13906
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1




Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5

Response: normex[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5
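Running one ANOVA per gene by hand does not scale past a few genes. A small helper (hypothetical, not part of the original notes) applies the same `lm()`/`anova()` call to every row and collects the group-effect p-values.

```r
# Example data from the slides: 3 genes (rows) x 4 arrays (columns)
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
group  <- factor(c(1, 1, 2, 2))

# Hypothetical helper: p-value of the group effect for every gene (row),
# from the same one-way ANOVA used above
gene_pvalues <- function(mat, group) {
  apply(mat, 1, function(y) anova(lm(y ~ group))[["Pr(>F)"]][1])
}

round(gene_pvalues(normex, group), 4)
# [1] 0.0491 0.7643 0.7643
```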



Additive Normalization by Means

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1        975      851      541      608
Gene 2        -15       46      201      168
Gene 3        -45       16      171      138



> cmn <- apply(normex,2,mean)
> cmn
[1] 430.0000 353.3333 188.3333 246.6667

> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
         [,1]   [,2]   [,3]     [,4]
cmn 974.58333 851.25 541.25 607.9167
cmn -15.41667  46.25 201.25 167.9167
cmn -45.41667  16.25 171.25 137.9167
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn
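The `rbind(cmn,cmn,cmn)` construction works only because the example has exactly three genes. Base R's `sweep()` applies a column-wise adjustment to a matrix of any size; a sketch of the same additive normalization:

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
cmn <- colMeans(normex)              # mean of each array
mn  <- mean(cmn)                     # grand mean

# Subtract each array's mean from its column, then add back the grand mean
normex.1 <- sweep(normex, 2, cmn) + mn
colMeans(normex.1)                   # every array now has mean mn

isTRUE(all.equal(normex.1, normex - rbind(cmn, cmn, cmn) + mn,
                 check.attributes = FALSE))
# [1] TRUE
```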



> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table

Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 114469  114469  23.295 0.04035 *
Residuals  2   9828    4914
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table

Response: normex.1[2, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table

Response: normex.1[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035 *
Residuals  2  2456.9  1228.5



Multiplicative Normalization by Means

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1        779      776      687      679
Gene 2         78       82      137      136
Gene 3         57       56       89       99



> normex*mn/rbind(cmn,cmn,cmn)
         [,1]      [,2]      [,3]      [,4]
cmn 779.16667 775.82547 687.33407 679.13851
cmn  77.91667  81.89269 137.46681 135.82770
cmn  56.66667  56.03184  88.94912  98.78378
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))

Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 8884.9  8884.9  453.71 0.002197 **
Residuals  2   39.2    19.6
> anova(lm(normex.2[2,] ~ group))

Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 3219.7  3219.7  696.33 0.001433 **
Residuals  2    9.2     4.6
> anova(lm(normex.2[3,] ~ group))

Response: normex.2[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 1407.54 1407.54  57.969 0.01682 *
Residuals  2   48.56   24.28
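The same multiplicative normalization can be written with `sweep()` for any number of genes, and the three ANOVAs collapsed into one `apply()` call. A sketch (variable names follow the slides; the `apply()` wrapper is our addition):

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
group  <- factor(c(1, 1, 2, 2))
cmn <- colMeans(normex)              # mean of each array
mn  <- mean(cmn)                     # grand mean

# Divide each array by its mean, then rescale to the grand mean
normex.2 <- sweep(normex, 2, cmn, "/") * mn

# Group-effect p-value for every gene after normalization
p2 <- apply(normex.2, 1, function(y) anova(lm(y ~ group))[["Pr(>F)"]][1])
round(p2, 4)
# [1] 0.0022 0.0014 0.0168
```

All three genes now look significant, although genes 2 and 3 showed no group difference in the raw data: the normalization, driven by gene 1, has manufactured the effect.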



Multiplicative Normalization by Medians

          Group 1           Group 2
          Array 1  Array 2  Array 3  Array 4
Gene 1       1000      947      500      500
Gene 2        100      100      100      100
Gene 3         73       68       65       73



> cmd <- apply(normex,2,median)
> cmd
[1] 110  95  85 110
> md <- mean(cmd)
> normex.3 <- normex*md/rbind(cmd,cmd,cmd)
> normex.3
          [,1]      [,2]      [,3]      [,4]
cmd 1000.00000 947.36842 500.00000 500.00000
cmd  100.00000 100.00000 100.00000 100.00000
cmd   72.72727  68.42105  64.70588  72.72727
> anova(lm(normex.3[1,] ~ group))

Response: normex.3[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 224377  224377     324 0.003072 **
Residuals  2   1385     693
> anova(lm(normex.3[2,] ~ group))

Response: normex.3[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1      0       0
Residuals  2      0       0
> anova(lm(normex.3[3,] ~ group))

Response: normex.3[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1  3.451   3.451  0.1665 0.7228
Residuals  2 41.443  20.722
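Why does gene 2 get zero sums of squares? Dividing each array by its median and multiplying by `md` makes every array's median equal to `md`; gene 2 is the median gene on every array, so it becomes exactly `md` = 100 everywhere. A short check, using `sweep()` in place of `rbind()`:

```r
normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80), ncol = 4)
cmd <- apply(normex, 2, median)      # 110 95 85 110
md  <- mean(cmd)                     # 100

# Divide each array by its median, then rescale to md
normex.3 <- sweep(normex, 2, cmd, "/") * md
apply(normex.3, 2, median)           # every array median is now md
normex.3[2, ]                        # gene 2 is exactly md on all four arrays
```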



Intensity-based normalization

• Normalize by means, medians, etc., but do so only in groups of genes with similar expression levels.

• lowess (locally weighted scatterplot smoothing) produces a running estimate of the middle of the data, like a robustified moving average

• If we subtract each array's lowess curve and add back the average of the arrays' lowess curves, we get the lowess normalization
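The first bullet can be made concrete with a crude binned version: sort genes by their average intensity, cut them into a few bins, and median-normalize each bin separately. This is an illustrative stand-in (the helper `bin_normalize` and the simulated data are hypothetical), not the lowess method; lowess replaces the hard bins with a smooth running estimate.

```r
# Hypothetical helper: median normalization within intensity bins.
bin_normalize <- function(mat, n_bins = 4) {
  mat <- as.matrix(mat)
  rmeans <- rowMeans(mat)                        # average intensity per gene
  # Assign each gene to one of n_bins bins of (nearly) equal size
  bins <- cut(rank(rmeans, ties.method = "first"), n_bins, labels = FALSE)
  out <- mat
  for (b in unique(bins)) {
    rows <- which(bins == b)
    meds <- apply(mat[rows, , drop = FALSE], 2, median)  # array medians in bin
    # Center each array on its bin median, then add back the average median
    out[rows, ] <- sweep(mat[rows, , drop = FALSE], 2, meds) + mean(meds)
  }
  out
}

# Simulated data: 200 genes, 4 arrays with different intensity-dependent biases
set.seed(2)
base <- rexp(200, rate = 1 / 500)
x  <- cbind(base, base * 1.2, base + 50, base) + rnorm(800, sd = 20)
xn <- bin_normalize(x, n_bins = 4)
```

After normalization, all four arrays share the same median within each intensity bin, while differences between bins are left alone.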



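# Additive normalization by array (column) means: shift every array so that
# all arrays end up with the same mean; gene (row) means are unchanged.
# Note: while defined, this masks base R's norm() function.
norm <- function(mat1)
{
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]                               # number of genes (rows)
  n <- dim(mat2)[2]                               # number of arrays (columns)
  cmean <- apply(mat2,2,mean)                     # mean of each array
  cmean <- cmean - mean(cmean)                    # deviation from grand mean
  mnmat <- matrix(rep(cmean,p),byrow=TRUE,ncol=n) # one copy per gene
  return(mat2-mnmat)
}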



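# Lowess (intensity-based) normalization: genes are sorted by their average
# intensity across arrays, each array is smoothed against that rank, its
# lowess curve is subtracted, and the average curve is added back.
# span is the lowess smoother span f (fraction of genes in each local fit).
lnorm <- function(mat1,span=.1)
{
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]
  n <- dim(mat2)[2]
  rmeans <- apply(mat2,1,mean)                # average intensity of each gene
  rranks <- rank(rmeans,ties.method="first")  # rank of each gene
  matsort <- mat2[order(rranks),]             # genes sorted by intensity
  r0 <- 1:p
  lcol <- function(x) lowess(r0,x,f=span)$y   # lowess curve for one array
  lmeans <- apply(matsort,2,lcol)             # one curve per array
  lgrand <- apply(lmeans,1,mean)              # average curve
  lgrand <- matrix(rep(lgrand,n),byrow=FALSE,ncol=n)
  matnorm0 <- matsort-lmeans+lgrand           # normalized, in sorted order
  matnorm1 <- matnorm0[rranks,]               # back to original gene order
  return(matnorm1)
}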

