bringing a statistical package to the biologist's fingertips
![Page 1: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/1.jpg)
Bringing A Statistical Package To The
Biologist’s Fingertips
With Applications to Microarray Analysis
![Page 2: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/2.jpg)
Microarray Experiments
Some examples of the many types of microarray experiments currently being considered.
• Comparison to normal cells.
• Comparison of many cell types using an appropriate pool of RNA as a reference.
• Time series using either time 0 or past time as a reference.
• Knockout experiments.
• Factor experiments.
![Page 3: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/3.jpg)
Statistical issues to be addressed
Image analysis
• Spot identification
• Background correction
Data analysis
• Normalisation
• Transformation
• Significant genes
• Large amounts of data
• ...
Need a flexible approach.
![Page 4: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/4.jpg)
A tool for analysis: R
R is free software that is rapidly becoming very widely used.
It can handle the large data files used to analyse microarrays.
It is available for Unix, Linux and Windows.
It has excellent documentation and help available.
![Page 5: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/5.jpg)
Image Analysis and R
In collaboration with the CSIRO (Sydney), Jean Yee Hwa Yang and Terry Speed have developed a microarray image analysis package that is currently being written for implementation using Z-image and R.
This automated image analysis program overcomes some of the problems and limitations of commercial packages.
Output will automatically be set up for further analysis in R.
![Page 6: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/6.jpg)
Using R at WEHI
R is currently available only on unix02.
Access from a Macintosh is limited to a command-line window. The graphics window can be seen only if an X-Windows program is installed on the Mac.
However, if there is demand for R at WEHI, the Computer Centre will investigate options to change this situation.
Alternatively, install R for Windows on a PC, or install R for Linux.
![Page 7: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/7.jpg)
Using R at WEHI (2)
    NAT>R

    R : Copyright 2000, The R Development Core Team
    Version 1.0.0 (February 29, 2000)

    Type "demo()" for some demos, "help()" for on-line help, or
    "help.start()" for a HTML browser interface to help.
    Type "q()" to quit R.

    > q()
    Save workspace image? [y/n/c]: y

    NAT>R --vsize=50M --nsize=2000k
![Page 8: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/8.jpg)
How to make a vector
    > x<-c(1,3,5,4,7,8)
    > x
    [1] 1 3 5 4 7 8
    > t(x)
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    3    5    4    7    8
    > length(x)
    [1] 6
    > index<-c(2,3,4)
    > x[index]
    [1] 3 5 4
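The indexing on this slide extends to a few other common forms. The following is a minimal sketch using the same vector; the result names (`picked`, `dropped`, `large`, `positions`) are made up for illustration.

```r
# The same vector as on the slide
x <- c(1, 3, 5, 4, 7, 8)

picked    <- x[c(2, 3, 4)]  # positive indices select positions 2, 3 and 4
dropped   <- x[-1]          # a negative index drops that position
large     <- x[x > 4]       # a logical condition keeps elements where it is TRUE
positions <- which(x > 4)   # the positions at which the condition holds

print(picked)
print(dropped)
print(large)
print(positions)
```

Logical indexing is the form most often needed later, for example to pick out spots whose intensity exceeds a threshold.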
![Page 9: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/9.jpg)
How to make a matrix
    > xmat<-matrix(x,nrow=2,ncol=3,byrow=T)
    > xmat
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    4    7    8
    > xmat[1,2]
    [1] 3
    > xmat[,3]
    [1] 5 8
    > xmat<-matrix(x,nrow=2,ncol=3,byrow=F)
    > xmat
         [,1] [,2] [,3]
    [1,]    1    5    7
    [2,]    3    4    8
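The only difference between the two matrices on this slide is the order in which the values of `x` are poured into the cells. A small sketch, with the illustrative names `by_row` and `by_col`, makes the contrast explicit:

```r
x <- c(1, 3, 5, 4, 7, 8)

# byrow = TRUE fills the first row, then the second
by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)
# byrow = FALSE (the default) fills the first column, then the next
by_col <- matrix(x, nrow = 2, ncol = 3, byrow = FALSE)

print(by_row[1, 2])  # row 1, column 2 of the row-filled matrix
print(by_col[1, 2])  # the same cell of the column-filled matrix
print(by_row[, 3])   # an empty row index returns the whole column

# Filling a 2 x 3 matrix by column gives the transpose of
# filling a 3 x 2 matrix by row.
same <- identical(by_col, t(matrix(x, nrow = 3, ncol = 2, byrow = TRUE)))
print(same)
```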
![Page 10: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/10.jpg)
Adding and removing a column
    > addcol<-c(9,2)
    > newxmat<-cbind(xmat,addcol)
    > newxmat
               addcol
    [1,] 1 5 7      9
    [2,] 3 4 8      2
    > oldxmat<-newxmat[,-4]
    > oldxmat

    [1,] 1 5 7
    [2,] 3 4 8
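As a hedged sketch of the same idea, a column can also be removed by name rather than by position, which is safer when the column order may change. The names `keep`, `by_name` and `onecol` are illustrative only.

```r
xmat   <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3)
addcol <- c(9, 2)

newxmat <- cbind(xmat, addcol)   # the new column takes the name "addcol"
print(dim(newxmat))
print(colnames(newxmat))

oldxmat <- newxmat[, -4]         # remove by position, as on the slide

keep    <- colnames(newxmat) != "addcol"
by_name <- newxmat[, keep]       # remove by name instead

# Selecting a single column normally returns a plain vector;
# drop = FALSE keeps it as a one-column matrix.
onecol <- newxmat[, 4, drop = FALSE]
print(dim(onecol))
```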
![Page 11: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/11.jpg)
A script to find mean of columns
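A script that prints the mean of each column:
    for( i in 1:3){
      print(mean(xmat[,i]))
    }
Pasted at the prompt:
    > for( i in 1:3){
    + print(mean(xmat[,i]))
    + }
    [1] 2
    [1] 4.5
    [1] 7.5
A script that collects the means in a vector:
    m<-0
    for( i in 1:3){
      m<-c(m,mean(xmat[,i]))
    }
    m<-m[-1]
Pasted at the prompt:
    > dim(xmat)
    [1] 2 3
    > m<-0
    > for( i in 1:3){
    + m<-c(m,mean(xmat[,i]))
    + }
    > m<-m[-1]   # m is a vector, so it takes a single index; this drops the initial 0
    > m
    [1] 2.0 4.5 7.5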
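The loop is a good way to learn the syntax, but the same column means can be computed without one. The sketch below compares three variants; `apply` and `colMeans` are standard functions in current versions of R, and the names `m_loop`, `m_apply` and `m_fast` are illustrative.

```r
xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3)

# Loop variant: allocate the result first instead of growing it with c()
m_loop <- numeric(ncol(xmat))
for (i in seq_len(ncol(xmat))) {
  m_loop[i] <- mean(xmat[, i])
}

m_apply <- apply(xmat, 2, mean)  # apply mean over dimension 2 (columns)
m_fast  <- colMeans(xmat)        # dedicated function for column means

print(m_loop)
print(m_apply)
print(m_fast)
```

Pre-allocating the result avoids the placeholder value that has to be dropped afterwards.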
![Page 12: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/12.jpg)
Reading in Data
    num  GR  GC  SR  SC  NAME    X        Y       CH1I         CH1B        CH1ISD      CH1BSD     CH2I         CH2B        CH2ISD      CH2BSD
    1    1   1   1   1   CL0001  1220.00  890.00  1223.317505  168.473679  435.352264  37.599304  1014.603149  139.578949  446.614960  21.937578
    2    1   1   1   2   CL0001  1400.00  890.00  1257.714233  233.368423  337.946320  90.568703  975.333313   142.684204  354.194031  22.934818
    3    1   1   1   3   CL0008  1580.00  890.00  333.555542   144.000000  145.992569  15.944347  277.730164   126.842102  156.314529  9.719757
![Page 13: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/13.jpg)
Reading in data from a text file
    > # Check that the file has the same number of fields
    > # on each line for all lines.
    > count.fields(file="tp04sk1.txt",sep="\t",skip=0)
      . . . . . . . . . . . . . . . . . 16 16 16 16
    [9145] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9169] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9193] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
    [9217] 16
    > tp4sk1<- read.table("tp04sk1.txt", header=T, sep="\t", skip=0, row.names=1)
    > attach(tp4sk1)
    > median(CH1I)
    [1] 375.627
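The check-then-read pattern can be tried without the real data file. The sketch below writes a small, made-up tab-separated file (four columns instead of sixteen, with rounded values) to a temporary location; the names `path`, `n_fields` and `spots` are illustrative.

```r
# A hypothetical three-spot file; the values are invented for illustration
path <- tempfile(fileext = ".txt")
writeLines(c(
  "num\tNAME\tCH1I\tCH1B",
  "1\tCL0001\t1223.3\t168.5",
  "2\tCL0001\t1257.7\t233.4",
  "3\tCL0008\t333.6\t144.0"
), path)

# Every line, including the header, should have the same number of fields
n_fields <- count.fields(path, sep = "\t")
stopifnot(length(unique(n_fields)) == 1)

# The first column (num) becomes the row names
spots <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
print(median(spots$CH1I))
```

Here the column is reached as `spots$CH1I` instead of through `attach()`. Both work; the `$` form keeps it clear which data frame a column comes from when several arrays are loaded at once.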
![Page 14: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/14.jpg)
Getting spot info from the dataframe
    > cy3 <- CH2I          # Green
    > cy5 <- CH1I          # Red
    > cy3bc <- CH2I-CH2B   # Background corrected.
    > cy5bc <- CH1I-CH1B
    > # Get duplicates.
    > d1 <- seq(1,(dim(tp4sk1)[1]-1),2)
    > d2 <- seq(2,(dim(tp4sk1)[1]),2)
    > cy3d1 <- cy3bc[d1]
    > cy3d2 <- cy3bc[d2]
    > cy5d1 <- cy5bc[d1]
    > cy5d2 <- cy5bc[d2]
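The duplicate indexing relies on each gene being spotted twice in adjacent rows, as the repeated name CL0001 in the data file suggests. A minimal sketch with six invented spots shows what `d1` and `d2` select; the intensity values are made up.

```r
# Hypothetical green-channel values for six spots (three genes, each spotted twice)
CH2I <- c(1014.6, 975.3, 277.7, 300.1, 520.4, 498.9)  # foreground
CH2B <- c(139.6, 142.7, 126.8, 130.2, 135.0, 133.3)   # background

cy3bc <- CH2I - CH2B          # background-corrected intensity

n  <- length(cy3bc)
d1 <- seq(1, n - 1, by = 2)   # odd rows: first copy of each gene
d2 <- seq(2, n, by = 2)       # even rows: second copy of each gene

cy3d1 <- cy3bc[d1]
cy3d2 <- cy3bc[d2]

# The two copies of a gene should give similar values
print(cbind(first = cy3d1, second = cy3d2))
```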
![Page 15: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/15.jpg)
Always log the intensities
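    > # Cy3 and Cy5 (capital C) hold the log base 2 intensities;
    > # the lines that create them are not shown on the slides.
    > par(mfrow=c(2,3))
    > hist(cy3,col="green")
    > plot(density(cy3),col="green")
    > plot(density(Cy3),col="green")   # Use Log base 2
    > hist(cy5,col="red")
    > plot(density(cy5),col="red")
    > plot(density(Cy5),col="red")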
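Why logging helps can be seen numerically as well as in the density plots. The sketch below uses simulated intensities (a log-normal sample, which is an assumption and not the real array data) and a hand-written `skewness` helper to compare the asymmetry before and after taking logs.

```r
set.seed(1)

# Simulated raw intensities: an assumption standing in for real array data
cy3 <- rlnorm(5000, meanlog = 6, sdlog = 1)
Cy3 <- log2(cy3)   # log base 2, following the slide's naming

# Sample skewness: roughly 0 for a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skewness(cy3)
skew_log <- skewness(Cy3)

print(skew_raw)   # strongly right-skewed
print(skew_log)   # close to symmetric
```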
![Page 16: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/16.jpg)
Normalisation
    > par(mfrow=c(2,1))
    > plot(density(Cy3),type="n")
    > lines(density(Cy3),col="green")
    > lines(density(Cy5),col="red")
    > plot(Cy3,Cy5,xlab="Log(cy3) Background Corrected",
    +   ylab="Log(cy5) Background Corrected",
    +   main="The Need For Normalisation Between Green and Red Intensities")
    > lines(lowess(Cy3,Cy5),col="yellow")
![Page 17: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/17.jpg)
Normalisation (2)
Green intensity is a multiple of the red intensity:
    cy3 = k * cy5
So when you take logs (base 2), with K = log2(k):
    log2(cy3) = K + log2(cy5)
Therefore, estimate K by the median difference of the log intensities.
    > K <- median( log2(cy3) - log2(cy5) )   # the same as median( Cy3 - Cy5 )
    > k <- 2^K
    > cy5n <- k*cy5        # normalised red intensity
    > Cy5n <- log2(cy5n)   # and its log
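The effect of this normalisation can be checked on simulated data where the true multiplier is known. In the sketch below the intensities, the multiplier `true_k` and the noise level are all assumptions chosen for illustration.

```r
set.seed(2)

true_k <- 1.5                                   # assumed green/red multiplier
cy5 <- rlnorm(1000, meanlog = 6, sdlog = 1)     # simulated red intensities
cy3 <- true_k * cy5 * 2^rnorm(1000, sd = 0.2)   # green = k * red, with noise

Cy3 <- log2(cy3)
Cy5 <- log2(cy5)

K <- median(Cy3 - Cy5)   # estimate of log2(k)
k <- 2^K                 # back on the raw scale

cy5n <- k * cy5          # normalised red intensity
Cy5n <- log2(cy5n)       # equals Cy5 + K

print(k)                     # close to the assumed 1.5
print(median(Cy3 - Cy5n))    # zero, up to rounding error
```

On the log scale the normalisation is a shift of every red value by the same constant K, so the median log ratio after normalisation is zero by construction.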
![Page 18: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/18.jpg)
Approximate normality of log ratios
    > par(mfrow=c(2,1))
    > plot(density(Cy5n-Cy3),col="purple")
    > qqnorm(Cy5n-Cy3,col=c("red","red","yellow","yellow","green","green",
    +   "blue","blue","pink","pink","orange","orange",
    +   "purple","purple","black","black"))
![Page 19: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/19.jpg)
A question of significance
    > par(mfrow=c(1,1))
    > plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
    +   xlab="Average of logRed and logGreen",
    +   ylab="Difference of logRed and logGreen",
    +   main="Variation In Intensities Is Not Constant",
    +   col=c("red","red","yellow","yellow","green","green",
    +   "blue","blue","pink","pink","orange","orange",
    +   "purple","purple","black","black"))
    > lines(lowess(0.5*(Cy3+Cy5n),Cy5n-Cy3),col="yellow")
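The quantities plotted on this slide are the average and the difference of the two log intensities. The sketch below computes them on simulated data in which the noise is deliberately larger at low intensity; the data, the noise levels and the names `A`, `M` and `fit` are assumptions for illustration.

```r
set.seed(3)
n <- 2000

# Simulated log intensities with more noise for weak spots (an assumption)
base     <- runif(n, min = 6, max = 12)
noise_sd <- ifelse(base < 9, 0.8, 0.2)
Cy3  <- base + rnorm(n, sd = noise_sd)
Cy5n <- base + rnorm(n, sd = noise_sd)

A <- 0.5 * (Cy3 + Cy5n)   # average of logRed and logGreen
M <- Cy5n - Cy3           # difference of logRed and logGreen

fit <- lowess(A, M)       # smooth trend of M against A

# Spread of the log ratio for weak and for strong spots
sd_low  <- sd(M[A <  median(A)])
sd_high <- sd(M[A >= median(A)])
print(c(low = sd_low, high = sd_high))
```

When the spread differs this much across intensities, a single cut-off on the log ratio treats weak and strong spots unequally, which is the point of the slide title.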
![Page 20: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/20.jpg)
Identifying a spot on a plot
    > par(mfrow=c(1,1))
    > # type="n" sets up the axes without drawing any points.
    > plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
    +   xlab="Average of logRed and logGreen",
    +   ylab="Difference of logRed and logGreen",
    +   main="Variation In Intensities Is Not Constant",
    +   type="n",ylim=c(-4,4),xlim=c(6,12))
    > # Draw each spot as its row number; text() takes the labels in its labels argument.
    > text(0.5*(Cy3+Cy5n),Cy5n-Cy3,labels=as.character(1:9216),
    +   col=c("red","red","yellow","yellow","green","green",
    +   "blue","blue","pink","pink","orange","orange",
    +   "purple","purple","black","black"),cex=1)
    > lines(lowess(0.5*(Cy3+Cy5n),Cy5n-Cy3),col="yellow")
![Page 21: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/21.jpg)
Saving graphics to a file (postscript)
    > postscript("filename.ps")
    > par(mfrow=c(1,1))
    > plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
    +   xlab="Average of logRed and logGreen",
    +   ylab="Difference of logRed and logGreen",
    +   main="Variation In Intensities Is Not Constant",
    +   type="n",ylim=c(-0.1,1),xlim=c(10,11))
    > text(0.5*(Cy3+Cy5n),Cy5n-Cy3,labels=as.character(1:9216),
    +   col=c("red","red","yellow","yellow","green","green",
    +   "blue","blue","pink","pink","orange","orange",
    +   "purple","purple","black","black"),cex=1)
    > dev.off()   # close the device so that the file is written
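The open, draw, close sequence can be tried on a small scale. The sketch below draws fifty simulated spots, labelled by index, into a temporary PostScript file; the data and the name `outfile` are assumptions for illustration.

```r
set.seed(4)

# Fifty simulated spots standing in for the array data
A <- runif(50, min = 6, max = 12)
M <- rnorm(50, sd = 0.5)

outfile <- tempfile(fileext = ".ps")

postscript(outfile)                   # open the file device
plot(A, M, type = "n",                # axes only, no points
     xlab = "Average of logRed and logGreen",
     ylab = "Difference of logRed and logGreen")
text(A, M, labels = seq_along(A), cex = 0.7)   # each spot drawn as its index
invisible(dev.off())                  # close the device to finish the file

print(file.exists(outfile))
```

Nothing appears on screen while a file device is open; all plotting commands between `postscript()` and `dev.off()` go to the file.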
![Page 22: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/22.jpg)
Using R help
    > ?plot
    Generic X-Y Plotting

    Description:
         Generic function for plotting of R objects. For more details
         about the graphical parameter arguments, see `par'.

    Usage:
         plot(x, ...)
         plot(x, y, xlim=range(x), ylim=range(y), type="p",
              main, xlab, ylab, ...)
         plot(y ~ x, ...)

    Arguments:
         x: the coordinates of points in the plot. Alternatively, a
            single plotting structure or any R object with a `plot'
            method can be provided.
    :
![Page 23: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/23.jpg)
Using R help (2)
> help.start()
![Page 24: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/24.jpg)
R Help (3)
![Page 25: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/25.jpg)
[Figure: layout of the array. The grids are numbered 1 to 16. Spots are numbered along rows of 24 within each grid, with spots 1 to 576 in grid 1 and spots 577 to 1152 in grid 2.]
![Page 26: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/26.jpg)
Level colour plot of background
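    > # Start with a placeholder column so that cbind has something to extend.
    > bkgmat<-matrix(1:24,nrow=24,ncol=1)
    > for(i in 1:16){
    +   s<-c((((i-1)*576)+1):(i*576))                 # the 576 spots of grid i
    +   m<-matrix(CH1B[s],nrow=24,ncol=24,byrow=T)    # laid out as 24 rows of 24
    +   bkgmat<-cbind(bkgmat,m)
    + }
    > bkgmat<-bkgmat[,-1]                 # drop the placeholder column
    > m1<-bkgmat[,1:96]                   # grids 1 to 4
    > m2<-bkgmat[,(97:192)]               # grids 5 to 8
    > m3<-bkgmat[,(193:(3*96))]           # grids 9 to 12
    > m4<-bkgmat[,(((3*96)+1):(4*96))]    # grids 13 to 16
    > bkg<-rbind(m1,m2,m3,m4)             # 96 x 96 matrix of background values
    > filled.contour(1:96,1:96,bkg,nlevels=100,color.palette=heat.colors)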
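The rearrangement of the 9216 background values into a 96 x 96 picture can be written more compactly and checked against known positions. The sketch below uses simulated background values and assumes, as the slide's code does, four grids per band; the helpers `grid_matrix` and `grid_band` are hypothetical names.

```r
set.seed(5)
side    <- 24    # spots per row and per column within a grid
n_grids <- 16

# Simulated background values standing in for the CH1B column
CH1B <- runif(n_grids * side^2, min = 100, max = 250)

# The 24 x 24 matrix of grid i, filled row by row
grid_matrix <- function(i) {
  s <- ((i - 1) * side^2 + 1):(i * side^2)
  matrix(CH1B[s], nrow = side, ncol = side, byrow = TRUE)
}

# Band r holds four grids side by side: 1-4, 5-8, 9-12, 13-16
grid_band <- function(r) {
  grids <- (4 * (r - 1) + 1):(4 * r)
  do.call(cbind, lapply(grids, grid_matrix))
}

# Stack the four bands on top of each other
bkg <- do.call(rbind, lapply(1:4, grid_band))
print(dim(bkg))
```

The resulting `bkg` can be passed to `filled.contour(1:96, 1:96, bkg, ...)` in the same way as on the slide. Building each band with `cbind` over a list avoids the placeholder column.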
![Page 27: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/27.jpg)
Conclusion
R is flexible and powerful
• Easy to read in data.
• Enables manipulation of data.
• Extensive control of and range of graphics.
• Wide range of statistical functions.
• Add on packages available.
• Can write scripts as a text file to send to collaborators for importing into R. (Use source("filename") to import and execute code.)
• Can save all the work you do in a session.
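As a small illustration of the script-sharing point, the sketch below writes a two-line script to a temporary file and runs it with `source()`. The file contents and the names `script` and `x_mean` are made up for the example.

```r
# Write a tiny script to a temporary text file
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 3, 5, 4, 7, 8)",
  "x_mean <- mean(x)"
), script)

# source() reads the file and executes each line as if it had been typed
source(script)

print(x_mean)
```

A collaborator who receives the text file only needs the one `source()` call to repeat the whole analysis.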
![Page 28: Bringing A Statistical Package To The Biologist's Fingertips](https://reader035.vdocuments.us/reader035/viewer/2022081518/5477f9e9b4af9ffc4f8b465b/html5/thumbnails/28.jpg)
Acknowledgements
Terry Speed
Melanie Bahlo
Asa Wirapati
George Rudy
Jean Yee Hwa Yang
Chuang Fong Kong
Keith Slattery