bringing a statistical package to the biologist’s fingertips with applications to microarray...

Post on 18-Dec-2015


Bringing a Statistical Package to the Biologist’s Fingertips

With Applications to Microarray Analysis

Microarray Experiments

Some examples of the many types of microarray experiments currently being considered:

• Comparison to normal cells.
• Comparison of many cell types, using an appropriate pool of RNA as a reference.
• Time series, using either time 0 or an earlier time point as a reference.
• Knockout experiments.
• Factor experiments.

Statistical issues to be addressed

Image analysis:
• Spot identification
• Background correction

Data analysis:
• Normalisation
• Transformation
• Significant genes
• Large amounts of data
• …

Need a flexible approach.

A tool for analysis: R

R is free software that is rapidly becoming very widely used.

It can handle the large data files produced when analysing microarrays.

It is available for Unix, Linux and Windows.

It has excellent documentation and help available.

Image Analysis and R

In collaboration with the CSIRO (Sydney), Jean Yee Hwa Yang and Terry Speed are developing a microarray image analysis package, which is currently being implemented using Z-image and R.

This automated image analysis program overcomes some of the problems and limitations of commercial packages.

Its output will automatically be set up for further analysis in R.

Using R at WEHI

R is currently only available on unix02.

Access from a Macintosh is limited to the command-line window. The graphics window can only be seen if X Window software (an X server) is installed on the Mac.

However, if there is demand for R at WEHI, the Computer Centre will investigate options to change this situation.

Alternatively, install R for Windows on a PC, or install R for Linux.

Using R at WEHI (2)

NAT>R

R : Copyright 2000, The R Development Core Team
Version 1.0.0 (February 29, 2000)

Type "demo()" for some demos, "help()" for on-line help, or
"help.start()" for a HTML browser interface to help.
Type "q()" to quit R.

> q()
Save workspace image? [y/n/c]: y

To start R with more memory for large data files (a 50 MB vector heap and 2000k cons cells):

NAT>R --vsize=50M --nsize=2000k

How to make a vector

> x <- c(1,3,5,4,7,8)
> x
[1] 1 3 5 4 7 8

> t(x)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    4    7    8

> length(x)
[1] 6

> index <- c(2,3,4)
> x[index]
[1] 3 5 4

How to make a matrix

> xmat <- matrix(x, nrow=2, ncol=3, byrow=TRUE)
> xmat
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    7    8

> xmat[1,2]
[1] 3
> xmat[,3]
[1] 5 8

> xmat <- matrix(x, nrow=2, ncol=3, byrow=FALSE)
> xmat
     [,1] [,2] [,3]
[1,]    1    5    7
[2,]    3    4    8
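A small sketch for checking how byrow changes the fill order, using the same vector x as above. Filling row by row gives the same matrix as filling a 3 × 2 matrix column by column and transposing it with t().

```r
x <- c(1, 3, 5, 4, 7, 8)

# Fill row by row: first row is 1 3 5, second row is 4 7 8
by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)

# Fill column by column (the default): first column is 1 3
by_col <- matrix(x, nrow = 2, ncol = 3, byrow = FALSE)

# byrow = TRUE matches filling a 3 x 2 matrix by column, then transposing
same <- all(by_row == t(matrix(x, nrow = 3, ncol = 2)))
print(same)
```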

Adding and removing a column

> addcol <- c(9,2)
> newxmat <- cbind(xmat, addcol)
> newxmat
           addcol
[1,] 1 5 7      9
[2,] 3 4 8      2

> oldxmat <- newxmat[, -4]
> oldxmat

[1,] 1 5 7
[2,] 3 4 8

A script to find the mean of each column

> for (i in 1:3) {
+   print(mean(xmat[, i]))
+ }
[1] 2
[1] 4.5
[1] 7.5

The same loop, storing the means in a vector m instead of printing them:

> dim(xmat)
[1] 2 3
> m <- 0
> for (i in 1:3) {
+   m <- c(m, mean(xmat[, i]))
+ }
> m <- m[-1]   # drop the initial 0 (m is a vector, so use m[-1], not m[,-1])
> m
[1] 2.0 4.5 7.5
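R also has built-in ways to get column means without writing the loop. This sketch uses colMeans() and apply() on the same xmat; both are standard base R functions, and the check simply confirms they agree.

```r
xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3, byrow = FALSE)

# Vectorised column means: no loop and no dummy first element to remove
m_colmeans <- colMeans(xmat)

# The same result with apply(): MARGIN = 2 applies mean() to each column
m_apply <- apply(xmat, 2, mean)

print(m_colmeans)
print(all.equal(m_colmeans, m_apply))
```

colMeans() is usually the faster choice for large matrices such as a 9216-spot array; apply() is more general because any function can be used in place of mean().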

Reading in Data

num GR GC SR SC NAME   X       Y      CH1I        CH1B       CH1ISD     CH1BSD    CH2I        CH2B       CH2ISD     CH2BSD
1   1  1  1  1  CL0001 1220.00 890.00 1223.317505 168.473679 435.352264 37.599304 1014.603149 139.578949 446.614960 21.937578
2   1  1  1  2  CL0001 1400.00 890.00 1257.714233 233.368423 337.946320 90.568703 975.333313  142.684204 354.194031 22.934818
3   1  1  1  3  CL0008 1580.00 890.00 333.555542  144.000000 145.992569 15.944347 277.730164  126.842102 156.314529 9.719757

Reading in data from a text file

> # check that the file has the same number of fields
> # on each line, for all lines
> count.fields(file="tp04sk1.txt", sep="\t", skip=0)
   . . .
[9145] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9169] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9193] 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9217] 16

> tp4sk1 <- read.table("tp04sk1.txt", header=TRUE, sep="\t", skip=0, row.names=1)
> attach(tp4sk1)
> median(CH1I)
[1] 375.627
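The same check and read can be tried end to end on a small file. This sketch writes three rows copied from the data excerpt above to a temporary file (keeping only six of the 16 columns, an assumption made to keep it short), stops if any line has a different field count, and reads the file with row.names = 1. It refers to columns with $ instead of attach(), which avoids objects in the data frame masking, or being masked by, variables of the same name.

```r
# Three rows from the data excerpt; only a subset of the columns is kept
txt <- c(
  "num\tNAME\tCH1I\tCH1B\tCH2I\tCH2B",
  "1\tCL0001\t1223.317505\t168.473679\t1014.603149\t139.578949",
  "2\tCL0001\t1257.714233\t233.368423\t975.333313\t142.684204",
  "3\tCL0008\t333.555542\t144.000000\t277.730164\t126.842102"
)
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Every line should have the same number of tab-separated fields
n_fields <- count.fields(path, sep = "\t")
if (any(n_fields != n_fields[1])) {
  stop("ragged file, check lines: ", paste(which(n_fields != n_fields[1]), collapse = ", "))
}

# row.names = 1 turns the 'num' column into row names
spots <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

# Use $ to reach a column instead of attach()
print(median(spots$CH1I))
```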

Getting spot info from the dataframe

> cy3 <- CH2I            # Green
> cy5 <- CH1I            # Red
> cy3bc <- CH2I - CH2B   # Background corrected
> cy5bc <- CH1I - CH1B

> # Get duplicates (the two copies of each spot are on consecutive rows)
> d1 <- seq(1, dim(tp4sk1)[1] - 1, 2)
> d2 <- seq(2, dim(tp4sk1)[1], 2)
> cy3d1 <- cy3bc[d1]
> cy3d2 <- cy3bc[d2]
> cy5d1 <- cy5bc[d1]
> cy5d2 <- cy5bc[d2]
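A sketch of what the duplicate indices select, on a handful of hypothetical background-corrected values laid out like the data excerpt (rows 1 and 2 are both CL0001). It also shows that logical-index recycling, cy3bc[c(TRUE, FALSE)], picks the same odd-numbered rows as d1. Averaging the two copies is shown as one possible next step; it is an illustration, not the slides' method.

```r
# Hypothetical background-corrected values: 3 genes, each spotted twice
cy3bc <- c(875, 833, 151, 160, 402, 390)

n <- length(cy3bc)
d1 <- seq(1, n - 1, 2)   # first copy of each pair: rows 1, 3, 5
d2 <- seq(2, n, 2)       # second copy: rows 2, 4, 6

# Logical recycling selects the same elements without building an index
stopifnot(identical(cy3bc[d1], cy3bc[c(TRUE, FALSE)]))

# Average the two copies of each spot
cy3_avg <- (cy3bc[d1] + cy3bc[d2]) / 2
print(cy3_avg)

# Agreement between the copies is a quick quality check
print(cor(cy3bc[d1], cy3bc[d2]))
```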

Always log the intensities

> Cy3 <- log2(cy3)   # use log base 2
> Cy5 <- log2(cy5)
> par(mfrow=c(2,3))
> hist(cy3, col="green")
> plot(density(cy3), col="green")
> plot(density(Cy3), col="green")
> hist(cy5, col="red")
> plot(density(cy5), col="red")
> plot(density(Cy5), col="red")
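A small numerical sketch of why logs help, on hypothetical intensities where red is exactly twice green: the raw differences grow with intensity, while the log2 differences are the same for every spot. It also shows one assumed way of handling background-corrected values that are zero or negative, since log2() of those gives -Inf or NaN; the helper name safe_log2 is hypothetical.

```r
# Hypothetical intensities in which red is exactly twice green for every spot
green <- c(150, 400, 1200, 5000)
red   <- 2 * green

# On the raw scale the red-green difference grows with intensity ...
print(red - green)
# ... but on the log2 scale it is the same for every spot, log2(2) = 1
print(log2(red) - log2(green))

# Background-corrected values (e.g. CH1I - CH1B) can be zero or negative.
# Treating them as missing is one simple choice (an assumption).
safe_log2 <- function(v) {
  out <- rep(NA_real_, length(v))
  ok <- !is.na(v) & v > 0
  out[ok] <- log2(v[ok])
  out
}
print(safe_log2(c(1024, 0, -12, 8)))
```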

Normalisation

> par(mfrow=c(2,1))
> plot(density(Cy3), type="n")
> lines(density(Cy3), col="green")
> lines(density(Cy5), col="red")
> plot(Cy3, Cy5, xlab="Log(cy3) Background Corrected", ylab="Log(cy5) Background Corrected",
+      main="The Need For Normalisation Between Green and Red Intensities")
> lines(lowess(Cy3, Cy5), col="yellow")

Normalisation (2)

The green intensity is assumed to be a constant multiple of the red intensity:

cy3 = k × cy5

So when you take logs (base 2):

log2(cy3) = K + log2(cy5), where K = log2(k)

Therefore, estimate K by the median difference of the log intensities, then rescale the red channel:

> K <- median(log2(cy3) - log2(cy5))   # the same as median(Cy3 - Cy5)
> k <- 2^K
> cy5n <- k * cy5
> Cy5n <- log2(cy5n)
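The median normalisation above can be checked on simulated data where k is known. In this sketch the data are an assumption (1000 spots, green 1.8 times red, with multiplicative noise) and median_normalise is a hypothetical helper wrapping the four lines above. The estimate of k should come out close to 1.8, and after rescaling the median log ratio is zero.

```r
set.seed(1)
# Simulated data: green is 1.8 times red, with noise on the log2 scale
true_k <- 1.8
cy5 <- 2^rnorm(1000, mean = 9, sd = 1.5)
cy3 <- true_k * cy5 * 2^rnorm(1000, mean = 0, sd = 0.2)

median_normalise <- function(cy3, cy5) {
  K <- median(log2(cy3) - log2(cy5))  # estimate of log2(k)
  k <- 2^K
  list(k = k, Cy5n = log2(k * cy5))   # rescaled red channel, on the log2 scale
}

res <- median_normalise(cy3, cy5)
print(res$k)                           # close to 1.8
print(median(res$Cy5n - log2(cy3)))    # zero up to rounding
```

The median is used rather than the mean so that a minority of genuinely differentially expressed genes does not pull the estimate of K.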

Approximate normality of log ratios

> par(mfrow=c(2,1))
> plot(density(Cy5n - Cy3), col="purple")
> qqnorm(Cy5n - Cy3, col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2))

A question of significance

> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen", ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant",
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2))
> lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
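The lowess line on this plot shows how the log ratio drifts with average intensity. One common follow-up, not shown in the slides and included here only as a sketch, is to subtract that lowess curve from the log ratios so they are centred at zero at every intensity. The helper names ma_values and lowess_centre are hypothetical, and the data are simulated with a straight-line bias.

```r
# A = average log intensity, M = log ratio, as plotted above
ma_values <- function(Cy3, Cy5n) {
  list(A = 0.5 * (Cy3 + Cy5n), M = Cy5n - Cy3)
}

# Subtract the lowess trend from M. lowess() returns its fit at the sorted
# x values, so approx() maps the fitted curve back onto each spot's own A.
lowess_centre <- function(A, M, f = 2/3) {
  fit <- lowess(A, M, f = f)
  trend <- approx(fit$x, fit$y, xout = A, ties = mean)$y
  M - trend
}

# Usage on simulated data with a linear intensity-dependent bias
set.seed(2)
Cy3 <- runif(500, 6, 12)
Cy5n <- Cy3 + 0.3 * (Cy3 - 9)
ma <- ma_values(Cy3, Cy5n)
Mc <- lowess_centre(ma$A, ma$M)
print(max(abs(Mc)))   # essentially zero: a straight-line trend is removed
```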

Identifying a spot on a plot

> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen", ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant", type="n", ylim=c(-4,4), xlim=c(6,12))
> # label each point with its spot number instead of drawing a symbol
> text(0.5*(Cy3 + Cy5n), Cy5n - Cy3, labels=as.character(1:9216),
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2), cex=1)
> lines(lowess(0.5*(Cy3 + Cy5n), Cy5n - Cy3), col="yellow")
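When thousands of labels overlap, it can be easier to ask R directly which spots fall inside the zoomed window. This sketch uses a hypothetical helper, spots_in_window, on five hypothetical spots, with the same window as the postscript example (xlim 10 to 11, ylim -0.1 to 1). For pointing and clicking on an open plot, base R's identify(x, y, labels) returns the indices of the points you click.

```r
# Indices of spots whose (A, M) point lies inside a rectangular window
spots_in_window <- function(A, M, xlim, ylim) {
  which(A >= xlim[1] & A <= xlim[2] & M >= ylim[1] & M <= ylim[2])
}

# Hypothetical values for five spots
A <- c(9.5, 10.2, 10.8, 11.4, 10.5)
M <- c(0.2, 0.5, -0.3, 0.4, 0.9)
print(spots_in_window(A, M, xlim = c(10, 11), ylim = c(-0.1, 1)))
```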

Saving graphics to a file (postscript)

> postscript("filename.ps")
> par(mfrow=c(1,1))
> plot(0.5*(Cy3 + Cy5n), Cy5n - Cy3,
+      xlab="Average of logRed and logGreen", ylab="Difference of logRed and logGreen",
+      main="Variation In Intensities Is Not Constant", type="n", ylim=c(-0.1,1), xlim=c(10,11))
> text(0.5*(Cy3 + Cy5n), Cy5n - Cy3, labels=as.character(1:9216),
+      col=rep(c("red","yellow","green","blue","pink","orange","purple","black"), each=2), cex=1)
> dev.off()
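If a plotting command fails between postscript() and dev.off(), the file device stays open. A sketch that wraps the steps in a function (save_ma_plot is a hypothetical name) and uses on.exit() so the device is always closed; the file goes to a temporary path here so the example can run anywhere.

```r
# Write a plot to a PostScript file and always close the device,
# even if the plotting code stops with an error
save_ma_plot <- function(A, M, file) {
  postscript(file)
  on.exit(dev.off())
  plot(A, M, xlab = "Average of logRed and logGreen",
       ylab = "Difference of logRed and logGreen")
  abline(h = 0, col = "grey")
  invisible(file)
}

out <- tempfile(fileext = ".ps")
save_ma_plot(c(8, 9, 10, 11), c(0.1, -0.2, 0.3, 0), out)
print(file.exists(out))
```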

Using R help

> ?plot

Generic X-Y Plotting

Description:

     Generic function for plotting of R objects. For more details
     about the graphical parameter arguments, see `par'.

Usage:

     plot(x, ...)
     plot(x, y, xlim=range(x), ylim=range(y), type="p",
          main, xlab, ylab, ...)
     plot(y ~ x, ...)

Arguments:

       x: the coordinates of points in the plot. Alternatively, a
          single plotting structure or any R object with a `plot'
          method can be provided.

Using R help (2)

> help.start()

R Help (3)

[Figure: array layout. The 9216 spots are arranged in 16 blocks; each block holds 24 × 24 = 576 spots, numbered row by row (1 to 24, 25 to 48, …, up to 576 in block 1; block 2 runs from 577 to 1152).]

Level colour plot of background

> bkgmat <- matrix(1:24, nrow=24, ncol=1)   # placeholder first column
> for (i in 1:16) {
+   s <- ((i-1)*576 + 1):(i*576)            # spots in block i
+   m <- matrix(CH1B[s], nrow=24, ncol=24, byrow=TRUE)
+   bkgmat <- cbind(bkgmat, m)
+ }
> bkgmat <- bkgmat[, -1]                    # drop the placeholder column
> m1 <- bkgmat[, 1:96]                      # blocks 1-4
> m2 <- bkgmat[, 97:192]                    # blocks 5-8
> m3 <- bkgmat[, 193:288]                   # blocks 9-12
> m4 <- bkgmat[, 289:384]                   # blocks 13-16
> bkg <- rbind(m1, m2, m3, m4)              # 96 x 96 matrix, 4 x 4 blocks

> filled.contour(1:96, 1:96, bkg, nlevels=100, color.palette=heat.colors)
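The loop above can be wrapped in a reusable function that places each block directly into a preallocated matrix, avoiding the placeholder column and the repeated cbind(). In this sketch the name tile_blocks and its arguments are hypothetical; the defaults assume the layout described above (16 blocks of 24 × 24 spots, blocks 1 to 4 across the top row). Using the spot numbers 1:9216 as values makes the resulting layout easy to inspect.

```r
# Rearrange per-spot values (numbered block by block, row by row within a
# block) into a matrix with the blocks tiled grid_r down and grid_c across
tile_blocks <- function(v, nr = 24, nc = 24, grid_r = 4, grid_c = 4) {
  per_block <- nr * nc
  stopifnot(length(v) == per_block * grid_r * grid_c)
  out <- matrix(NA_real_, nrow = nr * grid_r, ncol = nc * grid_c)
  for (b in seq_len(grid_r * grid_c)) {
    spots <- ((b - 1) * per_block + 1):(b * per_block)
    gr <- (b - 1) %/% grid_c   # 0-based block row: blocks 1-4 form the top row
    gc <- (b - 1) %% grid_c    # 0-based block column
    out[gr * nr + seq_len(nr), gc * nc + seq_len(nc)] <-
      matrix(v[spots], nrow = nr, ncol = nc, byrow = TRUE)
  }
  out
}

# Usage with spot numbers as the values
spot_map <- tile_blocks(1:9216)
print(spot_map[1, 1:3])   # first three spots of block 1
print(spot_map[1, 25])    # first spot of block 2
```

With real data, bkg <- tile_blocks(CH1B) would give the same 96 × 96 matrix as the loop, ready for filled.contour().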

Conclusion

R is flexible and powerful:

• Easy to read in data.

• Enables manipulation of data.

• Extensive control of and range of graphics.

• Wide range of statistical functions.

• Add on packages available.

• Scripts can be written as text files and sent to collaborators to import into R (use source("filename") to import and execute the code).

• Can save all the work you do in a session.
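A sketch of the script and workspace points above. A small script is written to a temporary file (standing in for a file sent to a collaborator) and run with source(); local = env runs it in its own environment so it cannot overwrite objects in the global workspace. save() and load() then store and restore a chosen object; save.image() would save the whole workspace instead. File names are temporary and purely illustrative.

```r
# Write a two-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c(
  "xmat <- matrix(c(1, 3, 5, 4, 7, 8), nrow = 2, ncol = 3)",
  "col_means <- colMeans(xmat)"
), script)

# Run the script in its own environment rather than the global workspace
env <- new.env()
source(script, local = env)
print(env$col_means)

# Save one result to disk and load it back into a fresh environment
rda <- tempfile(fileext = ".RData")
save(list = "col_means", envir = env, file = rda)
restored <- new.env()
load(rda, envir = restored)
print(restored$col_means)
```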

Acknowledgements

Terry Speed

Melanie Bahlo

Asa Wirapati

George Rudy

Jean Yee Hwa Yang

Chuang Fong Kong

Keith Slattery
