Supercomputing Institute for Advanced Computational Research
© 2011 Regents of the University of Minnesota. All rights reserved.
Introduction to R
Haoyu Yu (612-625-1709)
Scientific Consulting Group (SCG), Supercomputing Institute, University of Minnesota; help line: 612-626-0802
Outline
• Starting R
• R library packages
• Online resources
• Using R: basic steps and language essentials, including
  • reading data
  • data types
  • control structures
  • using/writing functions
  • basic steps
  • graphics
What is R?
1. A language and a computing environment for data analysis.
2. What is the difference between R and S-PLUS?
  • S-PLUS: a commercial system from the Insightful Corporation
  • R: a free system: http://www.r-project.org/
3. There are some differences between the two, but in everyday use they are very similar.
4. R runs on Windows, Mac, and a range of UNIX/Linux operating systems.
5. So how do you start?
Useful Links to R Resources
There are many useful links, such as:
• R Web site: http://www.r-project.org/
• CRAN on the R Web site: http://cran.r-project.org/
• R documentation:
  http://cran.r-project.org/doc/manuals/R-intro.pdf
  http://cran.r-project.org/doc/manuals/R-lang.pdf
• To start R on MSI machines, take a quick look at the MSI Web site:
  https://www.msi.umn.edu/sw/r
Starting R
[Figure: the R command line window]
R Packages
• Using R packages
  • Installing packages in R: install.packages("pkg_name")
  • Listing installed packages: library() (or installed.packages())
  • Loading packages: library("pkg_name")
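A sketch of the install-then-load pattern, using the base package stats as a stand-in so that nothing is downloaded (the choice of package is an assumption for illustration). `requireNamespace()` checks availability without attaching the package, and `character.only = TRUE` tells `library()` that `pkg` holds the package name as a string.

```r
pkg <- "stats"   # stand-in package name; stats ships with R, so no download happens

# Install only when the package cannot be found
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Without character.only = TRUE, library(pkg) would look for a package named "pkg"
library(pkg, character.only = TRUE)

pkg %in% rownames(installed.packages())  # TRUE: it is installed
pkg %in% (.packages())                   # TRUE: it is attached
```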
Objects and Functions in R
§ Everything in R is an object: a named storage space
  • Examples of objects:
    - Vectors
    - Matrices
    - Arrays
    - Lists
    - Data frames
    - Factors
§ Functions are a special type of object
  • They take arguments and carry out some operations
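To illustrate that functions are ordinary objects, the following sketch (the name `square` is hypothetical) stores a function in a variable, copies it, inspects its parts, and keeps it in a list next to data:

```r
square <- function(x) x^2          # a function stored in a named object
class(square)                      # "function"

sq <- square                       # copying the object copies the function
sq(4)                              # 16

formals(square)                    # the argument list: $x
body(square)                       # the expression x^2

objs <- list(v = 1:3, f = square)  # functions can sit in a list next to data
objs$f(objs$v)                     # 1 4 9
```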
Examples of R Objects *
> x <- c(1, 2, 3, 4, 5, 6, -2, -3, -4)
> x
[1]  1  2  3  4  5  6 -2 -3 -4
> length(x)
[1] 9
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -4.000  -2.000   2.000   1.333   4.000   6.000
> dim(x) <- c(3,3)
> x
     [,1] [,2] [,3]
[1,]    1    4   -2
[2,]    2    5   -3
[3,]    3    6   -4
> cov(x)
     [,1] [,2] [,3]
[1,]    1    1   -1
[2,]    1    1   -1
[3,]   -1   -1    1
Reading Data
• R can read data in many different formats: tab-delimited, Excel, CSV, SAS, Stata, SPSS, etc.
• Some commonly used functions:
  • readLines
  • scan
  • data.frame
  • read.table
  • read.csv
  • read.delim
  • read.xls
  • read.xport (for SAS XPORT files)
  • read.dta (reads Stata binary files)
NOTE: some of these functions belong to add-on R packages (e.g. the "foreign" package).
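A minimal round trip with base R readers, assuming a small made-up CSV written to a temporary file (the file contents are hypothetical):

```r
# Hypothetical data written to a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,score", "a,90", "b,85", "c,99"), csv_file)

# read.csv(): read.table() with header = TRUE and sep = "," preset
scores <- read.csv(csv_file)
str(scores)

# read.table() can also parse in-memory text (whitespace-separated here)
more <- read.table(text = "id score\nd 83\ne 92", header = TRUE)

# readLines() returns the raw lines as a character vector
raw_lines <- readLines(csv_file)
length(raw_lines)   # 4: the header plus three data lines
```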
R and Databases
• Interacting with databases directly from R
• Definition of the R database interface, and some links:
  http://stat.bell-labs.com/RS-DBI/doc/html/index.html
  http://stat.bell-labs.com/RS-DBI/index.html
• RODBC
  • An ODBC database interface
• RMySQL
  • Database interface and MySQL driver for R
Basic Types of Data Objects in R
• Vectors
  • The most basic data object
• Matrices and arrays
  • array() stores data in multiple dimensions
  • matrix() creates a two-dimensional array
• Lists
  • Ordered collection of other objects (of the same or different types)
• Data frames
  • A special class of lists: a structure to store tables
• Factors
  • A data type to handle categorical (nominal) data
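A quick sketch creating one object of each basic type; the values are arbitrary examples.

```r
v  <- c(2.5, 4, 7)                         # numeric vector
m  <- matrix(1:6, nrow = 2)                # 2-by-3 matrix
a  <- array(1:24, dim = c(2, 3, 4))        # 3-d array
l  <- list(id = "s1", scores = c(80, 96))  # list mixing types
df <- data.frame(id = c("s1", "s2"), final = c(90, 85))
f  <- factor(c("low", "high", "low"))      # factor with two levels

dim(a)       # 2 3 4
levels(f)    # "high" "low": levels are sorted by default
is.list(df)  # TRUE: a data frame is a list of equal-length columns
```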
Parallel Computing in R
• Vectorization
  • Many functions in R operate on each element of a vector directly, producing a vector of the same length
• Parallel computing in R
  • Perform computations on multiple cores or multiple nodes, through explicit or implicit parallel computing modes
  • There are a number of packages that are useful for high-performance computing in R:
    http://cran.r-project.org/web/views/HighPerformanceComputing.html
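A sketch contrasting an explicit loop with the vectorized form, then splitting a sum across workers with `mclapply()` from the `parallel` package that ships with current versions of R. The chunking scheme and the two-core setting are assumptions for illustration; `mclapply()` relies on forking, which Windows does not support, so the code falls back to one core there.

```r
x <- seq(0, 1, length.out = 1e5)

# Explicit loop: one element at a time
y_loop <- numeric(length(x))
for (i in seq_along(x)) y_loop[i] <- sqrt(x[i]) + 1

# Vectorized: sqrt() and + operate on the whole vector at once
y_vec <- sqrt(x) + 1

library(parallel)
# Forking is unavailable on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Split x into 4 chunks, sum each chunk in a worker, then combine
chunks <- split(x, rep(1:4, length.out = length(x)))
partial_sums <- mclapply(chunks, function(ch) sum(sqrt(ch)), mc.cores = cores)
total <- Reduce(`+`, partial_sums)
```

The combined `total` can differ from `sum(sqrt(x))` in the last few digits, because the additions happen in a different order.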
More Data Details
• vector: an ordered collection of elements of one type:
  • numerical: integer, double, complex
  • character
  • logical
• R has six basic vector types: integer, double, complex, character, logical, and raw (which holds raw bytes)
• Indexing plays a key role; subsetting can also be done
• Functions to create vectors: c(), seq(), rep()
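The six vector types, the three constructor functions, and the main indexing styles, in a short sketch with arbitrary values:

```r
types <- c(typeof(1L), typeof(1), typeof(1i), typeof("a"),
           typeof(TRUE), typeof(as.raw(255)))
types                          # "integer" "double" "complex" "character" "logical" "raw"

v <- c(10, 20, 30, 40, 50)     # c() combines values
s <- seq(1, 9, by = 2)         # 1 3 5 7 9
r <- rep(c("a", "b"), times = 3)  # "a" "b" "a" "b" "a" "b"

v[2]                           # 20: a positive index selects
v[-1]                          # 20 30 40 50: a negative index drops
v[v > 25]                      # 30 40 50: logical subsetting
v[c(1, 3)] <- 0                # assign into a subset
v                              # 0 20 0 40 50
```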
High Dimensional Arrays *
• Using the matrix function: 2-d arrays
> ma <- matrix( 1:15, 3, 5 )
> ma
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
• Add more rows or more columns to a matrix using rbind() or cbind():
> cbind( ma, rbind( A = 1, B = 1:4, C = 11:14 ) )
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A    1    4    7   10   13    1    1    1    1
B    2    5    8   11   14    1    2    3    4
C    3    6    9   12   15   11   12   13   14
• array() creates arrays with more dimensions:
> array( 1:24, dim = c(2,4,3) )
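Indexing a 3-d array works like matrix indexing with one more subscript; a sketch using the same `array(1:24, dim = c(2,4,3))` call:

```r
a <- array(1:24, dim = c(2, 4, 3))
dim(a)         # 2 4 3
a[2, 3, 1]     # 6: row 2, column 3 of the first 2-by-4 slice
a[, , 2]       # the second slice: a 2-by-4 matrix holding 9:16
a[1, 1, ]      # 1 9 17: the same cell across all three slices
```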
Lists *
• list: combines a collection of objects (which can be of different modes) into one object
Example: create one structure from a character string named "name", a number named "year", a character string named "class", and a numeric vector of length 4 named "scores"
> my.student <- list( name = c("Peter"), year = 1, class = c("Math101"), scores = c(80, 96, 88, 91) )
> my.student
$name
[1] "Peter"

$year
[1] 1

$class
[1] "Math101"

$scores
[1] 80 96 88 91
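Accessing and modifying the elements of the `my.student` list, as a sketch; note that `[[ ]]` and `$` return the element itself, while `[ ]` returns a sub-list.

```r
my.student <- list(name = "Peter", year = 1, class = "Math101",
                   scores = c(80, 96, 88, 91))
my.student$name           # "Peter": access by name with $
my.student[["scores"]]    # 80 96 88 91: [[ ]] returns the element itself
my.student["scores"]      # a sub-list of length 1, not the vector
mean(my.student$scores)   # 88.75
my.student$year <- 2      # modify an element in place
names(my.student)         # "name" "year" "class" "scores"
```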
Data Frame *
data frame: a list of variables of the same length with unique row names. (It is like a matrix whose columns can be of different modes. It is displayed in matrix form, rows by columns.)
> my.class <- data.frame( stud.ids = c("123", "16", "289", "1234", "78", "512"), final = c(90, 85, 99, 83, 92, 79) )
> my.class
  stud.ids final
1      123    90
2       16    85
3      289    99
4     1234    83
5       78    92
6      512    79
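Common operations on the `my.class` data frame, as a sketch. `stringsAsFactors = FALSE` is added so the IDs stay character strings in older versions of R too, and the `passed` column is a hypothetical addition.

```r
my.class <- data.frame(stud.ids = c("123", "16", "289", "1234", "78", "512"),
                       final = c(90, 85, 99, 83, 92, 79),
                       stringsAsFactors = FALSE)
dim(my.class)                            # 6 2
mean(my.class$final)                     # 88
top <- my.class[my.class$final > 90, ]   # rows where final > 90
top$stud.ids                             # "289" "78"
my.class$passed <- my.class$final >= 80  # add a logical column
```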
Factor
• a vector of categorical variables that can be ordered or unordered
  examples: tumor stage, social class, etc.
• to create:
> gender <- c(1,1,0,1,0,0)
> fgender <- factor( gender, levels = 0:1 )
> levels( fgender ) <- c( "male", "female" )
> fgender
[1] female female male   female male   male
Levels: male female
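`factor()` can also attach the labels directly, and `ordered = TRUE` gives an ordered factor such as a tumor stage. A sketch with hypothetical stage values:

```r
gender <- c(1, 1, 0, 1, 0, 0)
# labels= names the levels in one step (0 -> "male", 1 -> "female")
fgender <- factor(gender, levels = 0:1, labels = c("male", "female"))
table(fgender)       # male 3, female 3
as.integer(fgender)  # 2 2 1 2 1 1: the internal level codes

# An ordered factor (hypothetical tumor stages)
stage <- factor(c("II", "I", "III", "II"),
                levels = c("I", "II", "III"), ordered = TRUE)
stage > "I"          # TRUE FALSE TRUE TRUE: comparisons follow the level order
max(stage)           # III
```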
Data Attributes and Classes *
• mode:
> mode( 2 )     # gives the mode
[1] "numeric"
> typeof( 2 )   # gives the R internal type
[1] "double"
• length
• names:
> x <- list( a = 1, b = "A", c = 2:5 )
> names( x )
[1] "a" "b" "c"
> x
$a
[1] 1

$b
[1] "A"

$c
[1] 2 3 4 5
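Attributes such as `dim` and `names` can be set and inspected directly; a short sketch:

```r
x <- 1:6
mode(x); typeof(x); class(x)   # "numeric" "integer" "integer"
dim(x) <- c(2, 3)              # setting the dim attribute turns x into a 2-by-3 matrix
attributes(x)                  # $dim [1] 2 3
is.matrix(x)                   # TRUE
length(x)                      # 6: length counts elements, whatever the dim
y <- c(a = 1, b = 2)           # names are an attribute too
attr(y, "names")               # "a" "b"
```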
Basic Operators
• comparison operators:
  • equal: ==
  • not equal: !=
  • greater/less than: > or <
  • greater/less than or equal: >= or <=
• logical operators:
  • & (AND): returns TRUE if both comparisons return TRUE
  • | (OR): returns TRUE if at least one comparison returns TRUE
  • ! (NOT): returns the negation (opposite) of a logical vector
• other operators:
  • assignment operator: <- or = (= also assigns at the top level, but inside a function call it names an argument)
• precedence of operators: ?Syntax
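Comparison and logical operators work element by element; the last two lines illustrate two precedence rules documented under `?Syntax`. A sketch with arbitrary values:

```r
x <- c(1, 5, 10)
x > 2              # FALSE TRUE TRUE: comparisons are vectorized
x > 2 & x < 8      # FALSE TRUE FALSE: element-wise AND
x < 2 | x > 8      # TRUE FALSE TRUE: element-wise OR
!(x > 2)           # TRUE FALSE FALSE: negation

-2^2               # -4: ^ binds more tightly than unary minus
1:3 - 1            # 0 1 2: ":" binds more tightly than binary minus
```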
Control Flow Structures
• for loops, examples:
  • for ( i in 1:10 ) plot( mydata[ , i] )
  • samples <- paste("sample", 1:10, sep="")
    for( i in samples ) print(i)
• while ( condition ) expression
• repeat
> y <- 1000
> x <- y/2         # Newton's method for the square root of y
> while( abs( x*x-y ) >= 1e-10 ) (x <- (x + y/x)/2)
> repeat {
+   x <- (x + y/x)/2
+   if (abs(x*x-y) < 1e-10) break
+ }
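The Newton iteration above, wrapped in a function as a sketch (the name `newton_sqrt` and the `max_iter` cap are assumptions, not part of the slide), plus a `for` loop that collects results instead of printing them:

```r
newton_sqrt <- function(y, tol = 1e-10, max_iter = 100) {
  x <- y / 2                                  # starting guess, as above
  iter <- 0
  while (abs(x * x - y) >= tol && iter < max_iter) {
    x <- (x + y / x) / 2                      # Newton update
    iter <- iter + 1
  }
  x
}
newton_sqrt(1000)      # close to sqrt(1000), about 31.62278

# A for loop over a character vector, collecting results
samples <- paste("sample", 1:3, sep = "")
out <- character(0)
for (s in samples) out <- c(out, toupper(s))
out                    # "SAMPLE1" "SAMPLE2" "SAMPLE3"
```

The iteration cap guards against a tolerance that floating-point arithmetic cannot reach, for example when y is very large.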
If and Else *
> ( x <- rnorm( 20, mean = 1, sd = 1 ) )
 [1]  0.56477127  0.99415526  0.56525882  0.06839325  1.71730871
 [6]  0.52559999 -0.22859231  0.60854887 -0.38387933  1.43971644
[11]  1.78149272  2.28914317  1.03011630  0.69887935  0.11625638
[16]  0.82733328  0.36110657  1.53226873 -0.60203855  1.21676773
> ifelse( x > 1, 1, 0 )   # "ifelse" is vectorized
 [1] 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1
> if ( x[1] > 1 ) 1 else 0
[1] 0
> # note: the condition of "if" must have length 1
> if ( x > 1 ) 1 else 0   # "if ... else" is not vectorized
[1] 0
Warning message:
In if (x > 1) 1 else 0 :
  the condition has length > 1 and only the first element will be used
(In R 4.2.0 and later, a condition of length greater than one is an error rather than a warning.)
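Since `if` needs a single TRUE or FALSE, a vector condition is usually reduced with `any()` or `all()` first, while `ifelse()` handles the element-wise case. A sketch with made-up values:

```r
x <- c(0.5, 1.5, 2.0, 0.8)
ifelse(x > 1, "high", "low")   # "low" "high" "high" "low"

# Reduce the vector condition to one TRUE/FALSE before using if()
msg <- if (any(x > 1)) "some above 1" else "none above 1"
all_pos <- if (all(x > 0)) "all positive" else "not all positive"
```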
Switch
• "switch" can evaluate a character string:
> f <- function( x, type ) {
+   switch( type,
+           mean   = mean( x ),
+           range  = range( x ),
+           accsum = Reduce( "+", x, accumulate = TRUE ) )
+ }
> f( x, "mean" )
[1] 0.81785
> f( x, "range" )
[1] -0.6463109  2.7719993
> f( x, "accsum" )
 [1]  0.2346313 -0.4116796  0.1852275  1.2166363  2.6556395  5.1379385
 [7]  4.5755973  6.9113659  6.5363048  8.3764793  8.9208708 10.1792544
[13] 10.2792301 10.7349598 12.8859846 15.6579839 16.4046637 16.4132600
[19] 16.2106088 16.3570006
> x
 [1]  0.23463133 -0.64631092  0.59690713  1.03140873  1.43900320  2.48229901 -0.56234115  2.33576855 -0.37506108  1.84017456
[11]  0.54439148  1.25838362  0.09997561  0.45572975  2.15102479  2.77199931  0.74667984  0.00859621 -0.20265113  0.14639180
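Two more `switch()` features, sketched with a variant of `f` (the `median`/`middle` names are hypothetical): an empty alternative falls through to the next one, and an unnamed final argument acts as the default.

```r
f <- function(x, type) {
  switch(type,
         mean   = mean(x),
         median = ,                      # empty: falls through to the next value
         middle = median(x),             # shared by "median" and "middle"
         range  = range(x),
         stop("unknown type: ", type))   # unnamed last argument: the default
}
v <- c(3, 1, 4, 1, 5)
f(v, "mean")     # 2.8
f(v, "median")   # 3
f(v, "middle")   # 3
f(v, "range")    # 1 5
```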
Matrix Computation *
> x1 <- c(1,2,3,4)
> y1 <- c(1,2,3,4)
> x1 * y1          # element-wise multiplication
[1]  1  4  9 16
> x1 %*% y1        # inner product of two vectors
     [,1]
[1,]   30
> dim(x1) <- c(2,2)
> x1
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> x1 %*% x1        # matrix multiplication
     [,1] [,2]
[1,]    7   15
[2,]   10   22
> x1[,2] %*% x1
     [,1] [,2]
[1,]   11   25
> x1[1,] %*% x1
     [,1] [,2]
[1,]    7   15
Matrix Computation: More Functions *
• %o%: outer product
> x1[,1] %o% x1
, , 1      ## x1[,1] %*% t(x1[,1])

     [,1] [,2]
[1,]    1    2
[2,]    2    4

, , 2      ## x1[,1] %*% t(x1[,2])

     [,1] [,2]
[1,]    3    4
[2,]    6    8

> dim(x1[,1] %o% x1)
[1] 2 2 2
> y1
[1] 1 2 3 4
> y1 %o% y1    ## y1 %*% t(y1)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    4    6    8
[3,]    3    6    9   12
[4,]    4    8   12   16
Matrix Computation: More Functions * …
• %x%: Kronecker product of two arrays
> y1 %x% y1    ## an m-by-n matrix "kronecker" a p-by-q matrix gives an mp-by-nq matrix
 [1]  1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16
> kronecker( y1, y1 )
 [1]  1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16
> kronecker( diag(1, 2), x1 )
     [,1] [,2] [,3] [,4]
[1,]    1    3    0    0
[2,]    2    4    0    0
[3,]    0    0    1    3
[4,]    0    0    2    4
• crossprod(): cross product (may be slightly faster than t(x1) %*% x1)
> crossprod( x1, x1 )    ## t(A) %*% A
     [,1] [,2]
[1,]    5   11
[2,]   11   25
> t( x1 ) %*% x1
     [,1] [,2]
[1,]    5   11
[2,]   11   25
Matrix Computation: More Functions …
• more functions for matrix computation:
  • eigen(): computes eigenvalues and eigenvectors of numerical matrices
  • norm(), rcond(), kappa(): matrix norm and condition numbers
  • svd(): singular-value decomposition of a rectangular matrix
  • qr(): QR decomposition of a matrix
  • solve(): solves the equations A %*% X = B for X
  • det(): calculates the determinant of a matrix
  • t(): transpose of a matrix or a data.frame
  • Conj(t()): conjugate transpose of a complex matrix
  • aperm(): transposes an array by permuting its dimensions
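A sketch exercising several of these functions on a small symmetric matrix chosen for illustration:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric 2-by-2 matrix
b <- c(1, 2)

det(A)               # 5
x <- solve(A, b)     # solves A %*% x = b: 0.2 0.6
Ainv <- solve(A)     # with one argument, solve() returns the inverse
A %*% Ainv           # the identity matrix, up to rounding

e <- eigen(A)
e$values             # (5 + sqrt(5))/2 and (5 - sqrt(5))/2
s <- svd(A)
s$d                  # equal to the eigenvalues here, since A is symmetric positive definite
qr.solve(A, b)       # the same solution via the QR decomposition
```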
Homogeneous and Heterogeneous Data Types
• For these data types, all elements must have the same storage mode, for example "logical", "integer", "double", "complex", or "character":
  • vector
  • matrix
  • array
  • factor
• These data types allow elements with different storage modes:
  • list
  • data frame
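When values of different modes are combined in a vector, R coerces them to a single mode; lists and data frames keep each element (or column) as it is. A sketch (`stringsAsFactors = FALSE` keeps the result the same in older versions of R):

```r
v <- c(1, "a", TRUE)       # mixed modes in c() are coerced to one mode
v                          # "1" "a" "TRUE"
n <- c(1L, 2.5, TRUE)      # logical < integer < double
typeof(n)                  # "double"

l <- list(1, "a", TRUE)    # a list keeps each element's own type
sapply(l, typeof)          # "double" "character" "logical"

df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, typeof)         # x "integer", y "character": each column is homogeneous
```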
R functions: syntax, arguments, etc.
> hist.w.density <- function(x, xlab = deparse(substitute(x)), ...) {
+   h <- hist( x, plot = FALSE )
+   d <- density( x )
+   s <- sd( x )
+   m <- mean( x )
+   ylim <- range( 0, h$density, d$y )   # leave room for the density curve
+   hist( x, freq = FALSE, ylim = ylim, col = 'lightblue', xlab = xlab, ... )
+   lines( d, col = 'purple', lwd = 2 )
+   list( mean = m, sd = s )
+ }
> mydata <- rchisq( 200, 10 )
> hist.w.density( mydata )
Note: deparse(substitute(x)) returns a character-string version of the expression passed as the actual argument for x (here "mydata", which becomes the default x-axis label).
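A sketch of the `deparse(substitute(x))` idiom on its own; `arg_label` and `describe` are hypothetical helper names.

```r
# deparse(substitute(arg)) recovers the expression the caller passed
arg_label <- function(x) deparse(substitute(x))
my.values <- rnorm(5)
arg_label(my.values)             # "my.values"
arg_label(log(my.values + 10))   # "log(my.values + 10)"

# Like the xlab default of hist.w.density: a label defaulting to the argument's expression
describe <- function(x, label = deparse(substitute(x))) {
  paste0(label, ": mean = ", format(mean(x), digits = 3))
}
describe(c(2, 4, 6))             # "c(2, 4, 6): mean = 4"
```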
Default Arguments
• The previous function returns its value by default (the value of the last expression)
• return() can also be used explicitly to return a value:
> x1
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> ( y1 <- seq_len( 5 ) )
[1] 1 2 3 4 5
> xy <- function( x, y = x ) { return( t(x) %*% y ) }
> xy( x = x1, y = y1 )   # both arguments supplied explicitly
     [,1]
[1,]   55
[2,]  130
> xy( x = x1 )           # y takes its default value, x
     [,1] [,2]
[1,]   55  130
[2,]  130  330
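More default-argument behavior, sketched with hypothetical functions: defaults can be overridden by position or by name, a default may refer to another argument (it is evaluated only when needed), and `missing()` reports whether an argument was supplied.

```r
power <- function(x, p = 2) x^p
power(3)                  # 9: p takes its default
power(3, 3)               # 27
power(p = 0.5, x = 16)    # 4: named arguments can come in any order

# A default that refers to another argument
scale_to <- function(x, total = sum(x)) x / total
scale_to(c(1, 3))         # 0.25 0.75
scale_to(c(1, 3), 8)      # 0.125 0.375

# missing() tells whether the caller supplied an argument
has_p <- function(x, p) !missing(p)
has_p(1)                  # FALSE
has_p(1, 2)               # TRUE
```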
Least Squares Fitting
• Y(i,j) = a(j) + b(j)*STATUS(i), where the indices i and j are:
  • i runs over experiments (say from 1 to n)
  • j runs over probes (possibly over probe sets, say from 1 to m)
• in other words, for each j, find a(j) and b(j) that minimize:
$$\sum_{i=1}^{n} \left( Y_{i,j} - \left( a_j + b_j\,\mathrm{STATUS}_i \right) \right)^2$$
• take the derivatives with respect to the coefficients a(j) and b(j) and set them to zero:
$$\sum_{i=1}^{n} Y_{i,j} = n\,a_j + b_j \sum_{i=1}^{n} \mathrm{STATUS}_i$$
$$\sum_{i=1}^{n} Y_{i,j}\,\mathrm{STATUS}_i = a_j \sum_{i=1}^{n} \mathrm{STATUS}_i + b_j \sum_{i=1}^{n} \mathrm{STATUS}_i^2$$
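Least Squares Fitting …
• in matrix form:
$$\begin{pmatrix} n & \sum_{i=1}^{n} \mathrm{STATUS}_i \\ \sum_{i=1}^{n} \mathrm{STATUS}_i & \sum_{i=1}^{n} \mathrm{STATUS}_i^2 \end{pmatrix} \begin{pmatrix} a_j \\ b_j \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} Y_{i,j} \\ \sum_{i=1}^{n} Y_{i,j}\,\mathrm{STATUS}_i \end{pmatrix}$$
• the (2,2)th element of the inverse of the matrix on the left is:
$$\frac{1}{\sum_{i=1}^{n} \mathrm{STATUS}_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} \mathrm{STATUS}_i\right)^2}$$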
Least Squares Fitting …
lr <- function( y, x ) {
  # column of 1s for the intercept
  nobs <- if (is.null(dim(x))) rep(1, length(x)) else rep(1, nrow(x))
  x <- cbind( nobs, x )
  solve( crossprod( x, x ) ) %*% t( x ) %*% y
}
> cases
 [1] 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> dim( Y )
[1]  30 400
> coefficients <- lr( Y, cases )
> dim( coefficients )
[1]   2 400
• crossprod( X, X ) = t( X ) %*% X
• solve( crossprod( X, X ) ): the inverse of that matrix
• coefficients[2,]: the "slopes" of the linear models
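A self-contained check of the normal-equation approach against `lm()`, on simulated data (the sizes, seed, and true slope are assumptions; the slide's own `Y` and `cases` are not available). `NROW(x)` replaces the `if`/`else` on `dim(x)` and works for both vectors and matrices.

```r
# Same algebra as lr() above: (X'X)^{-1} X'Y for every column of Y at once
lr <- function(y, x) {
  nobs <- rep(1, NROW(x))          # intercept column of 1s
  X <- cbind(nobs, x)
  solve(crossprod(X, X)) %*% t(X) %*% y
}

set.seed(1)
n <- 30; m <- 5                    # hypothetical: 30 experiments, 5 probes
cases <- rbinom(n, 1, 0.3)         # simulated 0/1 STATUS
Y <- matrix(rnorm(n * m), n, m) + 2 * cases   # true slope 2 for every probe

b_lr <- lr(Y, cases)               # 2-by-m: intercepts (row 1), slopes (row 2)
b_lm <- coef(lm(Y ~ cases))        # lm() fits all m columns of Y at once
all.equal(unname(b_lr), unname(b_lm))   # TRUE
```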
Least Squares Fitting …
• the standard error for the "slope" is
$$\mathrm{se}(b_j) = \sqrt{\frac{\hat\sigma_j^2}{\sum_{i=1}^{n} \mathrm{STATUS}_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} \mathrm{STATUS}_i\right)^2}}$$
where an estimator for the variance could be:
$$\hat\sigma_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( Y_{i,j} - \hat Y_{i,j} \right)^2$$
lr <- function(y, x) {
  nobs <- if (is.null(dim(x))) rep(1, length(x)) else rep(1, nrow(x))
  x1 <- cbind(nobs, x)
  invA <- solve( crossprod(x1, x1) )
  coefficients <- invA %*% t(x1) %*% y
  sde_b <- sqrt( colSums( (y - (x1 %*% coefficients))^2 ) / ( dim(x1)[1] - 1 ) * invA[2,2] )
  return( list( coefficients, sde_b ) )
}
Least Squares Fitting …
> results <- lr( y = Y, x = cases )
> length( results )
[1] 2
> dim( results[[1]] )
[1]   2 400
> length( results[[2]] )
[1] 400
• results[[1]]: estimates of the linear model coefficients
• results[[2]]: standard errors of the "slope" coefficients
• There are functions with an "empty" argument list
• When a function's argument list contains "...", it matches any arguments not matched by the other formal arguments
• Functions can be passed as arguments to other functions
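A sketch of the last two points: a function with an empty argument list (`now_label` is hypothetical; built-ins such as `search()` and `Sys.time()` are real examples) and functions that forward or inspect `...`:

```r
# A function with an empty argument list
now_label <- function() "run-1"
now_label()                          # "run-1"

# "..." collects extra arguments and passes them on
my.mean <- function(x, ...) mean(x, ...)
my.mean(c(1, NA, 3))                 # NA
my.mean(c(1, NA, 3), na.rm = TRUE)   # 2: na.rm reaches mean() through "..."

# The contents of "..." can also be captured in a list
count_args <- function(...) length(list(...))
count_args(1, "a", TRUE)             # 3
```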
Functions as Arguments
• Functions can be used as arguments:
> cases <- sapply( runif( 6 ),
+                  function( r ) { if (r > .5) rep(1, times = 4) else rep(0, times = 4) } )
> cases
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    1    0    0
[2,]    1    1    0    1    0    0
[3,]    1    1    0    1    0    0
[4,]    1    1    0    1    0    0
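Any function can receive another function as an argument; a sketch with a hypothetical `apply_twice()` helper, followed by the base higher-order functions `Filter()`, `Map()`, and `do.call()`:

```r
apply_twice <- function(f, x) f(f(x))       # f is a function argument
apply_twice(sqrt, 16)                       # 2
apply_twice(function(v) v + 1, 0)           # 2: an anonymous function works too

Filter(function(v) v %% 2 == 0, 1:10)       # 2 4 6 8 10
Map(function(a, b) a * b, 1:3, 4:6)         # list(4, 10, 18)
do.call(paste, list("a", "b", sep = "-"))   # "a-b"
```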
How to Access Functions in a List
• Treat functions as objects and store them as elements of a list:
> means <- list( f1 = function( x ) { mean(x, trim = 0.05) },
+                f2 = function( x ) { mean(x, trim = 0.10) },
+                f3 = function( x ) { mean(x, trim = 0.15) } )
> a <- rnorm( 1000 )
> means[[1]]( a )
[1] -0.04375495
> means[[2]]( a )
[1] -0.04692238
> means[[3]]( a )
[1] -0.04734544
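Functions in a list can be called by name as well as by position, and `sapply()` can apply every one of them to the same data. A sketch with made-up data containing one outlier:

```r
means <- list(f0 = function(x) mean(x),
              f1 = function(x) mean(x, trim = 0.1))
x <- c(1:9, 100)                   # one large outlier

means$f1(x)                        # 5.5: call by name with $
means[["f0"]](x)                   # 14.5: call by name with [[ ]]
sapply(means, function(f) f(x))    # f0 = 14.5, f1 = 5.5: every function applied to x
```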
Functions with arguments that are functions
• Functions apply, lapply, sapply, and vapply:
  lapply( data, function )
  sapply( data, function )
  vapply( data, function, tmp_value )
• "data" can be a vector or a list
• lapply returns a new list by applying "function" to each of the input components; sapply and vapply simplify the result (e.g. to a vector) when possible
• vapply is very similar to sapply, but checks that every result matches the type and length of the template "tmp_value"
> x <- list( a = c(1:4), b = c(10:12), c = c(0.1:0.5) )   # note: 0.1:0.5 is just 0.1, since ":" steps by 1
> x
$a
[1] 1 2 3 4

$b
[1] 10 11 12

$c
[1] 0.1
Applying Functions
functions lapply, sapply, and vapply continued …
> lapply( x, mean )
$a
[1] 2.5

$b
[1] 11

$c
[1] 0.1

> sapply( x, mean )
   a    b    c
 2.5 11.0  0.1
What happens if one uses "vapply"?
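One way to answer the question: `vapply()` returns the same result as `sapply()` when each result matches the template, and stops with an error when it does not. A sketch reusing the list `x` from above:

```r
x <- list(a = c(1:4), b = c(10:12), c = c(0.1:0.5))

vapply(x, mean, numeric(1))    # same as sapply(x, mean): a 2.5, b 11.0, c 0.1
vapply(x, range, numeric(2))   # 2-by-3 matrix: min and max of each element

# A template that does not match the results is an error, not a silent change of shape
res <- tryCatch(vapply(x, range, numeric(1)),
                error = function(e) "template mismatch")
res                            # "template mismatch"
```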
Applying Functions …
function apply: apply( mat, margin, function )
• "mat": a matrix or an array
• "margin": the subscript(s) over which "function" is applied (1 = rows, 2 = columns)
> x <- matrix( rnorm( 100 ), nrow = 20, ncol = 5 )
> x
            [,1]        [,2]       [,3]        [,4]        [,5]
 [1,]  0.12854081  0.83905886 -2.4188850 -0.40231317  0.45699871
 [2,]  0.48828276  0.49086884 -1.6613223 -0.30584325 -0.95571975
 [3,] -0.28471730 -0.30888101 -1.3912699  0.29120769  1.42745212
 [4,] -1.68301946  0.93181591  1.4815138  0.59722151  0.80390538
 [5,]  0.91531660  0.90153731  1.4904788  0.36291027 -0.90925853
...
[17,] -1.71004865  0.14872353  0.9783127  0.59204323 -0.03233043
[18,] -0.35315308  1.38312362 -0.3097932  0.03894128 -0.32933839
[19,] -0.40997310  2.00611129 -1.2574482 -0.17318760  0.92358798
[20,]  0.32112173 -0.29736373 -0.3486134  0.49357963 -0.25097612
> apply( x, 2, var )
[1] 1.6209041 0.9772905 1.5097936 0.4183222 0.8069382
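A sketch of the `margin` argument on a small matrix, with `rowSums()` as a faster built-in alternative for plain sums:

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # rows: 1 3 5 and 2 4 6
apply(m, 1, sum)                     # 9 12: margin 1 = rows
apply(m, 2, sum)                     # 3 7 11: margin 2 = columns
apply(m, c(1, 2), function(v) v * 10)  # margin c(1, 2) = every cell
identical(apply(m, 1, sum), rowSums(m))  # TRUE
```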
Applying Functions …
> dim( mat1 )
[1] 100 100
> mode( mat1 )
[1] "numeric"
> # for each row, find the fraction of the data that lie in the tails (2.5% on each side)
> ind <- apply( mat1, 1, function( row ) {
+   sum( row > qnorm(.975) | row < qnorm(.025) ) / length( row )
+ } )
> length( ind )
[1] 100
> ind[ 1:10 ]
 [1] 0.02 0.01 0.08 0.06 0.05 0.03 0.07 0.05 0.04 0.04
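Since `mat1` is not defined on the slide, this sketch simulates a standard-normal stand-in (an assumption) and compares the `apply()` version with a vectorized `rowMeans()` version:

```r
set.seed(42)
mat1 <- matrix(rnorm(100 * 100), nrow = 100)   # simulated stand-in for mat1

in_tails <- function(row) sum(row > qnorm(.975) | row < qnorm(.025)) / length(row)
ind_apply <- apply(mat1, 1, in_tails)

# Vectorized: compare the whole matrix at once, then average each row of TRUE/FALSE
ind_vec <- rowMeans(mat1 > qnorm(.975) | mat1 < qnorm(.025))

all.equal(ind_apply, ind_vec)   # TRUE
mean(ind_vec)                   # close to 0.05 for standard normal data
```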
Applying a Function to Multiple Elements
tapply( x, ind, fun ): applies "fun" to each cell of a ragged array, that is, to each group of values of x defined by the levels of ind
> ( groups <- factor( c(1,2,0,1,0,0,1,0,0,1,2,1,0,0,1), levels = 0:2 ) )
 [1] 1 2 0 1 0 0 1 0 0 1 2 1 0 0 1
Levels: 0 1 2
> weight <- c(1.4474550, 1.5670582, 1.520615, 1.5407738, 2.042491, 1.864607, 0.5865089, 1.388697, 2.128079, 1.6252041, 1.7806790, 2.9572416, 1.8650490, 0.9782662, 0.2721176)
> table( groups )
groups
0 1 2
7 6 2
> tapply( weight, groups, mean )
       0        1        2
1.683972 1.404883 1.673869
> tapply( weight, groups, sd )
        0         1         2
0.4088762 0.9414384 0.1510527
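A minimal sketch showing that `tapply()` is equivalent to splitting `x` into groups with `split()` and applying the function to each group; the data are made up.

```r
x <- c(1, 2, 3, 4, 10)
g <- factor(c("a", "b", "a", "b", "c"))
tapply(x, g, sum)          # a 4, b 6, c 10
sapply(split(x, g), sum)   # the same sums: split() builds the ragged array
lengths(split(x, g))       # a 2, b 2, c 1: the groups differ in size
```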
Contingency Table
build a contingency table with tapply( … ):
> ( test <- data.frame( weight, groups ) )
      weight groups
1  1.4474550      1
2  1.5670582      2
3  1.5206150      0
4  1.5407738      1
5  2.0424910      0
6  1.8646070      0
7  0.5865089      1
8  1.3886970      0
9  2.1280790      0
10 1.6252041      1
11 1.7806790      2
12 2.9572416      1
13 1.8650490      0
14 0.9782662      0
15 0.2721176      1
> tapply( test$weight, test$groups, sum )
        0         1         2
11.787804  8.429301  3.347737
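With a list of two factors as the index, `tapply()` produces a two-way table of sums; `xtabs()` gives the same sums and `table()` the cell counts. A sketch with hypothetical data:

```r
d <- data.frame(w = c(1, 2, 3, 4, 5, 6),
                g = factor(c("a", "b", "a", "b", "a", "b")),
                s = factor(c("x", "x", "y", "y", "x", "y")))

m <- tapply(d$w, list(d$g, d$s), sum)   # rows: levels of g; columns: levels of s
m["a", "x"]                             # 6: rows 1 and 5 of d
xt <- xtabs(w ~ g + s, data = d)        # the same sums as a "table" object
table(d$g, d$s)                         # counts per cell instead of sums
```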
Basic R Print and Concatenate Functions
• print( pi, digits = 10 )
[1] 3.141592654
• aa <- seq(0, 1, length.out = 30)
  is.na( aa ) <- aa > 0.75
  print( aa, na.print = ".", digits = 3 )
 [1] 0.0000 0.0345 0.0690 0.1034 0.1379 0.1724 0.2069 0.2414 0.2759 0.3103 0.3448
[12] 0.3793 0.4138 0.4483 0.4828 0.5172 0.5517 0.5862 0.6207 0.6552 0.6897 0.7241
[23] . . . . . . . .
• options( "digits" = 3 )
  cat( aa, fill = 30, labels = paste("[", 1:10, "]:", sep = "") )
[1]: 0 0.0345 0.069 0.103
[2]: 0.138 0.172 0.207 0.241
[3]: 0.276 0.310 0.345 0.379
[4]: 0.414 0.448 0.483 0.517
[5]: 0.552 0.586 0.621 0.655
[6]: 0.69 0.724 NA NA NA NA
[7]: NA NA NA NA
Basic R Print and Concatenate Functions …
• pi   # printed with the default number of digits (7)
[1] 3.141593
• cat( letters, fill = 20, labels = paste("[", 1:10, "]:", sep = "") )
[1]: a b c d e f g
[2]: h i j k l m n
[3]: o p q r s t u
[4]: v w x y z
• cat( LETTERS, fill = 20, labels = paste("[", 1:10, "]:", sep = "") )
[1]: A B C D E F G
[2]: H I J K L M N
[3]: O P Q R S T U
[4]: V W X Y Z
• month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"
• month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
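Besides `print()` and `cat()`, the base functions `format()`, `sprintf()`, and `paste()` return formatted strings instead of printing them. A short sketch:

```r
labels <- paste("[", 1:3, "]:", sep = "")   # "[1]:" "[2]:" "[3]:"
format(pi, digits = 3)                      # "3.14": a string, not printed output
sprintf("%.4f", pi)                         # "3.1416": C-style formatting
paste(month.abb[1:3], collapse = ", ")      # "Jan, Feb, Mar"
cat("pi is about", format(pi, digits = 3), "\n")   # prints the sentence to the console
```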
R Graphics Packages
• There are many existing packages for creating and developing graphics in R:
  • ggplot2: http://had.co.nz/ggplot2/
  • lattice
  • plotrix
  • rgl (a 3D real-time rendering device driver system for R): http://rgl.neoscientists.org/gallery.shtml (a quick demo of "rgl" follows)
  • many others: see http://cran.r-project.org/web/views/Graphics.html
3D R Graphics from RGL
R Graphics
data(volcano)                  ## from the persp example in {graphics}
z <- 2 * volcano               # Exaggerate the relief
x <- 10 * (1:nrow(z))          # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z))          # 10 meter spacing (E to W)
par(bg = "white")
## draw a perspective plot of the surface over the x-y plane;
## border = NA: don't draw the grid lines
persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
      ltheta = -120, shade = 0.75, border = NA, box = FALSE)
############################################################
library(rgl)
z <- 2 * volcano               # Exaggerate the relief
x <- 10 * (1:nrow(z))          # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z))          # 10 meter spacing (E to W)
zlim <- range(z)               # range of heights
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen)     # height color lookup table
col <- colorlut[ z - zlim[1] + 1 ]   # assign colors to heights for each point
open3d()
surface3d(x, y, z, color = col, back = "lines")
open3d()
x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x, y)
plot3d(x, y, z, col = rainbow(1000))   ## the plot can be rotated with the mouse
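To save a base-graphics plot to a file rather than the screen, open a file device, plot, and close it. This sketch writes the `persp()` plot above to a temporary PDF (the file name and page size are assumptions):

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 5)   # open a file-based graphics device

data(volcano)
persp(10 * (1:nrow(volcano)), 10 * (1:ncol(volcano)), 2 * volcano,
      theta = 135, phi = 30, col = "green3", scale = FALSE,
      ltheta = -120, shade = 0.75, border = NA, box = FALSE)

dev.off()                              # close the device so the file is written
file.exists(out_file)                  # TRUE
```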
Cleaning Up Memory
• User-defined variables
• Workspace: a region of memory
• Useful functions:
> objects()
[1] "my.mean"  "my.range" "x"        "y"
> remove(x, y)
> objects()
[1] "my.mean"  "my.range"
> search()
• search() lists the search path: the global environment, the attached packages, and other attached objects.
• It is recommended to clean up after yourself from time to time if you intend to save the workspace.
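A sketch of removing objects by name or by pattern; the object names are hypothetical, and `object.size()` and `gc()` are shown for inspecting memory use.

```r
x <- 1:5; y <- letters; tmp1 <- 0; tmp2 <- 0
ls()                              # same listing as objects()
rm(list = c("x", "y"))            # remove objects named in a character vector
rm(list = ls(pattern = "^tmp"))   # remove every object whose name starts with "tmp"
c("x", "y", "tmp1") %in% ls()     # FALSE FALSE FALSE
object.size(letters)              # memory used by one object
gc()                              # run garbage collection and report memory use
```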
Tips When Things Go Wrong
• traceback(): shows the sequence of function calls culminating in the error.
• To interrupt a computation, press the Esc (escape) key in S-PLUS, or in R use the mouse to press the Stop button in the toolbar.